0479

NoiseFlow: Deep Learning using Noise-Driven Training
Joseph Yitan Cheng1, David Y. Zeng2, John M. Pauly2, Shreyas S. Vasanawala1, and Bob Hu2

1Radiology, Stanford University, Stanford, CA, United States, 2Electrical Engineering, Stanford University, Stanford, CA, United States

Synopsis

Deep learning provides a powerful data-driven solution to a wide range of imaging tasks, from data acquisition to image interpretation. Training these deep, highly nonlinear models typically requires a very large, well-labeled dataset. However, accurately labeled data are expensive to collect and sometimes impossible to obtain. Without enough data, the learned model will be highly biased and unable to generalize. In the worst case, applying the deep model may result in misdiagnosis and improper patient management. Thus, we propose NoiseFlow, a solution that reduces the dependency of deep learning solutions on real data through noise-driven training.

Introduction

Deep learning has permeated the entire medical imaging field, providing high-performance solutions for a wide range of tasks, from data acquisition to image interpretation. In this framework, a nonlinear model with millions of parameters is trained with an extremely large dataset. Model accuracy and generalizability are highly dependent on the training examples. However, data with accurate labels are difficult and expensive to collect. If insufficient training examples are used, overfitting and bias become significant concerns. In the worst case, the learned model may remove or add critical features, resulting in misdiagnosis and improper patient management. Therefore, we propose NoiseFlow, a solution that reduces the dependency on real training data through noise-driven training.

Method

Conventionally, noise is introduced to augment the training process to increase generalizability [1,2], but this solution assumes that all new data seen at inference can be described by the training dataset within a small degree of error. Here, we propose to start from the entire image space and conservatively reduce the size of this space while maintaining generalizability. As a starting point, we generate training examples in which each pixel is independently sampled from a uniform distribution. Images, including MR scans, lie within the space described by these randomly generated images, although real images occupy only a highly concentrated region of it [3]. The generated noise images are then used to probe the complex system that we aim to model using a deep neural network (Figure 1).
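The noise-image starting point above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code; the image shape and the sampling range of the uniform distribution are assumptions.

```python
import numpy as np

def generate_noise_image(shape=(256, 256), rng=None):
    """Generate a complex image where each pixel is independently
    sampled from a uniform distribution (range is an assumption)."""
    rng = rng or np.random.default_rng()
    real = rng.uniform(-1.0, 1.0, size=shape)
    imag = rng.uniform(-1.0, 1.0, size=shape)
    return real + 1j * imag
```

Because every pixel is drawn independently, the generated examples span the full image space rather than any anatomy-specific manifold.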

To demonstrate NoiseFlow, we explored multi-channel image reconstruction [4,5]. A deep convolutional neural network (12 repeated blocks of 3 ResNet blocks [6] with 3x3 convolutions and ReLU activation) was constructed to take in 8 channels of complex data and output 8 channels of complex data. For each training example, a k-space dataset was generated from noise (Figure 2). Multiple channels of k-space data were then constructed by convolving the k-space data with independently generated random 5x5 kernels. These data were transformed into the image domain as the truth (or “label”). The multi-channel k-space data were also subsampled and transformed into the image domain to create the training input.
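A NumPy sketch of this data-generation pipeline is shown below. It is illustrative only: the image size, kernel amplitude range, and the uniform line-subsampling mask are assumptions, and the kernel is applied as a correlation rather than a flipped convolution (equivalent in distribution for random kernels).

```python
import numpy as np

def conv2_same(x, k):
    """Direct 2-D 'same' correlation of a complex array with a small
    kernel (flip omitted; kernels are random, so this is equivalent)."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x)
    for i in range(kh):
        for j in range(kw):
            out += k[i, j] * xp[i:i + x.shape[0], j:j + x.shape[1]]
    return out

def make_training_pair(shape=(64, 64), n_coils=8, accel=4, rng=None):
    rng = rng or np.random.default_rng()
    # 1. Noise k-space: each point drawn from an independent uniform distribution.
    kspace = rng.uniform(-1, 1, shape) + 1j * rng.uniform(-1, 1, shape)
    # 2. Simulate coils: convolve k-space with random complex 5x5 kernels.
    coils = np.stack([
        conv2_same(kspace,
                   rng.uniform(-1, 1, (5, 5)) + 1j * rng.uniform(-1, 1, (5, 5)))
        for _ in range(n_coils)
    ])
    # 3. Fully sampled image-domain data is the truth ("label").
    truth = np.fft.ifft2(coils, axes=(-2, -1))
    # 4. Subsampled image-domain data is the training input
    #    (uniform line mask here; the mask pattern is an assumption).
    mask = np.zeros(shape)
    mask[::accel, :] = 1
    inp = np.fft.ifft2(coils * mask, axes=(-2, -1))
    return inp, truth
```

Each call yields one aliased-input/truth pair without touching any real MR data.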

Two questions were investigated: 1) the generalizability of networks trained using NoiseFlow, and 2) the performance gain (or loss) when using a smaller data manifold for training. Volumetric knee datasets from mridata.org [7,8] were used to test the NoiseFlow-trained network. Additionally, a subset of the knee datasets was used to separately train an equivalent network for comparison. Networks were trained in TensorFlow [9] using the Adam optimizer to minimize the l2 loss (Figure 1). Using BART [10], l2-ESPIRiT [11] was also performed for comparison as a state-of-the-art parallel imaging algorithm. All additional datasets were collected with IRB approval and informed consent.
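The network and training setup described above can be sketched with the TensorFlow Keras API. This is a hedged reconstruction from the text, not the authors' implementation: the filter count, the exact residual-block layout, and the representation of 8 complex channels as 16 real channels are assumptions.

```python
import tensorflow as tf

def resnet_block(x, filters=64):
    """Residual block with two 3x3 convolutions and ReLU (a sketch)."""
    y = tf.keras.layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = tf.keras.layers.Conv2D(filters, 3, padding="same")(y)
    return tf.keras.layers.Add()([x, y])

def build_model(n_coils=8, filters=64):
    # 8 complex channels represented as 16 real channels (an assumption).
    inp = tf.keras.Input(shape=(None, None, 2 * n_coils))
    x = tf.keras.layers.Conv2D(filters, 3, padding="same")(inp)
    for _ in range(12):      # 12 repeated blocks...
        for _ in range(3):   # ...of 3 ResNet blocks each
            x = resnet_block(x, filters)
    out = tf.keras.layers.Conv2D(2 * n_coils, 3, padding="same")(x)
    return tf.keras.Model(inp, out)

model = build_model()
# Adam optimizer minimizing the l2 loss, as stated in the text.
model.compile(optimizer=tf.keras.optimizers.Adam(), loss="mse")
```

Training then pairs the noise-generated subsampled inputs with their fully sampled truths via `model.fit`.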

Results

The nonlinear model was able to fit the training examples generated using NoiseFlow (Figure 3) for both random and uniform subsampling patterns (calibration region of 20x20). This behavior differs from typical deep learning techniques, in which the learned model ignores or removes noise from the input data.
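A sampling mask with a fully sampled central calibration region, as used above, can be sketched as follows. The line-based (1-D) sampling pattern and the mask dimensions are assumptions for illustration.

```python
import numpy as np

def sampling_mask(shape=(256, 256), accel=4, calib=20, uniform=True, rng=None):
    """Subsampling mask (uniform or random lines) with a fully
    sampled calib x calib central calibration region (a sketch)."""
    rng = rng or np.random.default_rng()
    mask = np.zeros(shape)
    if uniform:
        mask[::accel, :] = 1                       # equispaced lines
    else:
        keep = rng.uniform(size=shape[0]) < 1.0 / accel
        mask[keep, :] = 1                          # randomly chosen lines
    cy, cx = shape[0] // 2, shape[1] // 2
    mask[cy - calib // 2:cy + calib // 2,
         cx - calib // 2:cx + calib // 2] = 1      # 20x20 calibration region
    return mask
```

The same generator produces both mask types tested in Figure 3, so separate models can be trained per subsampling strategy.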

On the volumetric knee dataset (Figure 4), the model trained using NoiseFlow was able to reduce aliasing and recover spatial resolution with lower normalized root-mean-square error (NRMSE, normalized by the norm of the reference) and higher structural similarity (SSIM). The network trained with knees outperformed the NoiseFlow-trained network as expected, but its increased smoothness [12], which was not reflected by NRMSE or SSIM, raised a concern: critical anatomical features may have been lost. Both deep learning approaches outperformed l2-ESPIRiT. Lastly, the generalizability of the NoiseFlow-trained network was demonstrated by reduced aliasing artifacts in scans of the abdomen and pelvis (Figure 5).
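The NRMSE metric used above is straightforward to compute; a minimal NumPy sketch is shown below (SSIM would typically come from a library such as scikit-image and is omitted here).

```python
import numpy as np

def nrmse(recon, ref):
    """Root-mean-square error normalized by the norm of the reference,
    as described in the text."""
    return np.linalg.norm(recon - ref) / np.linalg.norm(ref)
```

For complex image data, `np.linalg.norm` operates on the magnitudes of the complex differences, so the metric applies directly to multi-channel complex reconstructions after flattening.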

Discussion & Conclusion

The NoiseFlow network was trained with no real data, only noise, yet performed comparably to networks trained with data similar to the test set. Using NoiseFlow, complex systems can be characterized without the concern of data bias. The data space described by noise is highly generalizable but comes at a cost: larger networks are needed to capture the entire space. If data bias can be avoided, tasks such as image reconstruction can benefit from a smaller data manifold. The proposed approach provides a potential method to distinguish which properties of a given task can be data independent and where performance can be improved through a smaller data manifold. NoiseFlow does depend on access to the system model, but we hypothesize that the training process should be robust to small errors in the system model as long as these errors can be adequately captured by the noise manifold. In summary, we have proposed and demonstrated a highly generalizable deep learning approach to train complex systems that avoids issues of data collection and bias.

Acknowledgements

NIH R01-EB009690, NIH R01-EB019241, NIH R01-EB026136, and GE Healthcare.

References

  1. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014).
  2. Noh, H., You, T., Mun, J. & Han, B. Regularizing Deep Neural Networks by Noise: Its Interpretation and Optimization. in Advances in Neural Information Processing Systems 30 (eds. Guyon, I. et al.) 5109–5118 (Curran Associates, Inc., 2017).
  3. Zhu, B., Liu, J. Z., Cauley, S. F., Rosen, B. R. & Rosen, M. S. Image reconstruction by domain-transform manifold learning. Nature 555, 487–492 (2018).
  4. Pruessmann, K. P., Weiger, M., Scheidegger, M. B. & Boesiger, P. SENSE: Sensitivity encoding for fast MRI. Magn. Reson. Med. 42, 952–962 (1999).
  5. Griswold, M. A. et al. Generalized autocalibrating partially parallel acquisitions (GRAPPA). Magn. Reson. Med. 47, 1202–1210 (2002).
  6. He, K., Zhang, X., Ren, S. & Sun, J. Identity Mappings in Deep Residual Networks. arXiv:1603.05027 [cs.CV] (2016).
  7. Epperson, K. et al. Creation of Fully Sampled MR Data Repository for Compressed Sensing of the Knee. in SMRT 22nd Annual Meeting (2013). doi:10.1.1.402.206
  8. Ong, F., Amin, S., Vasanawala, S. S. & Lustig, M. An Open Archive for Sharing MRI Raw Data. in ISMRM & ESMRMB Joint Annual Meeting 3425 (2018).
  9. Abadi, M. et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. arXiv:1603.04467 [cs.DC] (2016).
  10. Uecker, M. BART: version 0.4.03 https://mrirecon.github.io/bart (2018). doi:10.5281/zenodo.1215477
  11. Uecker, M. et al. ESPIRiT-an eigenvalue approach to autocalibrating parallel MRI: Where SENSE meets GRAPPA. Magn. Reson. Med. 71, 990–1001 (2014).
  12. Pathak, D., Krahenbuhl, P., Donahue, J., Darrell, T. & Efros, A. A. Context Encoders: Feature Learning by Inpainting. in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2536–2544 (IEEE, 2016).
  13. Zhang, T., Pauly, J. M., Vasanawala, S. S. & Lustig, M. Coil compression for accelerated imaging with Cartesian sampling. Magn. Reson. Med. 69, 571–582 (2013).

Figures

Figure 1. Method overview of NoiseFlow: learning the inverse mapping of arbitrary system models. During training (a), a noise generator is used to create “ground truth” examples for probing the system model. Model $$$G$$$ parameterized by $$$\theta$$$ is then trained to learn the inverse mapping of the system model output $$$x$$$ to the ground truth data $$$y$$$ using supervised learning. This noise-driven training process probes the system model with a general data space, and real data lies within this space. Thus, the learned model can then be used on real data (b) in the deployment phase.

Figure 2. Data generation for multi-channel reconstruction (parallel imaging). Complex k-space data is generated using an independent uniform distribution. To simulate 8 channels of data, 8 complex kernels (5x5) are generated and convolved with the initial k-space data to produce 8 channels of data. Subsampling is then applied to the 8-channel k-space data. An inverse Fourier transform (IFFT) is applied to transform the k-space data into the image domain. The generated data is then used to train a nonlinear model to estimate the fully sampled data from the subsampled data. This learned model can be applied on real MR data.

Figure 3. NoiseFlow training examples for parallel imaging. Both random (a) and uniform (b) sampling masks (R=4) were tested using 8 channels. A CNN model was trained to map the subsampled input (first row) to the truth (third row); separate models were trained for the two different subsampling strategies. The output of the model (second row) captures the overall structure of the truth. The purpose of NoiseFlow training is to rely solely on the correlated information between different channels and to avoid anatomy-specific structure that would result in data bias.

Figure 4. Coronal proton-density-weighted volumetric knee datasets for two different subjects displayed with sagittal and axial slices. The ground truth was subsampled using random (a) and uniform (b) masks. The network trained using NoiseFlow was compared to a network trained with knee images (third column) and l2-ESPIRiT (fourth column). Using pure noise training, the network was able to reduce aliasing and recover resolution. As expected, the network trained with knees had higher performance, but resulted in increased smoothing. NoiseFlow demonstrated that intrinsic system properties can be learned, and that a smaller training manifold can increase reconstruction performance.

Figure 5. Post-contrast T1-weighted volumetric axial 3T scans using a spoiled gradient recalled echo sequence (GE MR750). The 32-channel data (truth, last column) was subsampled with random (a) and uniform (b) masks (R=4) and coil compressed [13] to 8 channels (first column). The data was then reconstructed using the model trained using NoiseFlow (second column), the model trained using knee data (third column), and l2-ESPIRiT (fourth column). Both models were able to reduce aliasing artifacts. The model trained with knees overly smooths the images. The model trained with NoiseFlow reduces aliasing, but artifacts remain. Reconstruction could be further improved by NoiseFlow training with more channels.

Proc. Intl. Soc. Mag. Reson. Med. 27 (2019)