3909

Understanding domain shift in learned MRI reconstruction: A quantitative analysis on fastMRI knee and neuro sequences
Shizhe He1,2, Veronika Anne Zimmer1, Daniel Rueckert1,3, and Kerstin Hammernik1,3
1Lab for Artificial Intelligence in Healthcare and Medicine, Technical University of Munich, Munich, Germany, 2Otto-von-Taube-Gymnasium, Gauting, Germany, 3Department of Computing, Imperial College London, London, United Kingdom

Synopsis

In this work, we investigate domain shift in state-of-the-art MRI reconstruction networks with respect to variations in training data. We evaluate the networks on fastMRI knee and neuro data and support our findings with statistical analysis. We observe that the signal-to-noise ratio of the examined sequences plays an essential role, and our statistical analysis supports the hypothesis that the type and amount of training data are less important at low acceleration factors. Finally, we provide a visualization tool that facilitates examining the networks’ performance on each individual subject of the fastMRI data.

Introduction

Prior research on Magnetic Resonance Imaging (MRI) reconstruction has focused on the implementation of deep learning (DL) algorithms and the evaluation of their image quality [1-7]. However, the impact of domain shift in learned MRI reconstruction has rarely been studied [2,8]. As in other application fields of DL, differences in data distribution between training and test data can significantly affect the performance of DL models in a real-world clinical setting. In this work, we provide visualization tools and statistically investigate the impact of domain shift for state-of-the-art networks trained with various data configurations of fastMRI knee and neuro data [7].

Methods

Experimental Setup
We follow the experimental setup of Hammernik et al. [2]. Specifically, we employ reconstruction networks of varying architectures: (1) three state-of-the-art DL networks, UNET [7,9], MoDL [5], and VN [6], and (2) Down-Up Networks (DUNETs) incorporating three different data consistency layers [2], i.e., Gradient Descent (GD), Proximal Mapping (PM), and Variable Splitting (VS). All networks were trained with varying configurations of the fastMRI multi-coil knee and neuro training datasets [2]. These configurations, along with details on the networks, are depicted in Fig.1. The networks were evaluated quantitatively on the fastMRI multi-coil knee and neuro validation datasets using the Structural Similarity Index (SSIM) for acceleration factors R=4 and R=8.
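For illustration, the SSIM between a reconstruction and its reference can be sketched as follows. This is a simplified, single-window (global) variant and not the evaluation code used in this work: SSIM is commonly computed over local windows and averaged, and the function name `global_ssim`, the `data_range` argument, and the toy pixel values are assumptions made for this example. The constants k1=0.01 and k2=0.03 are the usual defaults.

```python
from statistics import fmean


def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Single-window SSIM between two equally sized, flattened images.

    Simplification (assumption): statistics are taken over the whole image
    instead of over local sliding windows.
    """
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    # Stabilizing constants, scaled by the dynamic range of the images.
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    mu_x, mu_y = fmean(x), fmean(y)
    var_x = fmean([(a - mu_x) ** 2 for a in x])
    var_y = fmean([(b - mu_y) ** 2 for b in y])
    cov_xy = fmean([(a - mu_x) * (b - mu_y) for a, b in zip(x, y)])
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Toy pixel values for illustration only (not fastMRI data).
reference = [0.1, 0.5, 0.9, 0.3]
reconstruction = [0.2, 0.4, 0.8, 0.35]
print(global_ssim(reference, reconstruction))
```

An SSIM of 1 is reached only when the two images are identical; lower values indicate larger structural differences.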

Visualizations
We examined the generalization potential of the investigated reconstruction networks using boxplots and scatterplots that visualize the networks’ behaviour across scanner models and knee/neuro sequences. We also created an interactive scatterplot that shows all data samples and their respective anatomy, grouped by network and sorted by SSIM value, allowing us to inspect the reconstruction quality of each individual subject in the fastMRI dataset.

Statistical Analysis
We examined the impact of variations in training data on the best-performing network, PM-DUNET [2], testing for statistically significant differences between SSIM distributions with a Mann-Whitney U test at a 95% confidence level. This allowed us to identify key parameters and relationships that determine the networks’ behaviour across the given data domains.
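As a minimal sketch of such a comparison, the following function computes a two-sided Mann-Whitney U test for two sets of SSIM values. It is not the analysis code of this work: it assumes the normal approximation with tie correction and no continuity correction, and the function name `mann_whitney_u` and the sample values are made up for illustration.

```python
import math


def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns the U statistic of sample `a` and the approximate p-value.
    """
    n1, n2 = len(a), len(b)
    n = n1 + n2
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        # Tied values share the average of their 1-based ranks.
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum_a = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u_a = rank_sum_a - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_a, 1.0
    z = (u_a - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u_a, p_value


# Made-up SSIM values for two training configurations (not results of this work).
ssim_config_a = [0.90, 0.91, 0.92, 0.93, 0.94]
ssim_config_b = [0.80, 0.81, 0.82, 0.83, 0.84]
u_stat, p = mann_whitney_u(ssim_config_a, ssim_config_b)
print(u_stat, p, p < 0.05)
```

For small samples an exact test is usually preferable to the normal approximation; an established statistics library would be the natural choice in practice.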

Source code is provided at https://github.com/h3seas0n/ismrm2022-domainshift-fastMRI.

Results and Discussion

We visualized the networks’ performance with respect to variations in training data from different perspectives in Figs.2-4. The scatterplots in Fig.2 compare the SSIM values obtained by networks trained separately on knee and on neuro data, evaluated on the fastMRI knee and neuro validation datasets for R=4 and R=8. SSIM values lying on the line y=x, i.e., SSIM-knee=SSIM-neuro, would represent perfect model generalization. We observe that the models for R=4 generalize substantially better than those for R=8; hence, the type of training data is less important at low accelerations. The best-performing network in all cases is PM-DUNET [2].
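One simple way to summarize how far a scatterplot deviates from the line y=x is the mean absolute gap between the paired SSIM values. The sketch below is an assumption made for illustration and is not a metric reported in this work; the function name `identity_line_gap` and the sample values are hypothetical.

```python
from statistics import fmean


def identity_line_gap(ssim_knee_trained, ssim_neuro_trained):
    """Mean absolute gap |y - x| of paired SSIM values from the line y = x.

    Each pair holds the SSIM of the same test sample, reconstructed once by
    the knee-trained and once by the neuro-trained network. A gap of 0 means
    that both networks perform identically on every sample.
    """
    if len(ssim_knee_trained) != len(ssim_neuro_trained) or not ssim_knee_trained:
        raise ValueError("inputs must be non-empty and of equal length")
    return fmean(
        [abs(y - x) for x, y in zip(ssim_knee_trained, ssim_neuro_trained)]
    )


# Hypothetical paired SSIM values for three test samples (illustration only).
knee_trained = [0.92, 0.88, 0.95]
neuro_trained = [0.91, 0.90, 0.94]
print(identity_line_gap(knee_trained, neuro_trained))
```

A smaller gap corresponds to points clustering more tightly around the identity line.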

The boxplots in Fig.3 show the performance of PM-DUNET [2], evaluated per scanner model at 1.5T and 3T on the knee and neuro data at R=4 and R=8. For neuro validation data, statistical differences (p<0.001) between training on knee and training on neuro data are found for all scanner models, indicating that the number of training subjects plays a vital role in spanning a larger solution manifold. For knee validation data, we only observe statistical differences for Skyra at R=4 (p<0.01), Skyra at R=8 (p<0.001), and Aera at R=8 (p<0.01). We conclude that networks trained on the large amount of neuro data generalize well to knee data. Furthermore, we observe substantially worse reconstruction quality for neuro data acquired on the 1.5T Avanto and Aera scanners. We suspect that low SNR is a potential source of this behaviour.

In Fig.4, we study the reconstruction results of PM-DUNET evaluated individually on the sequences CORPD-FBK and CORPDFS-FBK of the fastMRI knee validation set at R=4 and R=8. Significant differences (p<0.05) between training with knee-100 and training with each of the other configurations are marked with red stars. It is important to note that CORPDFS-FBK measurements have lower SNR than CORPD-FBK measurements. Our results show fewer statistically significant differences for the low-SNR data, i.e., CORPDFS-FBK, at both acceleration factors. For CORPD-FBK, training with knee-100, knee-50, and joint-uni-100 data yields no statistical difference at R=4 and R=8. Our statistical analysis supports that the type and amount of training data are critical for R=8. Low-SNR data, i.e., CORPDFS-FBK, generalize better across a wide range of training configurations, while having a lower SSIM and a high standard deviation.

Fig.5 depicts the SSIM values for each individual subject of the fastMRI knee and neuro validation sets, reconstructed with the six state-of-the-art networks. This visualization allows us to examine which subjects were reconstructed best or worst by the individual networks and to identify outliers.
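The per-subject inspection can be sketched in a few lines: sort the subjects by SSIM and flag values outside the usual boxplot whiskers. This is not the visualization tool of this work; the 1.5 x IQR rule, the function name `rank_and_flag`, and the subject identifiers and SSIM values are assumptions made for illustration.

```python
from statistics import quantiles


def rank_and_flag(ssim_by_subject, whisker=1.5):
    """Sort subjects by SSIM and flag values outside the boxplot whiskers.

    Returns the (subject, SSIM) pairs in ascending SSIM order and the list of
    subjects whose SSIM lies outside [Q1 - whisker*IQR, Q3 + whisker*IQR].
    """
    ranked = sorted(ssim_by_subject.items(), key=lambda item: item[1])
    q1, _, q3 = quantiles([v for _, v in ranked], n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    outliers = [subject for subject, v in ranked if v < low or v > high]
    return ranked, outliers


# Hypothetical subject identifiers and SSIM values (illustration only).
scores = {
    "subject_a": 0.90,
    "subject_b": 0.91,
    "subject_c": 0.92,
    "subject_d": 0.93,
    "subject_e": 0.50,
}
ranked, outliers = rank_and_flag(scores)
print("worst:", ranked[0], "best:", ranked[-1], "outliers:", outliers)
```

Running this once per network would give the same kind of best/worst ranking per network that the interactive tool exposes.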

Conclusion

In this work, we investigated the impact of domain shift on state-of-the-art neural networks for undersampled MRI reconstruction using the highly heterogeneous fastMRI dataset. Our statistical analysis shows that networks trained for R=4 are less prone to domain shift; hence, the type and amount of training data are less critical at low accelerations. We also observe that knee and neuro data react differently to domain shift, and our results indicate that this might be related to differences in SNR rather than differences in anatomy, supporting the findings of Knoll et al. [8]. For clinical applicability, however, quantitative analysis of image quality is not sufficient, and medical specialists are required to individually rate the reconstructed images with respect to their diagnostic value.

Acknowledgements

This work was supported by TUMKolleg, a collaborative project between the Technical University of Munich, Germany, and the grammar school Otto-von-Taube-Gymnasium Gauting, Germany.

References

  1. Florian Knoll, Kerstin Hammernik, Chi Zhang, Steen Moeller, Thomas Pock, Daniel K. Sodickson, Mehmet Akçakaya. Deep-Learning Methods for Parallel Magnetic Resonance Imaging Reconstruction: A Survey of the Current Approaches, Trends, and Issues. IEEE Signal Processing Magazine, 37(1):128-140, 2020.
  2. Kerstin Hammernik, Jo Schlemper, Chen Qin, Jinming Duan, Ronald M. Summers, Daniel Rueckert. Systematic evaluation of iterative deep neural networks for fast parallel MRI reconstruction with sensitivity-weighted coil combination. Magnetic Resonance in Medicine, 86, 2021.
  3. Taejoon Eo, Yohan Jun, Taeseong Kim, Jinseong Jang, Ho-Joon Lee, Dosik Hwang. KIKI-net: cross-domain convolutional neural networks for reconstructing undersampled magnetic resonance images. Magnetic Resonance in Medicine, 80(5):2188-2201, 2018.
  4. Mehmet Akçakaya, Steen Moeller, Sebastian Weingärtner, Kâmil Uğurbil. Scan-specific robust artificial-neural-networks for k-space interpolation (RAKI) reconstruction: Database-free deep learning for fast imaging. Magnetic Resonance in Medicine, 81(1):439-453, 2019.
  5. Hemant K. Aggarwal, Merry P. Mani, Mathews Jacob. MoDL: Model Based Deep Learning Architecture for Inverse Problems. IEEE Transactions on Medical Imaging, 38(2):394-405, 2019.
  6. Kerstin Hammernik, Teresa Klatzer, Erich Kobler, Michael P. Recht, Daniel K. Sodickson, Thomas Pock, Florian Knoll. Learning a Variational Network for Reconstruction of Accelerated MRI Data. Magnetic Resonance in Medicine, 79(6):3055-3071, 2018.
  7. Jure Zbontar, Florian Knoll, Anuroop Sriram, Matthew J. Muckley, Mary Bruno, Aaron Defazio, Marc Parente, Krzysztof J. Geras, Joe Katsnelson, Hersh Chandarana, et al. fastMRI: An open dataset and benchmarks for accelerated MRI. arXiv preprint arXiv:1811.08839, 2018.
  8. Florian Knoll, Kerstin Hammernik, Erich Kobler, Thomas Pock, Michael P. Recht, Daniel K. Sodickson. Assessment of the generalization of learned image reconstruction and the potential for transfer learning. Magnetic Resonance in Medicine, 81(1):116-128, 2018.
  9. Olaf Ronneberger, Philipp Fischer, Thomas Brox. U-Net: Convolutional Networks for Biomedical Image Segmentation. In: International Conference on Medical Image Computing and Computer Assisted Intervention, p. 234-241, 2015.


Figures

Figure 1: Overview of network configurations, MRI sequences of the fastMRI knee and neuro dataset, and training dataset composition. Six state-of-the-art networks were investigated that differ in the type of regularization network and data consistency [2]. All networks were trained with different training data configurations, including knee, neuro, and joint training, with and without uniform distribution of sequences. The number of training subjects for knee and joint training varied from 25% to 100%.

Figure 2: Scatterplots for variations in training data, for all acquisition types, for all examined networks at R=4 and R=8. Data points distributed along the blue line represent the ideal scenario, i.e., best generalization. Data points with a yellow border were tested on knee data; those without a border were tested on neuro data.

Figure 3: Comparison of PM-DUNET when trained on knee (green bars) and neuro (orange bars) data, evaluated per scanner model. Statistical differences (p<0.001) are found for all scanner models in neuro data. Furthermore, we observe substantially worse reconstruction quality for 1.5T Avanto, and large outliers and standard deviations for 1.5T Aera. In knee data, statistical differences are found for Skyra at R=4 (p<0.01), Skyra at R=8 (p<0.001), and Aera at R=8 (p<0.01).

Figure 4: Boxplots for variations in training data, evaluated individually for CORPD-FBK and CORPDFS-FBK of the fastMRI knee validation set, for PM-DUNET at R=4 and R=8. CORPDFS-FBK is statistically less affected by domain shift than CORPD-FBK. The red stars mark statistical significance (p<0.05).

Figure 5: An animated sequence illustrating the usage of our visualization tool, examining the data samples, the training anatomy, and their respective SSIM values categorized by the evaluation network. This visualization facilitates more efficient identification and analysis of abnormalities/outliers (individual subjects).

Proc. Intl. Soc. Mag. Reson. Med. 30 (2022), 3909
DOI: https://doi.org/10.58530/2022/3909