Malte Hoffmann1,2, Nalini M Singh3,4, Adrian V Dalca1,2,3, Bruce Fischl1,2,3,4, and Robert Frost1,2
1Department of Radiology, Harvard Medical School, Boston, MA, United States, 2Department of Radiology, Massachusetts General Hospital, Boston, MA, United States, 3Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, United States, 4Harvard-MIT Division of Health Sciences and Technology, Massachusetts Institute of Technology, Cambridge, MA, United States
Synopsis
Keywords: Machine Learning/Artificial Intelligence, Artifacts, deep learning, AI-guided radiology, neuroimaging, computer vision
Subject motion remains the major source of artifacts in magnetic resonance imaging (MRI). Motion correction approaches have been successfully applied in research, but clinical MRI typically involves repeating corrupted acquisitions. To alleviate this inefficiency, we propose a deep-learning strategy for training networks that predict a quality rating from the first few shots of accelerated multi-shot multi-slice acquisitions, scans frequently used for neuroradiological screening. We demonstrate accurate prediction of the scan outcome from partial acquisitions, assuming no further motion. This technology has the potential to inform the operator's decision on aborting corrupted scans early instead of waiting until the acquisition completes.
Introduction
Clinical magnetic resonance imaging (MRI) makes routine use of multi-slice sequences with two-dimensional (2D) encoding that acquire the k-space of each slice over several shots, such as fast spin echo (FSE)1. Unfortunately, these scans are vulnerable to motion between shots, making patient motion a predominant source of artifacts.
While techniques such as PROPELLER2 successfully correct for in-plane motion after the fact, retrospective correction of through-plane motion is challenging3. Prospective correction dynamically updates the acquisition as motion happens in the scanner but has not been adopted clinically. Instead, MRI technologists typically repeat the FSE sequence if the image exhibits artifacts.
We present a deep-learning strategy that alleviates the inefficiency of waiting until the end of data acquisition before image quality can be assessed (Figure 1). While previous techniques require a reconstructed magnitude image to predict an artifact level4-6, our networks take as input only the first few k-space segments, providing the user with an estimate of the likely outcome of the MRI scan.
Method
We train a deep neural network $$$h$$$ to predict image quality $$$s$$$ from only the first $$$N$$$ k-space shots of a multi-shot 2D MRI acquisition, assuming no further motion occurs in the remaining shots (Figure 2). We generate suitable training data by simulating motion between the first $$$N$$$ shots only and, based on the image reconstructed from all shots, assign a ground-truth artifact rating to the $$$N$$$-shot network input.
Training data generation
Let $$$k=SFCx$$$ be the multi-shot multi-channel k-space of the motion-free magnitude image $$$x$$$, where $$$S$$$ denotes the $$$k_y$$$ sampling, $$$F$$$ the Fourier transform, and $$$C$$$ the coil sensitivity. To generate motion-corrupted k-space $$$\tilde{k}$$$, we move $$$x$$$ separately for each shot $$$i$$$ and multiply by ESPIRiT7 coil-sensitivity profiles, producing multi-channel images $$$\{Cx_i\}$$$. We form $$$\tilde{k}$$$ by combining the $$$k_y$$$ lines of each $$$k_i=S_iFCx_i$$$ corresponding to shot $$$i$$$, after adding correlated Gaussian noise to the real and imaginary components based on noise-covariance measurements (Figure 2).
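To make this pipeline concrete, the NumPy/SciPy sketch below applies per-shot motion, coil sensitivities, shot-wise $$$k_y$$$ sampling, and correlated channel noise. The rigid in-plane motion parameterization and all function and argument names are illustrative assumptions; the abstract does not fix the transform model.

```python
import numpy as np
from scipy import ndimage


def corrupt_kspace(x, sens, shot_masks, transforms, noise_cov=None, rng=None):
    """Simulate inter-shot motion for a 2D multi-shot, multi-channel acquisition.

    x          : (ny, nx) motion-free magnitude image
    sens       : (nc, ny, nx) complex coil-sensitivity maps (e.g., from ESPIRiT)
    shot_masks : (nshots, ny) boolean masks selecting the k_y lines of each shot
    transforms : list of (angle_deg, (dy, dx)) rigid in-plane motion per shot
                 (an assumed parameterization, for illustration only)
    noise_cov  : (nc, nc) channel noise covariance, optional
    """
    rng = np.random.default_rng() if rng is None else rng
    nc, ny, nx = sens.shape
    k = np.zeros((nc, ny, nx), dtype=complex)
    for mask, (angle, shift) in zip(shot_masks, transforms):
        # Move the image for this shot, then apply coil sensitivities: C x_i.
        xi = ndimage.shift(ndimage.rotate(x, angle, reshape=False), shift)
        ki = np.fft.fftshift(
            np.fft.fft2(np.fft.ifftshift(sens * xi, axes=(-2, -1)), axes=(-2, -1)),
            axes=(-2, -1))
        # Keep only the k_y lines sampled by this shot: S_i F C x_i.
        k[:, mask, :] = ki[:, mask, :]
    if noise_cov is not None:
        # Correlated Gaussian noise on the real and imaginary components.
        L = np.linalg.cholesky(noise_cov)
        n = rng.standard_normal((nc, ny * nx)) + 1j * rng.standard_normal((nc, ny * nx))
        k += (L @ n).reshape(nc, ny, nx) / np.sqrt(2)
    return k
```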
Ground-truth scores
For training, we obtain ground-truth artifact scores by applying the Image Quality Dashboard model6 (IQD), an internal rating tool with good artifact-classification accuracy, to magnitude images reconstructed from $$$\tilde{k}$$$. Trained by the original authors on radiologist ratings, IQD predicts a score $$$s\in[0, 3]$$$, where $$$s=0$$$ means no artifact.
Artifact-rating model
Network $$$h_\theta$$$ with parameters $$$\theta$$$ predicts the scalar artifact score $$$\hat{s}=h_\theta(k)$$$ from the zero-filled complex multi-channel k-space $$$k$$$. A series of Interlacer-type8 layers (Figure 3) extracts image and k-space features, which we condense using global average pooling before regression (Figure 1). We choose these layers because they have been shown to outperform convolutions in image or k-space alone8. We implement $$$h_\theta$$$ in TensorFlow9 and fit $$$\theta$$$ by optimizing a mean-squared-error (MSE) loss on $$$\hat{s}$$$ until convergence (batch size 2, learning rate $$$10^{-4}$$$).
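For illustration, a minimal TensorFlow/Keras sketch of an Interlacer-style block and the regression head follows. The exact feature mixing, channel counts, and helper names (to_image_space, to_freq_space, build_rater) are our assumptions rather than the authors' implementation; the MSE loss and learning rate match the text.

```python
import tensorflow as tf


def _stacked_to_complex(x):
    # Interpret channels as stacked [real..., imag...] halves.
    re, im = tf.split(x, 2, axis=-1)
    return tf.complex(re, im)


def _complex_to_stacked(z):
    return tf.concat([tf.math.real(z), tf.math.imag(z)], axis=-1)


def to_image_space(x):
    """Inverse 2D FFT of re/im channel-stacked features (channels first for FFT)."""
    z = tf.transpose(_stacked_to_complex(x), [0, 3, 1, 2])
    return _complex_to_stacked(tf.transpose(tf.signal.ifft2d(z), [0, 2, 3, 1]))


def to_freq_space(x):
    """Forward 2D FFT of re/im channel-stacked features."""
    z = tf.transpose(_stacked_to_complex(x), [0, 3, 1, 2])
    return _complex_to_stacked(tf.transpose(tf.signal.fft2d(z), [0, 2, 3, 1]))


class InterlacerBlock(tf.keras.layers.Layer):
    """Simplified joint image/k-space convolution block in the spirit of ref. 8."""

    def __init__(self, filters, **kwargs):
        super().__init__(**kwargs)
        # Even filter counts keep the re/im channel split valid downstream.
        self.conv_img = tf.keras.layers.Conv2D(filters, 3, padding="same", activation="relu")
        self.conv_freq = tf.keras.layers.Conv2D(filters, 3, padding="same", activation="relu")

    def call(self, inputs):
        img, freq = inputs
        # Mix each branch with the transformed features of the other branch.
        img_out = self.conv_img(tf.concat([img, to_image_space(freq)], axis=-1))
        freq_out = self.conv_freq(tf.concat([freq, to_freq_space(img)], axis=-1))
        return img_out, freq_out


def build_rater(shape=(256, 256, 8), filters=32, blocks=6):
    # Input: zero-filled multi-channel k-space, re/im stacked in the channel axis
    # (shape and depth are placeholders, not the study's matrix size or coil count).
    k_in = tf.keras.Input(shape)
    img = tf.keras.layers.Lambda(to_image_space)(k_in)
    freq = k_in
    for _ in range(blocks):
        img, freq = InterlacerBlock(filters)([img, freq])
    # Condense joint features with global average pooling, then regress a scalar.
    feat = tf.keras.layers.GlobalAveragePooling2D()(
        tf.keras.layers.Concatenate()([img, freq]))
    s_hat = tf.keras.layers.Dense(1)(feat)
    model = tf.keras.Model(k_in, s_hat)
    model.compile(tf.keras.optimizers.Adam(1e-4), loss="mse")  # as in the abstract
    return model
```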
Experiment
Data
For training, we generate a total of 180k axial T2-FLAIR FSE slices with differing simulated motion, using k-space collected from 349 MGH outpatient scans and holding out 3 subjects for validation (ARC $$$R=3$$$, 23 interleaved 4.5-mm slices with 6 k-space segments, TR/TI/TE 10000/2600/118 ms, FA 90$$$^\circ$$$). For testing, we scan a volunteer with the same sequence in five different head positions and form corrupted datasets by substituting the first 3 segments across acquisitions (Figure 5A). We use a 3-T General Electric Signa Premier system with a 48-channel head coil, discarding 4 neck channels.
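A minimal sketch of this segment substitution, assuming per-position k-space arrays and a per-line segment labeling (both hypothetical names):

```python
import numpy as np


def substitute_segments(k_list, seg_labels, donor, target, n_first=3):
    """Splice the first k-space segments of one head position into another.

    k_list     : list of (nc, ny, nx) k-space arrays, one per head position
    seg_labels : (ny,) integer segment index of each k_y line (0-based)
    """
    k = k_list[target].copy()
    lines = np.isin(seg_labels, np.arange(n_first))  # k_y lines of the first segments
    k[:, lines, :] = k_list[donor][:, lines, :]
    return k
```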
Setup
We train partial-k-space models using only the first $$$N=3$$$ of 6 shots as input, corresponding to an artifact rating after 50% of the acquisition. We assess performance in terms of mean squared error on simulated-motion data from held-out subjects and on acquired volunteer scans. To assess the benefit of applying convolutions in both image and k-space using Interlacer-type layers, we train baseline models with matched capacity that apply convolutions in either space only.
Results
Figure 4 compares in-distribution and out-of-distribution artifact-rating accuracy. Although the model has access to only 50% of k-space, it generalizes to out-of-distribution data, with some outliers for scores $$$s>2.5$$$. The Interlacer-type model outperforms the baselines that operate in image or k-space only. Figure 5B shows representative example images across a range of artifact scores.
Discussion
We present a simulation-based strategy for training networks to predict the outcome of multi-shot MRI before the scan completes and demonstrate its suitability for acquired data.
Ground-truth scores
Our training strategy can be applied to anatomies other than the brain, given ground-truth ratings. Rather than labeling training data manually, we leverage IQD from prior work6 to reduce human effort, but we plan to replace this dependency with a metric quantifying the injected motion and the $$$k_y$$$ pattern.
Phase-encode pattern
While we train models with a standard FSE $$$k_y$$$ scheme to support routine clinical exams, an optimized scheme may enable earlier prediction of image quality. Trained with the appropriate MRI contrast and $$$k_y$$$ pattern, the model could support any multi-shot sequence.
Conclusion
We demonstrate the feasibility of predicting motion artifacts when the scan is only half complete. This technology has the potential to improve efficiency in the radiology unit by informing the operator's decision to abort a corrupted acquisition early.
Acknowledgements
The authors thank Bernardo Bizzo and Dufan Wu for model sharing, and Kathryn Evancic, Marcio Rockenbach, Eugene Milshteyn, Dan Rettmann, Sabrina Qi, Suchandrima Banerjee, Arnaud Guidon, and Anja Brau for assistance and helpful discussions.
The project benefited from funding from General Electric Healthcare. Additional support for this research was provided in part by the BRAIN Initiative Cell Census Network (U01 MH117023), the National Institute of Biomedical Imaging and Bioengineering (P41 EB015896, P41 EB015902, P41 EB030006, R01 EB023281, R01 EB032708, R01 EB019956, R21 EB029641, R21 EB018907), the National Institute of Child Health and Human Development (K99 HD101553), the National Institute on Aging (R56 AG064027, R01 AG016495, R01 AG070988), the National Institute of Mental Health (RF1 MH121885, RF1 MH123195), the National Institute of Neurological Disorders and Stroke (R01 NS070963, R01 NS083534, R01 NS105820). Additional support was provided by the NIH Blueprint for Neuroscience Research (U01 MH093765), part of the multi-institutional Human Connectome Project. The project was made possible by the resources provided by Shared Instrumentation Grants (S10 RR023401, S10 RR019307, S10 RR023043) and by computational hardware generously provided by the Massachusetts Life Sciences Center (https://www.masslifesciences.com).
Bruce Fischl has a financial interest in CorticoMetrics, a company whose medical pursuits focus on brain imaging and measurement technologies. This interest is reviewed and managed by Massachusetts General Hospital and Mass General Brigham in accordance with their conflict of interest policies.
References
1. Hennig J et al. RARE imaging: a fast imaging method for clinical MR. Magn Reson Med. 1986;3(6):823-833.
2. Pipe JG. Motion correction with PROPELLER MRI: application to head motion and free-breathing cardiac imaging. Magn Reson Med. 1999;42(5):963-9.
3. Norbeck O et al. T1-FLAIR imaging during continuous head motion: Combining PROPELLER with an intelligent marker. Magn Reson Med. 2021;85(2):868-82.
4. Sujit SJ et al. Automated image quality evaluation of structural brain MRI using an ensemble of deep learning networks. J Magn Reson Imaging. 2019;50(4):1260-7.
5. Sreekumari A et al. A Deep Learning–Based Approach to Reduce Rescan and Recall Rates in Clinical MRI Examinations. AJNR Am J Neuroradiol. 2019;40(2):217-23.
6. Bizzo BC et al. Machine learning model for motion detection and quantification on brain MR: a multicenter testing study. RSNA, Chicago, IL. 2021.
7. Uecker M et al. ESPIRiT - An Eigenvalue Approach to Autocalibrating Parallel MRI: Where SENSE meets GRAPPA. Magn Reson Med. 2014;71(3):990-1001.
8. Singh NM et al. Joint Frequency and Image Space Learning for MRI Reconstruction and Analysis. Machine Learning for Biomedical Imaging. 2022;1:1-28.
9. Abadi M et al. TensorFlow: a system for large-scale machine learning. USENIX, Savannah, GA. 2016:265-83.