0350

Evaluating Machine Learning-Based MRI Reconstruction Using Digital Image Quality Phantoms
Fei Tan1, Jana G. Delfino1, and Rongping Zeng1
1Division of Imaging, Diagnostics and Software Reliability (DIDSR), U.S. Food and Drug Administration, Silver Spring, MD, United States

Synopsis

Keywords: AI/ML Image Reconstruction, Precision & Accuracy, image quality assessment, digital phantoms

Motivation: Quantitative image quality evaluation tools are needed for machine learning-based MR reconstruction.

Goal(s): To introduce digital image quality phantoms and evaluation metrics that are tailored to machine learning-based MR reconstruction, scalable to large test sets, and flexible enough to simulate various object sizes, image contrasts, signal-to-noise ratios, and resolutions.

Approach: We created 2D disks, resolution arrays, and low-contrast phantoms resembling MR ACR phantom properties. The evaluation includes geometric accuracy, intensity uniformity, resolution, and low-contrast detectability. We evaluated the AUTOMAP reconstruction model trained on the M4Raw and FastMRI datasets with these phantoms.

Results: The study provides a tool for evaluating machine learning-based MRI reconstruction.

Impact: This research establishes digital phantoms and quantitative metrics for evaluating machine learning-based MRI reconstruction. These tools enable accurate assessment of fundamental image quality and generalizability over scan conditions, offering valuable feedback for improving machine learning-based methods development.

Introduction

Machine learning models are widely investigated for magnetic resonance imaging (MRI) reconstruction, covering under-sampled k-space completion, image artifact removal, and direct transformation from k-space to the image domain(1). Traditional image quality metrics such as PSNR, SSIM, and RMSE fail to directly capture critical MR quality aspects such as image resolution, homogeneity, and low-contrast detectability. Inspired by the MRI American College of Radiology (ACR) phantoms(2), we develop a pipeline to create digital image quality (IQ) phantom MR data with automated evaluation algorithms. These phantoms and evaluation algorithms provide a useful tool for assessing machine learning MR reconstruction methods, offering a better understanding of their fundamental image quality performance and feedback for model improvement.

Methods

As shown in Figure 1, we create the digital IQ phantoms in continuous k-space to mimic the MR data acquisition process(3). Our initial design contains three phantoms: 1) a simple disk phantom, created from its continuous k-space function (a jinc function), whose radius and center are adjustable to simulate various objects; 2) a resolution phantom, in which two 4x4 arrays of small disks are combined to quantify resolution in both directions; and 3) a low-contrast phantom, generated by overlaying small disks onto a large disk to simulate a contrast of 40%. In this experiment, we generated 1000 phantom images for each category as test sets, with key parameters FOV=240mmx240mm and matrix size=64x64. Complex Gaussian noise with a standard deviation of 0.05 (high noise) or 5e-5 (low noise) was added to the k-space before calculating the reference image with the inverse fast Fourier transform (iFFT).
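The disk phantom construction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in this study: the function names (disk_kspace, add_complex_noise, reference_image) are hypothetical, the noise standard deviation is assumed to apply to the real and imaginary parts independently, and the image scaling (division by the pixel area) is one possible convention that gives a unit-intensity disk.

```python
import numpy as np
from scipy.special import j1


def disk_kspace(radius_mm, center_mm=(0.0, 0.0), fov_mm=240.0, n=64):
    """Sample the analytic Fourier transform of a unit-intensity disk on an n x n grid."""
    dk = 1.0 / fov_mm                      # k-space spacing in cycles/mm
    k = (np.arange(n) - n // 2) * dk       # centered k-space coordinates (DC at index n//2)
    kx, ky = np.meshgrid(k, k, indexing="xy")
    kr = np.hypot(kx, ky)
    # Jinc-type function: FT of a disk of radius R is R * J1(2*pi*R*kr) / kr,
    # with the limit pi * R**2 (the disk area) at kr = 0.
    safe_kr = np.where(kr == 0, 1.0, kr)
    ft = np.where(kr == 0,
                  np.pi * radius_mm**2,
                  radius_mm * j1(2 * np.pi * radius_mm * safe_kr) / safe_kr)
    # Fourier shift theorem: moving the disk to (x0, y0) is a linear phase in k-space.
    x0, y0 = center_mm
    return ft * np.exp(-2j * np.pi * (kx * x0 + ky * y0))


def add_complex_noise(kspace, sigma, rng):
    """Add complex Gaussian noise (assumed: sigma per real/imaginary component)."""
    noise = rng.normal(0.0, sigma, kspace.shape) + 1j * rng.normal(0.0, sigma, kspace.shape)
    return kspace + noise


def reference_image(kspace, fov_mm=240.0):
    """Centered inverse FFT, scaled by the pixel area so the disk intensity is about 1."""
    pixel_mm = fov_mm / kspace.shape[0]
    img = np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(kspace)))
    return img / pixel_mm**2


# Usage: a 60 mm disk shifted 15 mm along x, at the high-noise level from the text.
rng = np.random.default_rng(0)
noisy = add_complex_noise(disk_kspace(60.0, center_mm=(15.0, 0.0)), 0.05, rng)
image = np.abs(reference_image(noisy))
```

Because the phantom is defined analytically in k-space, radius and center are given in millimeters and are not tied to the pixel grid. Small disks for a resolution or low-contrast phantom could be built by summing several such k-space functions with different radii, centers, and amplitudes.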

Geometric accuracy is assessed by detecting disk radii and computing their deviation from the ground truth. Intensity uniformity is determined by the difference between the 99th and 1st percentiles within the disk phantom. The high-contrast image resolution is quantified by the number of lines across disk centers that have all four peaks fully separable at radii from 1mm to 5mm. Finally, low-contrast detectability is quantified by the number of complete spokes detectable by 2D correlation with disk templates. Our phantom evaluation process is summarized in Figure 2.
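Two of these metrics are simple enough to sketch directly. The following is an illustrative example only: the text does not specify how disk radii are detected, so the half-maximum threshold and area-equivalent radius used here are assumptions, as are the function names and the use of a ground-truth disk mask for the percentile calculation.

```python
import numpy as np


def disk_mask(n, fov_mm, radius_mm, center_mm=(0.0, 0.0)):
    """Boolean mask of pixels whose centers lie inside the disk (coordinates in mm)."""
    pixel_mm = fov_mm / n
    coords = (np.arange(n) - n // 2) * pixel_mm
    x, y = np.meshgrid(coords, coords, indexing="xy")
    return np.hypot(x - center_mm[0], y - center_mm[1]) <= radius_mm


def estimate_radius_mm(magnitude, fov_mm):
    """Assumed detector: threshold at half maximum, then take the area-equivalent radius."""
    pixel_mm = fov_mm / magnitude.shape[0]
    area_mm2 = np.count_nonzero(magnitude >= 0.5 * magnitude.max()) * pixel_mm**2
    return np.sqrt(area_mm2 / np.pi)


def radius_error_percent(magnitude, fov_mm, true_radius_mm):
    """Geometric accuracy as the signed percentage radius error."""
    estimate = estimate_radius_mm(magnitude, fov_mm)
    return 100.0 * (estimate - true_radius_mm) / true_radius_mm


def intensity_uniformity(magnitude, mask):
    """Difference between the 99th and 1st intensity percentiles inside the mask."""
    p1, p99 = np.percentile(magnitude[mask], [1, 99])
    return p99 - p1


# Usage on an ideal (noise-free, binary) 60 mm disk on the 64x64, 240 mm grid.
mask = disk_mask(64, 240.0, 60.0)
ideal = mask.astype(float)
err = radius_error_percent(ideal, 240.0, 60.0)
spread = intensity_uniformity(ideal, mask)
```

For the ideal disk the percentile spread is zero, and the radius error reflects only the discretization of the disk edge on the pixel grid. With this definition, a smaller spread means a more uniform disk.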

For demonstration purposes, we employed an example AUTOMAP end-to-end MR reconstruction model(4,5), trained separately on k-space input from two public brain datasets: M4Raw(6) at 0.3T and a subset of the 3T brain FastMRI dataset(7). Both datasets were split into training, validation, and test sets. The k-space data from each slice and each coil were treated as separate 2D images, creating a training set of 70000 images. The image target was computed as the iFFT of the cropped k-space data. The model and training datasets are illustrated in Figure 3.
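The training-pair preparation (cropping k-space, computing the iFFT target, and the circular shifts by k-space phase mentioned in the Figure 3 caption) can be sketched as follows. This is a hedged illustration rather than the training code used here: the helper names are hypothetical, the k-space is assumed to be stored with its DC sample at the array center, and shifts are assumed to be whole pixels.

```python
import numpy as np


def center_crop_kspace(kspace, size=64):
    """Keep the central size x size region of a centered 2D k-space array."""
    ny, nx = kspace.shape
    y0 = ny // 2 - size // 2
    x0 = nx // 2 - size // 2
    return kspace[y0:y0 + size, x0:x0 + size]


def ifft2c(kspace):
    """Centered 2D inverse FFT (DC sample at index n//2 in both domains)."""
    return np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(kspace)))


def fft2c(image):
    """Centered 2D forward FFT, the inverse of ifft2c."""
    return np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(image)))


def circular_shift_kspace(kspace, shift_rows, shift_cols):
    """Circularly shift the image by whole pixels using a linear phase ramp in k-space."""
    ny, nx = kspace.shape
    ky = np.arange(ny) - ny // 2
    kx = np.arange(nx) - nx // 2
    ramp = np.exp(-2j * np.pi * (ky[:, None] * shift_rows / ny
                                 + kx[None, :] * shift_cols / nx))
    return kspace * ramp


# Usage with synthetic data standing in for one slice of one coil.
rng = np.random.default_rng(0)
full_kspace = fft2c(rng.normal(size=(128, 128)))
cropped = center_crop_kspace(full_kspace, 64)        # network input
shifted = circular_shift_kspace(cropped, 3, -5)      # augmented network input
target = ifft2c(shifted)                             # corresponding image target
```

Cropping the center of k-space lowers the resolution of the target image while keeping the field of view, and applying the shift as a phase ramp means the input and target stay exactly consistent without any image-domain interpolation.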

Results

Figure 4 displays example digital IQ phantom data at high and low noise levels, example low-field M4Raw test data, and the corresponding reconstructions from the two networks. Both networks restored the shape of the disk phantom but exhibited visible intensity variations in the low-noise cases. The brain images reconstructed by the two networks present similar visual quality.

Figure 5 illustrates quantitative results for phantom-based evaluation across all 1000 phantoms. Both networks presented geometric accuracy similar to the reference reconstruction (iFFT). A larger intensity variation than in the reference was observed in both models for low-noise phantom data. However, the intensity uniformity was better than the reference for high-noise phantoms, suggesting noise suppression in both models. The high-contrast resolution results indicate good resolution for radii greater than 3mm but deteriorate for smaller radii, consistent with the pixel size of 3.75mm (240mm FOV divided by 64 pixels). Low-contrast detectability was high for the reference reconstruction and declined for both models in both high- and low-noise situations.

Discussion

Our preliminary results reveal that geometric distortion is not a significant concern for the evaluated models. The marginally lower intensity uniformity and low-contrast detectability of the FastMRI-trained network could be due to the smaller spatial coil coverage of its training set. Coil-combined images could be used as training data to address this issue, which we will explore next to achieve better-performing reconstruction models. The digital IQ phantoms provide valuable insights into the fundamental image quality performance of the AI models. Since variations in training datasets impact model performance, this study also highlights the importance of assessing model generalizability across different datasets.

Conclusion

The evaluation framework using digital IQ phantoms offers a valuable tool to assess the performance of machine learning-based MRI reconstruction and provides insights into model generalizability. Future research directions include simulating tissue properties and acquisition sequence parameters to evaluate generalizability across field strengths, image contrasts, and under-sampling for scan acceleration.

Acknowledgements

Fei Tan acknowledges funding by appointment to the Research Participation Program at the Center for Devices and Radiological Health administered by the Oak Ridge Institute for Science and Education through an interagency agreement between the US Department of Energy and the US Food and Drug Administration.

References

1. Montalt-Tordera J, Muthurangu V, Hauptmann A, Steeden JA. Machine learning in Magnetic Resonance Imaging: Image reconstruction. Phys Med 2021;83:79-87.
2. Phantom Test Guidance for Use of the Large MRI Phantom for the ACR.
3. Wulkerr C, Gessert NT, Doneva M, Kastryulin S, Ercan E, Nielsen T. Digital reference objects for evaluating algorithm performance in MR image formation. Magn Reson Imaging 2023.
4. Koonjoo N, Zhu B, Bagnall GC, Bhutto D, Rosen MS. Boosting the signal-to-noise of low-field MRI with deep learning image reconstruction. Sci Rep 2021;11(1):8248.
5. Zhu B, Liu JZ, Cauley SF, Rosen BR, Rosen MS. Image reconstruction by domain-transform manifold learning. Nature 2018;555(7697):487-492.
6. Lyu M, Mei L, Huang S, Liu S, Li Y, Yang K, Liu Y, Dong Y, Dong L, Wu EX. M4Raw: A multi-contrast, multi-repetition, multi-channel MRI k-space dataset for low-field MRI research. Sci Data 2023;10(1):264.
7. Muckley MJ, Riemenschneider B, Radmanesh A, Kim S, Jeong G, Ko J, Jun Y, Shin H, Hwang D, Mostapha M, Arberet S, Nickel D, Ramzi Z, Ciuciu P, Starck JL, Teuwen J, Karkalousos D, Zhang C, Sriram A, Huang Z, Yakubova N, Lui YW, Knoll F. Results of the 2020 fastMRI Challenge for Machine Learning MR Image Reconstruction. IEEE Trans Med Imaging 2021;40(9):2306-2317.

Figures

Figure 1: Digital phantom simulation pipeline. We create the digital IQ phantoms in continuous k-space to mimic the MR data acquisition process. FOV, radius, and center locations are specified in physical units (millimeters) in the continuous k-space function. Spatial shifts of the phantoms were created by a corresponding phase shift in k-space.


Figure 2: Illustration of the image quality evaluation process. Geometric accuracy is calculated as the percentage radius error. Intensity uniformity is defined as the intensity difference between the 99th and 1st percentiles within the disk. Resolution is evaluated by peak separability. Low-contrast detectability is quantified by the number of complete spokes detectable using 2D correlation with the disk templates shown on the right. The red dots illustrate detected low-contrast disk locations.


Figure 3: Example MR reconstruction network AUTOMAP and the two models. Both models used the same hyperparameters as the original AUTOMAP structure and were trained on the M4Raw and FastMRI 3T brain datasets, respectively. The raw k-space data of both datasets were cropped to a 64x64 matrix size to fit the 8GB GPU memory, and random circular spatial shifts in the image were applied by phase shifts in k-space to promote translational invariance. Note that the phantom dataset was not included in training.


Figure 4: Representative examples of the M4Raw brain test set, digital phantoms, and their outputs from the two AUTOMAP models. AUTOMAP model 1 was trained on the low-field M4Raw dataset and AUTOMAP model 2 was trained on the 3T FastMRI brain dataset. Both models were tested on the M4Raw test set and the digital phantoms.


Figure 5: Boxplots of the phantom-based evaluation results across all 1000 phantoms at low noise (σ=5e-5) and high noise (σ=0.05) levels.


Proc. Intl. Soc. Mag. Reson. Med. 32 (2024)
DOI: https://doi.org/10.58530/2024/0350