Arjun D Desai1,2, Andrew M Schmidt2, Elka B Rubin2, Christopher M Sandino1, Marianne S Black2, Valentina Mazzoli2, Kathryn J Stevens2, Robert Boutin2, Christopher Ré3, Garry E Gold2,4, Brian A Hargreaves1,2,4, and Akshay S Chaudhari2,5
1Electrical Engineering, Stanford University, Stanford, CA, United States, 2Radiology, Stanford University, Stanford, CA, United States, 3Computer Science, Stanford University, Stanford, CA, United States, 4Bioengineering, Stanford University, Stanford, CA, United States, 5Biomedical Data Science, Stanford University, Stanford, CA, United States
Synopsis
While deep-learning-based MRI reconstruction and image analysis methods have shown promise, few have been translated to clinical practice. This may be a result of (1) paucity of end-to-end datasets that enable comprehensive evaluation from reconstruction to analysis and (2) discordance between conventional validation metrics and clinically useful endpoints. Here, we present the Stanford Knee MRI with Multi-Task Evaluation (SKM-TEA), a dataset of 155 clinical quantitative 3D knee MRI scans with k-space data, DICOM images, and dense tissue segmentation and pathology annotations to facilitate clinically relevant, comprehensive benchmarking of the MRI workflow. Dataset, code, and trained baselines are available at https://github.com/StanfordMIMI/skm-tea.
Introduction
Despite the extensive deep learning (DL) research in accelerated MRI reconstruction and automated image analysis (e.g. segmentation, pathology detection), few methods have been deployed prospectively in clinical practice1. Existing datasets are designed to handle single tasks (e.g. only MRI reconstruction), making it impossible to evaluate the end-to-end reconstruction to image analysis workflow2–4. Additionally, current image quality and region-of-interest (ROI) metrics used to benchmark these techniques are discordant with clinically relevant endpoints5,6.
To mitigate this challenge, we present the Stanford Knee MRI with Multi-Task Evaluation (SKM-TEA) dataset, a collection of quantitative knee MRI (qMRI) scans that enables end-to-end, clinically relevant evaluation of MRI reconstruction, segmentation, and detection methods (Fig.1). This 1.6TB dataset consists of k-space measurements of anonymized patient MRI scans, DICOM images, manual segmentations of four tissues, and bounding box annotations for sixteen different pathologies. Using the reconstruction and analysis tasks, we introduce a framework for using T2 parameter maps as a new metric for measuring the quality of clinically relevant qMRI biomarker estimates. Dataset, code, and state-of-the-art trained baselines are available at https://github.com/StanfordMIMI/skm-tea.
The SKM-TEA Dataset
Collection Overview: 155 clinical patients received a 3T knee MRI (GE MR750) scan with a sagittal 3D quantitative double echo steady state (qDESS) sequence with informed consent and IRB approval7–11.
Collected Data: Multi-coil k-space data was acquired with 2x1 parallel imaging with elliptical sampling. Unsampled k-space data was synthesized using Autocalibrating Reconstruction for Cartesian imaging (GE Orchestra) and was considered to be the fully-sampled k-space. Scanner-generated magnitude DICOM images were also collected. The qDESS scans were used to analytically compute cartilage and meniscus T2 parameter maps, which are sensitive to physiological changes due to factors including aging, trauma and early osteoarthritis12.
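As an illustration of how a T2 map can be derived analytically from the two qDESS echoes, the following is a minimal sketch and not the dataset's actual pipeline. It assumes the simplified signal model in which the ratio of the second to the first echo decays as exp(-2(TR - TE)/T2), and it ignores the T1, flip-angle, and diffusion terms that a full analytic estimate would account for. The function name and the TR/TE values in the usage note are hypothetical.

```python
import numpy as np


def estimate_t2_simplified(echo1, echo2, tr_ms, te_ms, eps=1e-8):
    """Simplified voxel-wise T2 estimate (ms) from two qDESS echo magnitudes.

    Assumption of this sketch: echo2 / echo1 ~= exp(-2 * (TR - TE) / T2).
    T1, flip-angle, and diffusion corrections are deliberately ignored.
    """
    echo1 = np.asarray(echo1, dtype=np.float64)
    echo2 = np.asarray(echo2, dtype=np.float64)

    # Guard against division by zero in background voxels.
    ratio = echo2 / np.maximum(echo1, eps)

    # Only ratios in (0, 1) give a finite, positive T2; others are set to NaN.
    t2 = np.full(ratio.shape, np.nan)
    valid = (ratio > 0) & (ratio < 1)
    t2[valid] = -2.0 * (tr_ms - te_ms) / np.log(ratio[valid])
    return t2
```

For example, calling `estimate_t2_simplified(e1, e2, tr_ms=20.0, te_ms=6.0)` on two magnitude volumes of equal shape returns a T2 volume of the same shape, with NaN wherever the echo ratio is not physically meaningful under this model.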
Annotations: Manual segmentations were performed for femoral, tibial, and patellar cartilage, and the meniscus. Board-certified radiologist reports describing sixteen pathological categories across joint effusion, meniscal and ligament tears, and cartilage lesions were translated into localized 3D bounding boxes.
Challenge Tracks: The SKM-TEA dataset enables two tracks with multi-task evaluation: (1) the Raw Data Track for all tasks and (2) the DICOM Track for image analysis tasks (Fig.1).
Methods
We assessed the utility of SKM-TEA by (1) incorporating diagnostically relevant qMRI analysis as a standardized evaluation metric and (2) benchmarking state-of-the-art DL-based reconstruction and segmentation models.
T2-based qMRI Evaluation: We propose a qMRI-based evaluation benchmark that compares regional T2 accuracy between ground-truth and model-reconstruction-based T2 parameter maps and segmentations13,14 (Fig.2). For reproducibility, all T2 parameter map estimation and tissue sub-region division is performed with the open-source DOSMA framework15,16.
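The regional comparison described above can be sketched as follows. This is a hedged illustration that does not use the DOSMA API: it assumes one binary mask per tissue region and reports the absolute difference between the mean T2 in the ground-truth region and the mean T2 in the model-derived region. The function name is hypothetical. For a reconstruction model, the predicted T2 map differs while the mask can stay fixed; for a segmentation model, the T2 map stays fixed while the predicted mask differs.

```python
import numpy as np


def regional_t2_error(t2_ref, mask_ref, t2_pred, mask_pred):
    """Absolute difference (ms) between mean regional T2 values.

    t2_ref, t2_pred: T2 maps (ms) of identical shape.
    mask_ref, mask_pred: binary masks selecting the region in each map.
    NaN voxels inside a mask are ignored when averaging.
    """
    t2_ref = np.asarray(t2_ref, dtype=np.float64)
    t2_pred = np.asarray(t2_pred, dtype=np.float64)
    mask_ref = np.asarray(mask_ref).astype(bool)
    mask_pred = np.asarray(mask_pred).astype(bool)

    ref_mean = np.nanmean(t2_ref[mask_ref])
    pred_mean = np.nanmean(t2_pred[mask_pred])
    return float(abs(pred_mean - ref_mean))
```

Averaging within a region before differencing measures error in the biomarker estimate itself, rather than voxel-wise image fidelity.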
Benchmarks: U-Net and unrolled networks were trained to reconstruct 2D Poisson-disc-undersampled axial slices for the two qDESS echoes (E1, E2) at 6x/8x accelerations17–19. Models were trained with three input configurations: (1) two separate models for E1 and E2 (E1/E2); (2) a single model for both echoes, with each echo a unique training example (E1+E2); and (3) a single model for both echoes, with E1 and E2 as multiple channels (E1⊕E2). For segmentation, V-Net and U-Net models were trained on DICOM images to segment all four tissues20. Separate models were trained on E1-only, E2-only, multi-channel E1-E2 (E1⊕E2), and the root-sum-of-squares (RSS) of the two echoes.
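Two of the echo-combination inputs mentioned above, the multi-channel input and the RSS image, can be sketched as below. This is a minimal illustration under the assumption that each echo is available as an array of identical shape; the function names and the channel-first layout are choices made for this sketch, not details confirmed by the source.

```python
import numpy as np


def stack_echoes(e1, e2):
    """Multi-channel input: place the two echoes along a new leading channel axis."""
    return np.stack([np.asarray(e1), np.asarray(e2)], axis=0)


def rss_combine(e1, e2):
    """Root-sum-of-squares of the two echoes, computed on magnitudes."""
    e1 = np.abs(np.asarray(e1))
    e2 = np.abs(np.asarray(e2))
    return np.sqrt(e1 ** 2 + e2 ** 2)
```

The stacked input preserves the separate contrast of each echo, whereas the RSS image collapses both echoes into a single magnitude image.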
Metrics: In addition to the T2 qMRI metric, reconstruction and segmentation performance was measured with image quality (structural similarity [SSIM], peak signal-to-noise ratio [pSNR]) and ROI (Dice, average symmetric surface distance [ASSD]) metrics, respectively. Concordance between the qMRI metric and reconstruction and segmentation metrics was measured with Pearson’s correlation coefficient (ρ).
Results
Unrolled reconstruction models outperformed U-Net at both accelerations and across both echoes (Table 1). Performance differences between input configurations were negligible among models with the same architecture. U-Net (E1+E2) had the least bias for patellar and tibial cartilage T2 estimates at 6x and for patellar cartilage at 8x. However, U-Net approaches had higher variance (>1.0ms) in these estimates compared to unrolled models.
E1, E1⊕E2, and RSS segmentation models had the highest performance and consistently outperformed E2 on both ROI and T2 accuracy metrics (Table 2). These methods also had higher variance (>0.6ms), which may indicate higher variability in the estimates despite low bias. V-Net models achieved higher performance compared to U-Net models on standard ML segmentation metrics, but had similar performance on T2 error metrics.
Reconstruction (SSIM, pSNR) and segmentation metrics (DSC, ASSD) had very weak correlation with absolute T2 error across all four tissues (ρ ≤ 0.52 and 0.4, respectively [Fig.3]).
Discussion
While standard image quality metrics indicated superior performance of unrolled reconstructions, T2 errors of these models were more variable. Among segmentation models, low SNR and contrast of E2 compared to E1 may result in higher variability in segmentations and T2 quantification.
The discordance between relative differences for ML and qMRI metrics suggests that standard ML metrics may not represent true clinical utility. Thus, using diagnostically relevant T2 biomarkers as direct endpoints for quantifying performance may help mitigate this challenge.
Conclusion
In this work, we introduce SKM-TEA, a quantitative knee MRI dataset that enables comprehensive, clinically relevant benchmarking of the end-to-end MRI workflow. We hope the SKM-TEA dataset and open-source code can enable a broad spectrum of research for modular image reconstruction and image analysis in a clinically informed manner.
Acknowledgements
Research support provided by NIH R01 AR077604, NIH R01 EB002524, NIH K24 AR062068, NSF-GRFP 1656518, DOD-NDSEG ARO, Precision Health and Integrated Diagnostics Seed Grant from Stanford University, Stanford Artificial Intelligence in Medicine and Imaging GCP grant, Stanford Human-Centered Artificial Intelligence GCP grant, GE Healthcare, and Philips.
References
1. Chaudhari AS, Sandino CM, Cole EK, et al. Prospective Deployment of Deep Learning in MRI: A Framework for Important Considerations, Challenges, and Recommendations for Best Practices. J Magn Reson Imaging. 2021;54(2):357-371. doi:10.1002/jmri.27331
2. Zbontar J, Knoll F, Sriram A, et al. fastMRI: An Open Dataset and Benchmarks for Accelerated MRI. 2018:1-29. http://arxiv.org/abs/1811.08839.
3. Ong F, Amin S, Vasanawala S, Lustig M. Mridata.org: An open archive for sharing MRI raw data. In: Proc. Intl. Soc. Mag. Reson. Med. Vol 26; 2018:1.
4. Peterfy CG, Schneider E, Nevitt M. The osteoarthritis initiative: report on the design rationale for the magnetic resonance imaging protocol for the knee. Osteoarthr Cartil. 2008;16(12):1433-1441. doi:10.1016/j.joca.2008.06.016
5. Knoll F, Zbontar J, Sriram A, et al. fastMRI: A Publicly Available Raw k-Space and DICOM Dataset of Knee Images for Accelerated MR Image Reconstruction Using Machine Learning. Radiol Artif Intell. 2020;2(1):e190007. doi:10.1148/ryai.2020190007
6. Desai AD, Caliva F, Iriondo C, et al. The international workshop on osteoarthritis imaging knee MRI segmentation challenge: a multi-institute evaluation and analysis framework on a standardized dataset. Radiol Artif Intell. 2021;3(3):e200078.
7. Chaudhari AS, Stevens KJ, Sveinsson B, et al. Combined 5‐minute double‐echo in steady‐state with separated echoes and 2‐minute proton‐density‐weighted 2D FSE sequence for comprehensive whole‐joint knee MRI assessment. J Magn Reson Imaging. 2019;49(7):e183-e194. doi:10.1002/jmri.26582
8. Chaudhari AS, Grissom MJ, Fang Z, et al. Diagnostic Accuracy of Quantitative Multicontrast 5-Minute Knee MRI Using Prospective Artificial Intelligence Image Quality Enhancement. Am J Roentgenol. 2021;216(6):1614-1625. doi:10.2214/ajr.20.24172
9. Chaudhari AS, Sveinsson B, Moran CJ, et al. Imaging and T2 relaxometry of short-T2 connective tissues in the knee using ultrashort echo-time double-echo steady-state (UTEDESS). Magn Reson Med. 2017;78(6):2136-2148. doi:10.1002/mrm.26577
10. Eijgenraam SM, Chaudhari AS, Reijman M, et al. Time-saving opportunities in knee osteoarthritis: T2 mapping and structural imaging of the knee using a single 5-min MRI scan. Eur Radiol. December 2019. doi:10.1007/s00330-019-06542-9
11. Sveinsson B, Chaudhari AS, Gold GE, Hargreaves BA. A simple analytic method for estimating T2 in the knee from DESS. Magn Reson Imaging. 2017;38:63-70. doi:10.1016/j.mri.2016.12.018
12. Baum T, Joseph GB, Karampinos DC, Jungmann PM, Link TM, Bauer JS. Cartilage and meniscal T2 relaxation time as non-invasive biomarker for knee osteoarthritis and cartilage repair procedures. Osteoarthr Cartil. 2013;21(10):1474-1484. doi:10.1016/j.joca.2013.07.012
13. Monu UD, Jordan CD, Samuelson BL, Hargreaves BA, Gold GE, McWalter EJ. Cluster analysis of quantitative MRI T2 and T1ρ relaxation times of cartilage identifies differences between healthy and ACL-injured individuals at 3T. Osteoarthr Cartil. 2017;25(4):513-520. doi:10.1016/j.joca.2016.09.015
14. Crowder HA, Mazzoli V, Black MS, et al. Characterizing the transient response of knee cartilage to running: Decreases in cartilage T2 of female recreational runners. J Orthop Res. 2021.
15. Desai AD, Barbieri M, Mazzoli V, et al. DOSMA: A deep-learning, open-source framework for musculoskeletal MRI analysis. In: Proc. Intl. Soc. Mag. Reson. Med. Vol 27; 2019.
16. Desai A, Chaudhari A, Barbieri M. ad12/DOSMA. December 2019. doi:10.5281/zenodo.2559548
17. Ronneberger O, Fischer P, Brox T. U-Net: Convolutional Networks for Biomedical Image Segmentation. Lect Notes Comput Sci (including Subser Lect Notes Artif Intell Lect Notes Bioinformatics). 2015;9351:1-8. doi:10.1007/978-3-319-24574-4_28
18. Sandino CM, Cheng JY, Chen F, Mardani M, Pauly JM, Vasanawala SS. Compressed Sensing: From Research to Clinical Practice with Deep Neural Networks: Shortening Scan Times for Magnetic Resonance Imaging. IEEE Signal Process Mag. 2020;37(1):117-127. doi:10.1109/MSP.2019.2950433
19. Diamond S, Sitzmann V, Heide F, Wetzstein G. Unrolled optimization with deep priors. arXiv preprint arXiv:1705.08041. 2017.
20. Milletari F, Navab N, Ahmadi SA. V-Net: Fully convolutional neural networks for volumetric medical image segmentation. In: Proceedings - 2016 4th International Conference on 3D Vision, 3DV 2016. Institute of Electrical and Electronics Engineers Inc.; 2016:565-571. doi:10.1109/3DV.2016.79