Keywords: Analysis/Processing, Machine Learning/Artificial Intelligence
Motivation: Morphometric assessment of cartilage (e.g., thickness) on MRI yields accurate measurements of osteoarthritis (OA) progression. Such quantitative measurements require image segmentation. Recent developments in Visual Foundational Models (VFMs) bring opportunities to increase generality and robustness.
Goal(s): What improvements can VFM-based approaches bring to the automatic segmentation of 3D knee MRIs, and how do they compare to traditional convolutional neural networks (CNNs)?
Approach: Trained a 2D VFM, a 3D CNN, and a modified 3D VFM on 500 MRI volumes. Evaluated them qualitatively and quantitatively on external datasets.
Results: The proposed 3D VFM demonstrates a slight, non-significant advantage in quantitative morphological assessment but strongly outperforms the other models when assessed qualitatively by radiologists, suggesting a promising direction with better generalization.
Impact: By leveraging Visual Foundational Models (VFMs) for morphometric assessment of cartilage on 3D MRI, our research shows promise for improving the accuracy and generalization of knee segmentation for measuring osteoarthritis progression.
Figure 1: (A) Dice scores and IoU for 7 cases, computed between each AI model and two manual segmentations independently annotated by radiologists RadA and RadB from distinct clinical institutions. VFM-based architectures appear to produce consistently, though slightly, better results than the 3D V-Net. However, the differences between models are not significant according to a two-sided Mann-Whitney U (MWU) test (p-values not shown; the highest p-value computed between metrics was p = 0.136 > 0.05). (B) Interobserver variability: Dice and IoU computed between RadA and RadB.
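For reference, Dice = 2|A∩B| / (|A| + |B|) and IoU = |A∩B| / |A∪B| for two binary masks A and B. The minimal NumPy sketch below computes both, overall and per tissue label. It is an illustration, not the authors' evaluation code: the integer label codes in the usage example are hypothetical, and returning 1.0 when both masks are empty is one common convention, not a standard.

```python
import numpy as np

def dice_and_iou(pred, ref):
    """Dice coefficient and IoU between two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    if pred.shape != ref.shape:
        raise ValueError("masks must have the same shape")
    inter = np.logical_and(pred, ref).sum()
    union = np.logical_or(pred, ref).sum()
    total = pred.sum() + ref.sum()
    if total == 0:
        # Both masks empty: treated here as perfect agreement (a convention).
        return 1.0, 1.0
    return float(2.0 * inter / total), float(inter / union)

def per_label_scores(pred_labels, ref_labels, labels):
    """Dice/IoU for each tissue in two integer label maps.

    labels: mapping from tissue name to its (hypothetical) integer code.
    """
    pred_labels = np.asarray(pred_labels)
    ref_labels = np.asarray(ref_labels)
    return {name: dice_and_iou(pred_labels == code, ref_labels == code)
            for name, code in labels.items()}
```

Usage, with hypothetical codes: `per_label_scores(model_map, rad_a_map, {"femoral": 1, "tibial": 2, "patellar": 3, "menisci": 4})`. The same call on the RadA and RadB maps gives the interobserver values of panel (B). Note that Dice = 2·IoU / (1 + IoU), so the two metrics always rank models identically for a single pair of masks.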
Figure 2: Two radiologists, Rad1 and Rad2, independently and blindly compared the 3 AI-driven segmentations of each case (N = 20), ranking them as best, middle, and worst. SaMRI3D was ranked best 15 and 14 times (Rad1 and Rad2, respectively), and SaMRI2D was ranked worst 13 and 12 times. Overall agreement was substantial (Cohen's kappa = 0.65). The radiologists agreed on the full ranking in 13 cases; agreed on the best but disagreed on the middle and worst in 4 cases; agreed on the worst but disagreed on the best and middle in 2 cases; and disagreed on both best and worst in only 1 case.
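Cohen's kappa corrects observed agreement p_o for the agreement p_e expected by chance from each rater's label frequencies: κ = (p_o − p_e) / (1 − p_e). The sketch below is a plain unweighted implementation. The abstract does not state how the rankings were encoded for the kappa computation; one hypothetical encoding is one item per (case, model) pair, labelled "best", "middle", or "worst".

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1.0:
        # Kappa is undefined here (both raters used one identical label);
        # returning 1.0 is a convention chosen for this sketch.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)
```

For example, `cohens_kappa(["a", "a", "b", "b"], ["a", "b", "b", "b"])` gives p_o = 0.75, p_e = 0.5, and κ = 0.5. Because ranks are ordinal, a weighted kappa is an alternative; the unweighted form above treats "best vs. worst" and "best vs. middle" disagreements as equally severe.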
Figure 3: All models were trained on fat-saturated (FS = 1.0) images. This case from the qualitative test set had 3 sequences with different FS settings (left column). Each AI model was run on all sequences, and the 3D rendering of each output is displayed in the 3 right columns. Yellow = menisci, blue = patellar cartilage, red = femoral cartilage, green = tibial cartilage. SaMRI2D suffers from spurious segmentations, possibly due to its lack of 3D spatial correlation. V-Net does not show this artifact, but SaMRI3D appears to generalize better and to be more robust to changes in FS contrast.
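To make the 2D-versus-3D distinction concrete, the sketch below shows slice-by-slice inference: a 2D predictor sees one slice at a time, so nothing constrains its output to be consistent across neighbouring slices. This assumes SaMRI2D is applied per slice, which the abstract does not detail; `predict_slice` is a hypothetical stand-in for any 2D model, and the slicing axis is a free parameter.

```python
import numpy as np

def segment_slicewise(volume, predict_slice, axis=0):
    """Apply a 2D segmentation callable to each slice of a 3D volume.

    predict_slice: hypothetical callable mapping a 2D array to a 2D label map.
    Each slice is processed in isolation, so no inter-slice context is used.
    """
    volume = np.asarray(volume)
    # Bring the slicing axis to the front, iterate over slices, then restore it.
    slices = np.moveaxis(volume, axis, 0)
    labels = np.stack([predict_slice(s) for s in slices], axis=0)
    return np.moveaxis(labels, 0, axis)
```

A 3D network such as V-Net instead convolves over all three axes at once, so each voxel's prediction depends on adjacent slices, which is one plausible reason it avoids isolated slice-level false positives.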
Figure 4: (A) Average cartilage thickness computed from automated cartilage segmentations and from two manual segmentations by 2 radiologists, RadA and RadB, for 7 cases. (B) Absolute difference between the cartilage thickness computed from each AI model and from the RadA and RadB manual segmentations. The p-values indicate that the differences in the distribution of cartilage thickness values between each model and the radiologists are not significant. (C) Interobserver variability: absolute difference in thickness computed from the RadA and RadB manual segmentations.
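The thickness estimation method is not described in this abstract. Purely to illustrate how panels (A) and (B) relate, the sketch below uses a hypothetical, deliberately crude estimator: count cartilage voxels in each column along one fixed image axis, multiply by the voxel spacing on that axis, and average over columns that contain cartilage. Real pipelines typically measure along surface normals; the axis choice and spacing values here are assumptions.

```python
import numpy as np

def mean_thickness_mm(mask, spacing_mm, axis=1):
    """Crude mean thickness (mm) of a binary mask along one image axis.

    Hypothetical method: thickness per column = foreground voxel count along
    `axis` times the voxel spacing on that axis; columns without cartilage
    are ignored. Not a surface-normal thickness measurement.
    """
    mask = np.asarray(mask, dtype=bool)
    counts = mask.sum(axis=axis)        # foreground voxels per column
    covered = counts[counts > 0]        # keep columns that contain cartilage
    if covered.size == 0:
        return 0.0
    return float(covered.mean() * spacing_mm[axis])

def abs_thickness_difference(pred_mask, ref_mask, spacing_mm, axis=1):
    """Absolute difference between model-based and manual mean thickness."""
    return abs(mean_thickness_mm(pred_mask, spacing_mm, axis)
               - mean_thickness_mm(ref_mask, spacing_mm, axis))
```

Applied to a model mask and the RadA mask of the same case, `abs_thickness_difference` yields one value of the kind plotted in panel (B); applied to the RadA and RadB masks, it yields the interobserver value of panel (C).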