0650

Foundation Model based labelling of MR Shoulder images to drive Auto-Localizer workflow

Gurunath Reddy M¹, Muhan Shao², Deepa Anand¹, Kavitha Manickam³, Dawei Gui³, Chitresh Bhushan², and Dattesh Shanbhag¹
¹GE HealthCare, Bangalore, India, ²GE HealthCare, Niskayuna, NY, United States, ³GE HealthCare, Waukesha, WI, United States

Synopsis

Keywords: Other AI/ML, Machine Learning/Artificial Intelligence, One-shot, Shoulder, Foundation Models, Localization, Segmentation

Motivation: Develop automatic labelling capability on anatomical shoulder MRI images with minimal manual annotation.

Goal(s): Leverage large-FOV, low-resolution coil sensitivity maps to guide correct positioning of the three-plane localizer for shoulder MRI planning.

Approach: Use chained DINO-V2 and SAM foundation models tuned to MRI localizers, together with a data-driven similarity measure, to label shoulder data at scale, then transfer the labels to low-resolution coil sensitivity maps for CNN model training.

Results: The FM pipeline localized the shoulder region on anatomical images with 91% accuracy, and the CNN model localized it on calibration data with an error < 15 mm.

Impact: A data-adaptive, chained foundation-model-based approach for annotating shoulder regions on MRI anatomical images at scale is shown. This allowed rapid development of a model that uses low-resolution calibration data to correctly position the three-plane localizer for shoulder anatomical planning and imaging.

Introduction

To handle off-center MRI imaging, a DL-based methodology was introduced that uses large-FOV coil sensitivity maps (or calibration data) to correctly position localizers [1]. While this method is attractive for use in the shoulder, it is challenging to obtain a large number of labeled datasets for DL model development. A recent method utilizing the DINO V2 and Segment Anything (SAM) foundation models demonstrated good localization across several anatomies [2]. However, this method heuristically thresholds the correlation of DINO-V2 features between the template mask and new test data to generate landmark correspondences. A fixed threshold can be unreliable in MRI data due to variations in MRI signal (intra-protocol variations, RF bias shading, etc.). To overcome this, we introduce a novel contrastive-learning-based methodology in which feature similarity is learned from a small amount of the task data itself. This simplifies the labelling process for the shoulder region at scale and its use in developing shoulder localization on calibration data for auto-localizer setup. Results for shoulder localization with this approach are presented.

Methods

Subjects: A. Shoulder MRI: 68 shoulder MRI datasets with three-plane localizers and matching calibration data, from internal volunteers with a wide range of body habitus (weight = 53 kg to 108 kg), acquired on 1.5T (N = 21) and 3T (N = 57) scanners with a variety (14) of shoulder coils. B. All Localizer data: A total of 12700 three-plane localizers across different anatomies, resolutions, and protocols (SSFSE, fGRE) were sequestered from an internal database for training the DINO V2 FM.
MRI Scanner and Acquisition: Calibration sensitivity maps: 3D EFGRE sagittal calibration scan, TE/TR = 0.5 ms/1.4 ms, FA = 1°, averages = 2, in-plane resolution = 7.8 mm x 7.8 mm, slice thickness = 7.8 mm, acquisition matrix = 32x32x32, reconstruction matrix = 64x64x64. Localizer data: three-plane SSFSE localizers, TE/TR = ~87 ms/975 ms, FA = 90°, in-plane resolution = 0.5 mm to 0.8 mm, slice thickness = 10 mm, slices = 5-7 per orientation.
Ground-truth (GT) marking: GT masks were created by manually annotating 15 MR shoulder volumes in all three planes.
Annotation DL Data Preparation: The contrastive similarity metric learning model was trained with ten MR GT localizer images for three-plane localizers.
Contrastive Learning based Thresholding approach: In contrast to [2], which uses natural-image features, we retrained the DINO V2 model on MR localizer data to obtain patch-level features, which were then interpolated to pixel level to give a 1024-dimensional feature vector per pixel. A regular contrastive learning model architecture [3] is used in this work (Fig. 1). During inference (Fig. 2), we compare the template mask feature vector with all pixel feature vectors in the target image using the proposed contrastive similarity model and generate a binary mask of localized points. These points are chained to the promptable SAM to obtain the complete shoulder region segmentation [2]. For comparison, we also implemented the method described in Ref. [2], using a heuristic threshold of 0.6 to obtain the DINO V2 localization.
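The heuristic baseline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes cosine similarity as the correlation measure and a single template feature vector, and the function names are hypothetical. The proposed approach replaces the fixed threshold with a learned contrastive similarity model, which is not shown here.

```python
import numpy as np

def cosine_similarity_map(template_vec, pixel_feats):
    """Cosine similarity between one template vector (D,) and per-pixel
    features (H, W, D). Returns an (H, W) similarity map.
    Assumption: cosine similarity stands in for the 'correlation' measure."""
    t = template_vec / (np.linalg.norm(template_vec) + 1e-12)
    norms = np.linalg.norm(pixel_feats, axis=-1, keepdims=True)
    f = pixel_feats / (norms + 1e-12)
    return f @ t

def localize_by_threshold(template_vec, pixel_feats, threshold=0.6):
    """Binary mask of pixels whose similarity to the template meets a
    fixed, manually chosen threshold (0.6 is the value used in the text)."""
    return cosine_similarity_map(template_vec, pixel_feats) >= threshold
```

Because the threshold is fixed, any shift in the feature distribution (for example from RF shading) moves pixels across it, which is the sensitivity the learned similarity is meant to reduce.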
Annotation DL Accuracy Assessment: Localization performance is measured as the average fraction of localized regions that fall inside the ground-truth segmentation masks. Segmentation performance is measured by the intersection over union (IoU) metric.
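A minimal sketch of these two measures is given below, assuming binary NumPy masks of equal shape and treating each localized pixel as one localization; the exact definition of a "localization region" in the study may differ, and the function names are hypothetical.

```python
import numpy as np

def fraction_inside(localized_mask, gt_mask):
    """Fraction of localized pixels that fall inside the GT mask.
    Assumption: each localized pixel counts as one localization."""
    loc = np.asarray(localized_mask, dtype=bool)
    gt = np.asarray(gt_mask, dtype=bool)
    n = loc.sum()
    if n == 0:
        return 0.0  # nothing localized, so nothing is inside
    return float((loc & gt).sum() / n)

def iou(pred_mask, gt_mask):
    """Intersection over union of two binary masks."""
    pred = np.asarray(pred_mask, dtype=bool)
    gt = np.asarray(gt_mask, dtype=bool)
    union = (pred | gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treated as perfect agreement
    return float((pred & gt).sum() / union)
```

Averaging `fraction_inside` over the test images would give a dataset-level localization accuracy, and averaging `iou` a dataset-level segmentation score.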
Calibration Localization DL: We transferred the FM-labeled shoulder regions to the calibration data and used the approach in [1] to train a CNN model that generates a localization mask on calibration data. The train/validation/test split was 49/5/14 subjects.
Calibration Shoulder Localization Evaluation: We evaluated calibration localization performance by computing the centroid error between the GT and the DL prediction, both in 3D and in-plane. Since the shoulder FOV is ~60 mm x 60 mm x 150 mm, a 3D localization error < 15 mm and an in-plane error < 6 mm were considered accurate for auto-localizer setup purposes.
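The centroid-error check can be sketched as below. This is an illustrative sketch under stated assumptions: masks are binary 3D NumPy arrays on the same grid, voxel spacing is supplied in mm by the caller, and the choice of which two axes are "in-plane" is a hypothetical default rather than something specified in the text.

```python
import numpy as np

def centroid_mm(mask, spacing_mm):
    """Centroid of a binary 3D mask in mm (voxel index times spacing)."""
    idx = np.argwhere(np.asarray(mask, dtype=bool))
    if idx.size == 0:
        raise ValueError("mask is empty; centroid is undefined")
    return idx.mean(axis=0) * np.asarray(spacing_mm, dtype=float)

def centroid_errors(gt_mask, pred_mask, spacing_mm, in_plane_axes=(0, 1)):
    """Return (3D error, in-plane error) in mm between mask centroids.
    Assumption: in_plane_axes selects the two in-plane array axes."""
    d = centroid_mm(pred_mask, spacing_mm) - centroid_mm(gt_mask, spacing_mm)
    err_3d = float(np.linalg.norm(d))
    err_in_plane = float(np.linalg.norm(d[list(in_plane_axes)]))
    return err_3d, err_in_plane

def is_accurate(err_3d, err_in_plane, tol_3d_mm=15.0, tol_in_plane_mm=6.0):
    """Apply the acceptance limits quoted in the text (15 mm and 6 mm)."""
    return err_3d < tol_3d_mm and err_in_plane < tol_in_plane_mm
```

With 7.8 mm voxels, a one-voxel centroid shift along an in-plane axis already exceeds the 6 mm in-plane limit, so sub-voxel centroid agreement is needed in-plane under these assumptions.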

Results and Discussion

Compared to the heuristic approach, our approach improves localization accuracy (91.5%) and segmentation performance (IoU = 86.6%) for the shoulder region by eliminating most false positives and false negatives (Table 1, Fig. 3), which in turn improves the SAM-based shoulder region segmentation. Fig. 4 demonstrates the necessity of completing the segmentation with the SAM model. Fig. 5 shows results of a calibration-data-based CNN model trained with these annotations. We observe good localization capability (mean error < 15 mm) along with reasonable orientation for the shoulder region. This should enable auto-localizer setup for the shoulder region as part of the scan workflow and eliminate the associated rescans.

Conclusion

We have demonstrated an end-to-end pipeline for automatic labelling of the shoulder region with MRI-based foundation models and only a few labeled template images. Using a data-driven similarity measure instead of a manually chosen threshold is necessary in MRI data because of intensity variations across patients, protocols, and scanners. This allowed us to rapidly develop a model that uses low-resolution calibration data to correctly position the three-plane localizer for shoulder anatomical planning and imaging.

Acknowledgements

No acknowledgement found.

References

[1]. Manickam, K. et al., Intelligent automatic slice prescription of scout scans of MSK MRI imaging using surface coil sensitivities, Proceedings of ISMRM 2023, p. 1445.

[2]. Anand, D., Singhal, V., Shanbhag, D.D., KS, S., Patil, U., Bhushan, C., Manickam, K., Gui, D., Mullick, R., Gopal, A. and Bhatia, P., 2023. One-shot Localization and Segmentation of Medical Images with Foundation Models. arXiv preprint arXiv:2310.18642.

Oquab, M., Darcet, T., Moutakanni, T., Vo, H., Szafraniec, M., Khalidov, V., Fernandez, P., Haziza, D., Massa, F., El-Nouby, A. and Assran, M., 2023. Dinov2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193.

[3]. Koch, G., Zemel, R. and Salakhutdinov, R., 2015, July. Siamese neural networks for one-shot image recognition. In ICML deep learning workshop (Vol. 2, No. 1).

Figures

Fig.1: Contrastive Metric Learning for Localization

Fig.2: Proposed one-shot Localization and Segmentation Approach

Fig.3: Impact of Heuristic vs data driven thresholding

Fig.4: Impact of DINO V2 vs. DINO V2 chained to SAM

Fig.5: Results of calibration data-based CNN model

Proc. Intl. Soc. Mag. Reson. Med. 32 (2024)

DOI: https://doi.org/10.58530/2024/0650