1008

MR-Transformer: Vision Transformers for Total Knee Replacement Prediction using Magnetic Resonance Imaging

Chaojie Zhang¹, Shengjia Chen¹, Haresh Rengaraj Rajamohan², Kyunghyun Cho², Richard Kijowski³, and Cem M. Deniz¹,³
¹Bernard and Irene Schwartz Center for Biomedical Imaging, Department of Radiology, New York University Grossman School of Medicine, New York, NY, United States, ²Center for Data Science, New York University, New York, NY, United States, ³Department of Radiology, New York University Langone Health, New York, NY, United States

Synopsis

Keywords: Diagnosis/Prediction, Data Analysis, Deep Learning

Motivation: Current deep learning methods for assessing knee osteoarthritis have limitations in learning long-range spatial information from magnetic resonance imaging (MRI).

Goal(s): This study aims to develop a new deep learning model for total knee replacement (TKR) prediction using MRI.

Approach: We proposed a novel transformer-based model, MR-Transformer, adapted from the ImageNet pre-trained vision transformer DeiT-Ti. The model can capture long-range spatial information from MR images with transformer architecture. We evaluated our model on TKR prediction using MR images with different tissue contrasts.

Results: The experimental results demonstrated an improved performance of MR-Transformer compared to conventional deep learning models.

Impact: Our proposed MR-Transformer enhances computer-aided diagnosis accuracy in total knee replacement prediction using MRI. It has the potential to provide rapid and quality diagnostic outcomes, assisting physicians in making timely and informed treatment decisions.

Introduction

Osteoarthritis is a degenerative joint disease and the most common reason for total knee replacement (TKR). Identifying which patients will progress to TKR is important for the effective development of potential disease-modifying therapies. With advances in artificial intelligence, various deep learning (DL) models have been developed for knee disorder detection using magnetic resonance imaging (MRI). Osteoarthritis is a structural disease whose diagnosis requires global spatial information from MR images, yet most conventional DL models based on convolutional neural networks (CNNs) have limitations in learning long-range spatial information from MR images¹⁻³. In our study, we proposed MR-Transformer, a novel model adapted from the ImageNet pre-trained vision transformer DeiT-Ti⁴, for TKR prediction using MRI. The model can capture long-range spatial information from MR images through its transformer architecture. In addition, the inherited ImageNet pre-trained weights can further improve model performance on small medical datasets. We evaluated the model on TKR prediction using MR images from the Osteoarthritis Initiative (OAI)⁵ and Multicenter Osteoarthritis Study (MOST)⁶ databases. Our transformer-based approach demonstrated improved performance compared to conventional CNNs.

Method

The OAI⁵ database used for this study contains coronal intermediate-weighted turbo spin-echo (COR IW TSE) and sagittal intermediate-weighted turbo spin-echo with fat suppression (SAG IW TSE FS) MRI knee scans. The MOST⁶ database used for this study contains coronal short-tau inversion recovery (COR STIR) and sagittal proton density fat-saturated (SAG PD FAT SAT) MRI knee scans. Detailed information, including the MRI parameters, is provided in Table 1. All MR images were standardized to an input matrix of 36 centered slices; MR images containing fewer than 36 slices were zero-padded. For each MRI tissue contrast, six-fold cross-validation was applied during training on a case-control cohort, where cases were individuals who underwent a TKR within nine years (OAI) or seven years (MOST) and controls were individuals who did not.
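The slice standardization described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `standardize_slices`, the slice-first array layout, and the choice to split the zero padding evenly between both ends of the volume are assumptions, since the text only states that 36 centered slices were used and that shorter volumes were zero-padded.

```python
import numpy as np


def standardize_slices(volume: np.ndarray, n_slices: int = 36) -> np.ndarray:
    """Return a volume with exactly `n_slices` slices along axis 0.

    Volumes with more slices are center-cropped. Volumes with fewer
    slices are zero-padded (symmetric padding is an assumption).
    """
    depth = volume.shape[0]
    if depth >= n_slices:
        # Keep the central block of slices.
        start = (depth - n_slices) // 2
        return volume[start:start + n_slices]

    # Pad with all-zero slices before and after the existing ones.
    pad_total = n_slices - depth
    pad_before = pad_total // 2
    pad_after = pad_total - pad_before
    pad_width = [(pad_before, pad_after)] + [(0, 0)] * (volume.ndim - 1)
    return np.pad(volume, pad_width, mode="constant", constant_values=0)
```

For example, a hypothetical 40-slice scan would lose two slices at each end, while a 30-slice scan would gain three empty slices at each end.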
The proposed MR-Transformer for TKR prediction using MR images is illustrated in Figure 1. It was adapted from the 2D ImageNet pre-trained vision transformer DeiT-Ti⁴ and inherited its pre-trained weights. The model operated as follows: (1) The 3-dimensional (3D) input matrix of the MR image was split into 2-dimensional (2D) patches of size 16×16. (2) The pre-trained linear projection layer inherited from DeiT-Ti projected each 2D patch into a 192-dimensional vector. (3) Position embeddings were then added to the encoded vectors to retain the positional information of each 2D patch. To adapt to the 3D MRI input, we replicated the pre-trained 2D position embeddings for each MR slice. In addition, 2D interpolation⁷ was applied to the position embeddings of each MR slice, since the MR slice has a higher resolution than the images used for pre-training. (4) The pre-trained transformer from DeiT-Ti encoded the vectors and generated the prediction outcome.
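Steps (1) and (3) can be sketched in NumPy as below. This is an illustration under stated assumptions rather than the published implementation: the function names are hypothetical, the pre-trained position embeddings are assumed to form a 14×14 grid (DeiT-Ti at 224×224 input with 16×16 patches), bilinear interpolation stands in for the unspecified "2D interpolation", and the class token and the handling of single-channel MR slices by the 3-channel projection layer are left out.

```python
import numpy as np


def patchify_slices(volume: np.ndarray, patch: int = 16) -> np.ndarray:
    """Split a (slices, H, W) volume into flattened 2D patches.

    Returns (slices * H/patch * W/patch, patch * patch), ordered by
    slice, then patch row, then patch column.
    """
    s, h, w = volume.shape
    gh, gw = h // patch, w // patch
    x = volume[:, :gh * patch, :gw * patch].reshape(s, gh, patch, gw, patch)
    x = x.transpose(0, 1, 3, 2, 4)  # (s, gh, gw, patch, patch)
    return x.reshape(s * gh * gw, patch * patch)


def resize_bilinear(grid: np.ndarray, new_h: int, new_w: int) -> np.ndarray:
    """Bilinearly resize an (h, w, d) grid to (new_h, new_w, d).

    Corner values are preserved (align-corners convention, an assumption).
    """
    h, w, _ = grid.shape
    ys = np.linspace(0, h - 1, new_h)
    xs = np.linspace(0, w - 1, new_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None, None]
    wx = (xs - x0)[None, :, None]
    top = grid[y0][:, x0] * (1 - wx) + grid[y0][:, x1] * wx
    bottom = grid[y1][:, x0] * (1 - wx) + grid[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy


def build_position_embeddings(pos_2d: np.ndarray, n_slices: int,
                              grid_hw: tuple) -> np.ndarray:
    """Interpolate a 2D position-embedding grid to the MR patch grid,
    then replicate it once per slice (same token order as patchify_slices)."""
    resized = resize_bilinear(pos_2d, grid_hw[0], grid_hw[1])
    flat = resized.reshape(-1, resized.shape[-1])  # (gh * gw, d)
    return np.tile(flat, (n_slices, 1))            # (slices * gh * gw, d)
```

With these helpers, the projected patch vectors and the replicated position embeddings have the same token count and ordering, so they can be added element-wise before entering the transformer encoder.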
The model was trained using stochastic gradient descent (SGD) with a base learning rate of 0.0001, momentum of 0.9, and a cosine annealing schedule for 100 epochs. Random data augmentations, including scaling, cropping, flipping, rotation, and blurring, were applied during model training. We compared the proposed model with other CNN-based deep learning models¹⁻³ for MRI diagnosis. Receiver operating characteristic curve analysis, with areas under the curve (AUC) and 95% confidence intervals (CIs), was used to evaluate the diagnostic performance in TKR prediction on the hold-out test set.
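The AUC evaluation can be illustrated with the sketch below. The text does not state how the 95% CIs were computed, so the percentile bootstrap used here is an assumption, as are the function names and the defaults of 1000 resamples and a fixed seed. The AUC itself is computed from the Mann-Whitney U statistic with average ranks for tied scores.

```python
import numpy as np


def auc_score(y_true, y_score) -> float:
    """AUC from the Mann-Whitney U statistic (ties get average ranks)."""
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    n_pos = int(y_true.sum())
    n_neg = y_true.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one case and one control")

    order = np.argsort(y_score, kind="mergesort")
    sorted_scores = y_score[order]
    ranks = np.empty(y_score.size, dtype=float)
    i = 0
    while i < sorted_scores.size:
        j = i
        while j + 1 < sorted_scores.size and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        # Tied scores at sorted positions i..j share the mean 1-based rank.
        ranks[order[i:j + 1]] = 0.5 * (i + j) + 1.0
        i = j + 1

    u = ranks[y_true].sum() - n_pos * (n_pos + 1) / 2.0
    return float(u / (n_pos * n_neg))


def bootstrap_auc_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Return (auc, lower, upper) using a percentile bootstrap (an assumption)."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    n = y_true.size
    aucs = []
    while len(aucs) < n_boot:
        idx = rng.integers(0, n, size=n)
        if y_true[idx].min() == y_true[idx].max():
            continue  # resample contains only one class; draw again
        aucs.append(auc_score(y_true[idx], y_score[idx]))
    lower, upper = np.percentile(aucs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return auc_score(y_true, y_score), float(lower), float(upper)
```

Here `y_true` would hold 1 for cases (TKR within the follow-up window) and 0 for controls, and `y_score` the model's predicted probabilities on the hold-out test set.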

Results

Quantitative TKR prediction comparisons are presented in Table 2. MR-Transformer achieves the best performance in the TKR prediction tasks using COR IW TSE, COR STIR, and SAG PD FAT SAT MR images. The model outperforms TSE¹, Med3D², and MRNet³ by 8.6%, 22.6%, and 6.5%, respectively, when using SAG PD FAT SAT MR images for TKR prediction. In the task using SAG IW TSE FS MR images, MR-Transformer performs similarly to the best-performing model.
Attention Rollout⁸ was applied to highlight important regions in the MR images that contribute to the model’s classification decision. Figure 2 presents an MR image from a subject who will undergo a TKR within nine years. The joint region is highlighted to indicate informative regions from the MR image for the predictive task.
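For readers unfamiliar with Attention Rollout⁸, the computation can be sketched as follows. This is a generic illustration, not the authors' code: the function name, the averaging of attention heads, and the assumed token layout (class token at index 0, followed by patch tokens) are assumptions.

```python
import numpy as np


def attention_rollout(attentions) -> np.ndarray:
    """Combine per-layer attention into a single token-to-token map.

    `attentions` is a list of (heads, tokens, tokens) arrays, ordered
    from the first transformer layer to the last.
    """
    n_tokens = attentions[0].shape[-1]
    identity = np.eye(n_tokens)
    rollout = identity.copy()
    for attn in attentions:
        a = attn.mean(axis=0)                  # fuse heads by averaging
        a = a + identity                       # account for the residual connection
        a = a / a.sum(axis=-1, keepdims=True)  # renormalize each row to sum to 1
        rollout = a @ rollout                  # propagate through this layer
    return rollout
```

Under the assumed token layout, `attention_rollout(attentions)[0, 1:]` gives the contribution of each patch to the class token; reshaping it to the slice and patch grid yields a map that can be overlaid on the MR image.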

Conclusion

This study introduced a novel transformer-based model, MR-Transformer, for TKR prediction using knee MR images. The model can capture long-range spatial information from MR images and benefits from the inherited pre-trained weights. The experimental results on TKR prediction tasks using knee MRI with different tissue contrasts demonstrated improved performance compared to conventional CNNs.

Acknowledgements

This work was supported in part by the NIH R01 AR074453, and was performed under the rubric of the Center for Advanced Imaging Innovation and Research (CAI2R, www.cai2r.net), an NIBIB National Center for Biomedical Imaging and Bioengineering (NIH P41 EB017183).

References

1. Rajamohan, Haresh Rengaraj, et al. "Prediction of total knee replacement using deep learning analysis of knee MRI." Scientific reports 13.1 (2023): 6922.

2. Chen, Sihong, Kai Ma, and Yefeng Zheng. "Med3d: Transfer learning for 3d medical image analysis." arXiv preprint arXiv:1904.00625 (2019).

3. Bien, Nicholas, et al. "Deep-learning-assisted diagnosis for knee magnetic resonance imaging: development and retrospective validation of MRNet." PLoS medicine 15.11 (2018): e1002699.

4. Touvron, Hugo, et al. "Training data-efficient image transformers & distillation through attention." International conference on machine learning. PMLR, 2021.

5. Peterfy, Charles G., Erika Schneider, and M. Nevitt. "The osteoarthritis initiative: report on the design rationale for the magnetic resonance imaging protocol for the knee." Osteoarthritis and cartilage 16.12 (2008): 1433-1441.

6. Segal, Neil A., et al. "The Multicenter Osteoarthritis Study (MOST): opportunities for rehabilitation research." PM & R: the journal of injury, function, and rehabilitation 5.8 (2013).

7. Dosovitskiy, Alexey, et al. "An image is worth 16x16 words: Transformers for image recognition at scale." arXiv preprint arXiv:2010.11929 (2020).

8. Abnar, Samira, and Willem Zuidema. "Quantifying attention flow in transformers." arXiv preprint arXiv:2005.00928 (2020).

Figures

Table 1: MRI parameters and data information.

Figure 1: MR-Transformer architecture.

Table 2: Quantitative comparisons (AUC with 95% CI) of TKR prediction performance on the test sets from the OAI and MOST databases.

Figure 2: Attention maps for MR-Transformer interpretation.

Proc. Intl. Soc. Mag. Reson. Med. 32 (2024)

DOI: https://doi.org/10.58530/2024/1008