0244

Deep-learning Diagnosis of Supraspinatus Tendon Tears: Comparison of Multi-sequence Versus Single Sequence Input
Dana J. Lin, MD1, JinHyeong Park, PhD2, Michael Schwier, PhD2, Bernhard Geiger, PhD2, Esther Raithel, PhD3, and Michael P. Recht, MD1
1Department of Radiology, NYU School of Medicine, New York, NY, United States, 2Siemens Healthineers, Princeton, NJ, United States, 3Siemens Healthineers, Erlangen, Germany

Synopsis

Rotator cuff tears are a common cause of shoulder pain and are typically diagnosed on shoulder MRI. Using 1,218 MR examinations performed at multiple field strengths and on scanners from multiple vendors, we developed a deep-learning (DL) model for the diagnosis of supraspinatus tendon tears on MRI. The model uses an ensemble of 3D ResNets combined via logistic regression to classify each examination as no tear, partial tear, or full-thickness tear. We compared the effect of using multiple sequences as input versus a single sequence. Our results show that deep-learning diagnosis of supraspinatus tendon tears is feasible and that multi-sequence input improves model performance.

Introduction

Rotator cuff tears are a common cause of shoulder disability, affecting nearly 40% of individuals older than 60 years and over half of individuals older than 80 years.¹ MRI is widely used for rotator cuff tear detection and characterization to help guide surgical versus non-surgical management. Radiologist interpretation of rotator cuff tears can be time-consuming and consists of tear classification (partial versus full-thickness) for each tendon as well as quantitative measurements and qualitative descriptions of muscle and tendon status.²

DL has the potential to aid radiologists by increasing both their accuracy³ and productivity. Although any tendon of the rotator cuff may tear, the supraspinatus tendon is the most commonly torn. The purpose of this study was to develop a DL model for the diagnosis of supraspinatus tendon tears on shoulder MRI. In addition, we compared the effect of using a single MRI sequence versus multiple sequences as input for the DL model.

Methods

This was a HIPAA-compliant, IRB-approved study. The dataset consisted of 1,218 MRI examinations obtained on Philips, Siemens, GE, Toshiba, and Hitachi scanners at 0.7, 1.5, or 3T. Each examination consisted of coronal, sagittal, and axial fluid-sensitive (FS) sequences (T2-weighted fat-suppressed, proton density-weighted fat-suppressed, or STIR) plus a sagittal T1-weighted sequence (Figure 2). Radiology reports were used as ground truth for tear classification, which was defined as no tear, partial tear, or full-thickness tear.

The data were split into 972 training and 246 test cases. Because of the small number of 0.7T cases, the test set was limited to 1.5T and 3T examinations. The distributions of tear types and magnetic field strengths in the training and test sets are shown in Figure 1. All images were preprocessed by linear scaling of the intensity values to the range [0, 1]. All left shoulder images were flipped to simulate right shoulder views. Images were then resampled to a fixed image size of 128 × 128 × 32 (x, y, z) using trilinear interpolation.
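The three preprocessing steps can be sketched as follows. This is a minimal, dependency-free illustration rather than the pipeline used in the study: it assumes a volume stored as nested lists indexed `volume[z][y][x]`, assumes that x is the left-right axis, and assumes an interpolation grid that aligns the first and last voxels of input and output. The function names are hypothetical, and a practical implementation would use an array library.

```python
def scale_intensity(volume):
    """Linearly rescale all voxel values to the range [0, 1]."""
    flat = [v for plane in volume for row in plane for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant volume
    return [[[(v - lo) / span for v in row] for row in plane] for plane in volume]


def flip_left_to_right(volume):
    """Mirror along x (assumed left-right axis) so a left shoulder resembles a right one."""
    return [[row[::-1] for row in plane] for plane in volume]


def _source_coord(i, n_out, n_in):
    """Map output index i to an input coordinate, aligning first and last voxels."""
    return 0.0 if n_out == 1 else i * (n_in - 1) / (n_out - 1)


def _neighbours(coord, n_in):
    """Return the lower index, upper index, and weight of the upper index."""
    lo = min(int(coord), n_in - 1)
    hi = min(lo + 1, n_in - 1)
    return lo, hi, coord - lo


def resample_trilinear(volume, size_xyz):
    """Resample volume[z][y][x] to size_xyz = (nx, ny, nz) by trilinear interpolation."""
    nx, ny, nz = size_xyz
    in_z, in_y, in_x = len(volume), len(volume[0]), len(volume[0][0])
    out = []
    for k in range(nz):
        z0, z1, fz = _neighbours(_source_coord(k, nz, in_z), in_z)
        plane = []
        for j in range(ny):
            y0, y1, fy = _neighbours(_source_coord(j, ny, in_y), in_y)
            row = []
            for i in range(nx):
                x0, x1, fx = _neighbours(_source_coord(i, nx, in_x), in_x)
                # Interpolate along x on the four surrounding edges ...
                c00 = volume[z0][y0][x0] * (1 - fx) + volume[z0][y0][x1] * fx
                c01 = volume[z0][y1][x0] * (1 - fx) + volume[z0][y1][x1] * fx
                c10 = volume[z1][y0][x0] * (1 - fx) + volume[z1][y0][x1] * fx
                c11 = volume[z1][y1][x0] * (1 - fx) + volume[z1][y1][x1] * fx
                # ... then along y, then along z.
                c0 = c00 * (1 - fy) + c01 * fy
                c1 = c10 * (1 - fy) + c11 * fy
                row.append(c0 * (1 - fz) + c1 * fz)
            plane.append(row)
        out.append(plane)
    return out


def preprocess(volume, is_left_shoulder, size_xyz=(128, 128, 32)):
    """Apply the steps in the order described: scale, flip if left, resample."""
    scaled = scale_intensity(volume)
    if is_left_shoulder:
        scaled = flip_left_to_right(scaled)
    return resample_trilinear(scaled, size_xyz)
```

Calling `preprocess(volume, is_left_shoulder=True)` returns a nested list with 32 planes of 128 × 128 values in [0, 1]. Which array axis corresponds to the patient's left-right direction depends on the image orientation, so the flip axis should be checked against the image headers before reuse.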

The classification stage comprised 4 parallel 3D ResNet50 CNN architectures⁴ followed by a logistic regression (Figure 3). The ResNet50 models were trained via transfer learning.⁵ They were initialized with weights pretrained on the Kinetics dataset⁶,⁷ and then further trained on the 972 training cases to adapt the models to the targeted domain. Each ResNet50 model was trained independently on one of the four input sequences (coronal FS, sagittal FS, axial FS, and sagittal T1-weighted). Training used Adam optimization, cross-entropy loss, and a learning rate of 10⁻⁵. A logistic regression was then trained to weight the independent per-view predictions of the 4 models and combine them into the final prediction.
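The combination step can be illustrated with a small stacking sketch. It assumes that each of the 4 models outputs three class probabilities, that these are concatenated into a 12-element feature vector, and that the logistic regression is multinomial (softmax). None of these details is specified above, and the batch gradient descent, the hyperparameters, and all function names here are hypothetical stand-ins for whatever solver was actually used.

```python
import math

CLASSES = ("no tear", "partial tear", "full-thickness tear")


def softmax(scores):
    """Convert raw class scores to probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def stack_views(per_view_probs):
    """Concatenate per-view class probabilities into one feature vector."""
    return [p for view in per_view_probs for p in view]


def predict_proba(weights, biases, features):
    """Multinomial logistic regression: one linear score per class, then softmax."""
    scores = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    return softmax(scores)


def fit_stacker(feature_rows, labels, n_classes=3, lr=0.5, epochs=500):
    """Fit weights by batch gradient descent on the mean cross-entropy loss."""
    n_features = len(feature_rows[0])
    n_samples = len(feature_rows)
    weights = [[0.0] * n_features for _ in range(n_classes)]
    biases = [0.0] * n_classes
    for _ in range(epochs):
        grad_w = [[0.0] * n_features for _ in range(n_classes)]
        grad_b = [0.0] * n_classes
        for features, label in zip(feature_rows, labels):
            probs = predict_proba(weights, biases, features)
            for c in range(n_classes):
                # Gradient of cross-entropy w.r.t. the class score is p_c - y_c.
                err = probs[c] - (1.0 if c == label else 0.0)
                grad_b[c] += err
                for j, f in enumerate(features):
                    grad_w[c][j] += err * f
        for c in range(n_classes):
            biases[c] -= lr * grad_b[c] / n_samples
            for j in range(n_features):
                weights[c][j] -= lr * grad_w[c][j] / n_samples
    return weights, biases
```

After fitting on stacked training-set predictions, `predict_proba(weights, biases, stack_views(per_view_probs))` gives the combined probabilities for one examination, ordered as in `CLASSES`. Learning one weight per view and class lets the combiner rely more on the more informative sequences, which simple averaging of the four outputs would not do.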

A receiver operating characteristic (ROC) analysis was performed to assess the performance of the full classification pipeline versus using each MRI sequence alone, and the influence of magnetic field strength on the classification results.
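For readers who want to reproduce this kind of analysis, the sketch below computes the area under the ROC curve from predicted probabilities. It assumes a one-vs-rest treatment of the three classes, which is not stated above, and uses the rank-based (Mann-Whitney) formulation of the AUC; the function names are hypothetical.

```python
def roc_auc(scores, is_positive):
    """AUC as the probability that a positive case outscores a negative one.

    Tied scores count as half a win (Mann-Whitney formulation).
    """
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(prob_rows, labels, n_classes=3):
    """Per-class AUC: each class in turn is 'positive', all others 'negative'."""
    return [roc_auc([row[c] for row in prob_rows], [y == c for y in labels])
            for c in range(n_classes)]
```

Running `one_vs_rest_auc` separately on the outputs of each single-sequence model and on the combined model, or on the 1.5T and 3T subsets of the test set, would yield the kinds of comparison described here. The pairwise loop is quadratic in the number of cases, which is acceptable for a test set of a few hundred examinations.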

Results

The area under the curve (AUC) for the multi-sequence input was 0.90, which was higher than that of each single-sequence input tested (0.81-0.85). Figure 4 shows the comparison between the multi-sequence and single-sequence inputs. Among the single-sequence inputs, the coronal FS sequence demonstrated the highest AUC. Among the different tear types, the model performed best for full-thickness tears, followed by no tears and partial-thickness tears (Figure 5). The model performed marginally better at 3T than at 1.5T (AUC 0.91 versus 0.90).

Discussion

We have demonstrated that deep-learning diagnosis of supraspinatus tendon tears is feasible with reasonable diagnostic performance. As with human readers, model performance improved with the use of multiple sequences compared with a single sequence. Although this confirms our hypothesis, we also investigated single-sequence input because a model that performed well on one sequence alone would further simplify the classification task and could potentially decrease image acquisition time. Our results demonstrate that multi-sequence image acquisition is needed for adequate model performance.

The comparison among the single-sequence inputs shows that the model performs best with the coronal FS sequence. This is in concordance with clinical practice, where radiologists rely heavily on this sequence to assess the integrity of the supraspinatus tendon. Interestingly, the model performs slightly better with the axial FS sequence than with the sagittal FS sequence. Most radiologists would consider the sagittal FS sequence, rather than the axial, to be the second sequence they rely on, after the coronal FS sequence, for accurate diagnosis of supraspinatus tendon tears.

The comparison among tear types shows that the model performs best with full-thickness tears, followed by no tears and partial-thickness tears. The lower performance for partial-thickness tears, seen in both the model and human readers, is consistent with the known diagnostic challenges and reader variability for partial tears. The superior performance of the model for full-thickness tears compared with no tears may be due to the discrete fluid signal that can be more easily identified with full-thickness tears, whereas patients without tears may have varying degrees of signal abnormality and heterogeneity due to varying degrees of tendinosis.

Conclusion

Deep-learning diagnosis of supraspinatus tendon tears is feasible. Multi-sequence input improves tear classification compared with single-sequence input.

Acknowledgements

We acknowledge Heiko Meyer for his valuable input and support for this project. We acknowledge Shivam Kausik for annotation/labeling assistance.

References

1. Tashjian RZ. Epidemiology, natural history, and indications for treatment of rotator cuff tears. Clin Sports Med. 2012;31(4):589-604.

2. Morag Y, Jacobson JA, Miller B, et al. MR imaging of rotator cuff injury: what the clinician needs to know. Radiographics. 2006;26(4):1045-65.

3. Bien N, Rajpurkar P, Ball RL, et al. Deep-learning-assisted diagnosis for knee magnetic resonance imaging: Development and retrospective validation of MRNet. PLOS Medicine. 2018;15(11): e1002699. https://doi.org/10.1371/journal.pmed.1002699

4. He K, Zhang X, Ren S, et al. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016:770-778.

5. Yosinski J, Clune J, Bengio Y, et al. How transferable are features in deep neural networks? In: Advances in Neural Information Processing Systems. 2014:3320-3328.

6. Hara K, Kataoka H, Satoh Y. Can spatiotemporal 3D CNNs retrace the history of 2D CNNs and ImageNet? In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018:6546-6555.

7. Kay W, Carreira J, Simonyan K, et al. The Kinetics human action video dataset. arXiv preprint arXiv:1705.06950. 2017.

Figures

Figure 1. (a) Composition of tear types within the training and test sets. (b) Distribution of magnetic field strength across the training and test sets.

Figure 2. Coronal T2-weighted fat-suppressed (T2FS) (TR 5000 ms, TE 51 ms) (a) and axial STIR (TR 4180 ms, TE 32 ms, TI 150 ms) (b) images show a full-thickness, retracted supraspinatus tendon tear (white arrows). Sagittal T2FS (TR 5460 ms, TE 57 ms) (c) image shows the full-thickness tear with absent supraspinatus/infraspinatus tendon fibers (white bracket) and a few intact tendon fibers (black arrow). Sagittal T1-weighted (TR 465 ms, TE 11 ms) (d) image shows retracted supraspinatus (white arrow) and infraspinatus (blue arrow) tendon fibers with partially imaged muscle atrophy.

Figure 3. Multi-sequence-input architecture using an ensemble of 3D ResNet50 convolutional neural networks combined via a logistic regression.

Figure 4. ROC analysis of multi-sequence versus single-sequence input by type of sequence.

Figure 5. ROC analysis of model performance on classifying tear types using multi-sequence input.

Proc. Intl. Soc. Mag. Reson. Med. 28 (2020)