Despite rapid recent advances in convolutional neural networks (CNNs) used for image classification, the generalizability of these networks to medical image data has not been thoroughly investigated. In this work, we take two networks designed to classify ImageNet natural-image data – Inception-v3 and ResNet-50 – and investigate their performance in classifying meniscal tears on MR examinations of the knee. Using limited segmentation and manual tear identification, slice-wise sensitivities of 0.68 and 0.58 are achieved for the respective networks. Applying the “two-slice-touch” rule, sensitivity increases significantly, but with a concomitant decrease in specificity. Our results support the feasibility of utilizing CNNs for meniscal tear identification.
Clinical reports from an institutional database of knee MR exams were reviewed, and 60 exams with meniscal tears were identified. An additional 25 normal exams and 15 exams depicting non-meniscal pathology (ligamentous injuries, fractures) were also identified, and these 100 patients/exams were used to characterize network performance. 15 exams (7 with tears) were designated as test data. The remaining exams were randomly divided into training and validation sets in an 85/15 ratio, on a slice-wise basis within each category.
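The 85/15 training/validation split described above can be sketched as follows. This is a minimal illustration using slice indices as placeholders; the actual per-category stratification and seed are not specified in the text.

```python
import random

def split_85_15(items, seed=0):
    """Randomly divide items into training and validation sets in an 85/15 ratio.
    In the study, this split was applied slice-wise within each category."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(round(0.85 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical example: 100 image slices identified by index
train, val = split_85_15(range(100))
assert len(train) == 85 and len(val) == 15
```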
Sagittal T2-FatSat and coronal PD-FatSat images were used for training, as the increased contrast from the fluid sensitivity of these sequences enhances detection of meniscal pathology4. Images were acquired per clinical protocol at 512×512 resolution with 2.5 mm slice thickness. Using Matlab (MathWorks), images extending between the anterior/posterior or medial/lateral meniscal boundaries were segmented into 340×100 windows containing the menisci and tibiofemoral joints. Segmented image slices depicting meniscal tears were identified by a fourth-year radiology resident with reference to the clinical reports. All slices extending between tear boundaries were considered tears, even if intermediate slices did not unequivocally depict tearing. No medial/lateral meniscal discrimination was made.
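The windowing step above (performed in Matlab in the study) amounts to cropping a fixed-size region from each full-resolution slice. A minimal NumPy sketch follows; the window coordinates are illustrative placeholders, since the actual per-exam boundaries were chosen manually.

```python
import numpy as np

def extract_window(image, top, left, height=100, width=340):
    """Crop a height-by-width window (e.g. a 340x100 region containing the
    menisci and tibiofemoral joint) from a full-resolution slice.
    The top/left coordinates are assumed to be chosen per exam."""
    return image[top:top + height, left:left + width]

# Placeholder 16-bit MR slice at the study's 512x512 acquisition resolution
slice_512 = np.zeros((512, 512), dtype=np.uint16)
window = extract_window(slice_512, top=256, left=86)
assert window.shape == (100, 340)
```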
Training was performed using the Keras/TensorFlow libraries5,6, modified to support 16-bit input. Network top layers were modified to support a binary classification task (Figure 1), with 20% dropout to reduce over-fitting. Weighted cross-entropy loss was used to account for class imbalance. Network-specific Keras preprocessing functions were used. Images were resized using bicubic interpolation to 299×299 for Inception-v3 and 224×224 for ResNet-50. Augmentation consisting of ±3° rotations and horizontal image flips was used during training. Networks were initialized with ImageNet weights or without pre-trained weights (Xavier initialization) to investigate whether the classification power of ImageNet weights is translatable to MR images. Training was conducted using an SGD optimizer with learning rate 5·10⁻³, batch size 32, and an early-stopping criterion of maximum validation accuracy with patience of 5 epochs. Training was performed on an NVIDIA GeForce GPU (NVIDIA Corporation).
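The weighted cross-entropy loss used to counter class imbalance can be sketched in plain NumPy. The positive-class weight here is illustrative only; in practice it would reflect the negative-to-positive slice ratio in the training set.

```python
import numpy as np

def weighted_binary_cross_entropy(y_true, y_pred, pos_weight=3.0, eps=1e-7):
    """Binary cross-entropy with the positive (tear) class up-weighted.
    pos_weight=3.0 is an illustrative assumption, not the study's value."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    loss = -(pos_weight * y_true * np.log(y_pred)
             + (1.0 - y_true) * np.log(1.0 - y_pred))
    return loss.mean()

y_true = np.array([1.0, 0.0, 0.0, 0.0])   # one tear slice, three normal slices
y_pred = np.array([0.9, 0.1, 0.2, 0.1])   # hypothetical network outputs
loss = weighted_binary_cross_entropy(y_true, y_pred)
```

Up-weighting the rare tear class makes a missed tear cost more than a false alarm, which counteracts the predominance of normal slices.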
Results of training with ImageNet and with Xavier weight initialization are depicted in Figure 2, and model predictions using the 15 test cases are presented as confusion matrices in Figure 3. With ImageNet weight initialization, sensitivity of Inception-v3 for tear detection is somewhat greater than ResNet-50 on a slice-wise basis, while specificity is similar between the networks. Without use of pretrained weights, neither network attains good sensitivity.
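The slice-wise sensitivity and specificity reported from the confusion matrices follow the standard definitions; a short sketch with hypothetical counts (not the study's actual cell values):

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical confusion-matrix counts chosen to illustrate a 0.68 sensitivity
sens, spec = sensitivity_specificity(tp=68, fn=32, tn=90, fp=10)
assert abs(sens - 0.68) < 1e-9 and abs(spec - 0.90) < 1e-9
```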
Trained networks initialized with ImageNet weights were analyzed using the “two-slice-touch” rule7. Results are presented in Figure 4, and demonstrate increased tear detection sensitivity but decreased specificity for both networks.
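One plausible implementation of the “two-slice-touch” aggregation is to call a tear whenever at least two contiguous slices are predicted positive, mirroring the MRI criterion that a tear should be visible on two or more slices. A minimal sketch (the study's exact aggregation procedure is not detailed in the text):

```python
def two_slice_touch(slice_predictions):
    """Return True if at least two contiguous slice-level predictions are
    positive. Assumes slice_predictions is ordered anatomically; treating
    only contiguous pairs as positive is an assumption of this sketch."""
    return any(a and b for a, b in zip(slice_predictions, slice_predictions[1:]))

assert two_slice_touch([0, 1, 1, 0]) is True    # two adjacent positive slices
assert two_slice_touch([0, 1, 0, 1]) is False   # isolated positives only
```

Requiring agreement across adjacent slices suppresses isolated false-positive slices, which is consistent with the observed sensitivity gain at the cost of specificity when a single noisy pair suffices to call a tear.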
This preliminary study demonstrates the feasibility of using 2D-CNNs to identify meniscal tears. Although both networks attained reasonable sensitivity when initialized with ImageNet weights, networks initialized without pretrained weights failed to generate useful predictive models. This likely relates to the small amount of training data (4123 images) used to parameterize models with large numbers of free parameters (ResNet-50: 25.6 million; Inception-v3: 23.9 million). High validation accuracy but poor test sensitivity using Inception-v3 without pretrained weights indicates the network may be overfitting the training data. Interestingly, ResNet-50 initialized without pretrained weights did not achieve high validation accuracy, suggesting training parameters may have been suboptimal. Successful training with ImageNet weight initialization implies a subset of these weights – presumably corresponding to basic image features – may be generalizable to medical image data.
Upon application of the “two-slice-touch” rule, both networks show greater sensitivity for tear detection than suggested by the confusion matrices, though with concomitantly decreased specificity. Sensitivity of the Inception-v3 network was greater than that of ResNet-50 under both analyses. This may be due to superior feature extraction through Inception's factorized multi-scale convolutions and larger input resolution. Alternatively, there may be an element of feature loss with ResNet-50 due to its smaller input size.
This study is limited by its use of 2D-CNNs, which treat slices as independent samples despite the fact that most meniscal tears extend over multiple contiguous slices. Recent work8 has demonstrated feasibility of meniscal tear identification using a 3D-CNN for classification. However, most clinical knee exams utilize 2D acquisitions, and development of 2D-tear classification methods is of potentially greater clinical significance.
Future work will investigate whether network performance improves with meniscal segmentation prior to training, and whether supplementation with data of different contrast weighting can improve classification accuracy.
1. Dodge S and Karam L. A Study and Comparison of Human and Deep Learning Recognition Performance Under Visual Distortions. arXiv: 1705.02498 [cs], May 2017.
2. Szegedy C, Vanhoucke V, Ioffe S et al. Rethinking the Inception Architecture for Computer Vision. arXiv: 1512.00567 [cs], Dec 2015.
3. He K, Zhang X, Ren S et al. Deep Residual Learning for Image Recognition. arXiv: 1512.03385 [cs], Dec 2015.
4. Nguyen J, De Smet A, Graf B et al. MR Imaging–based Diagnosis and Classification of Meniscal Tears. RadioGraphics 2014;34(4):981-99.
5. Chollet F. Keras: Deep learning library for Theano and TensorFlow. URL: https://keras.io.
6. Abadi M, Barham P, Chen J et al. Tensorflow: a system for large-scale machine learning. OSDI 2016, 16:265-83.
7. De Smet A, Tuite M. Use of the “Two-Slice-Touch” Rule for the MRI Diagnosis of Meniscal Tears. AJR 2006;187:911-14.
8. Pedoia V, Norman B, Mehany S et al. 3D convolutional neural networks for detection and severity staging of meniscus and PFJ cartilage morphological degenerative changes in osteoarthritis and anterior cruciate ligament subjects. JMRI October 2018, In Press, doi:10.1002/jmri.26246.