1138

MRI-Based Multi-Task Deep Learning for Cartilage Lesion Severity Staging in Knee Osteoarthritis

Bruno Astuto¹, Io Flament¹, James Mitrani², Rutwik Shah^1,3, Matthew Bucknor¹, Thomas Link¹, Valentina Pedoia^1,3, and Sharmila Majumdar^1,3

¹Department of Radiology and Biomedical Imaging, University of California San Francisco - UCSF, San Francisco, CA, United States, ²Lawrence Livermore National Lab, San Francisco, CA, United States, ³Center for Digital Health Innovation, UCSF, San Francisco, CA, United States

Synopsis

Automating the grading of knee MRI is appealing. The goal of this study is to leverage recent developments in Deep Learning (DL) applied to medical imaging in order to (i) identify cartilage lesions and assess their severity, (ii) identify the presence of bone marrow edema lesions (BMELs), and (iii) combine the two models in a multi-task, automated, and scalable fashion. We were able to boost the performance of our final classifiers by not simply focusing on what the fine-tuning of a single-purpose model could offer, but rather by broadly considering related tasks that could bring additional information to our classification problem.

Introduction

Semi-quantitative scoring systems, such as the Whole-Organ Magnetic Resonance Imaging Score (WORMS)¹, have been developed in an attempt to standardize knee MRI reading. Although grading systems are widely used in research settings, their clinical application is hampered by the time and level of expertise needed to perform the reading reliably, which makes automating this task appealing for a smoother and faster clinical translation. The goal of this study is to fill this void by capitalizing on recent developments in Deep Learning (DL) applied to medical imaging. Specifically, we aim to (i) identify cartilage lesions and assess their severity, (ii) identify the presence of bone marrow edema lesions (BMELs), and (iii) combine the two models in a multi-task, automated, and scalable fashion.

Methods

1,435 knee MRI scans from subjects with and without osteoarthritis (OA) and after anterior cruciate ligament (ACL) injury were collected from three previous studies (age=42.79±14.75 years, BMI=24.28±3.22 kg/m², 48/52 male/female split). All studies used a high-resolution 3D fast spin-echo (FSE) CUBE sequence (TR/TE=1500/26.69 ms, field-of-view=14 cm, matrix=512-by-512, slice-thickness=0.5 mm, bandwidth=50.0 kHz). A 3D V-Net² neural network (NN) architecture was used to learn segmentations of the 6 cartilage compartments using 480 manually segmented volumes as training/test data³. In order to optimize the segmentation task, we utilized two V-Net architectures. The first performed segmentation into 5 classes (Figure 1A), namely femur, tibia, and patella cartilage, one class for meniscus, and one for background (BG). The second V-Net (Figure 1B) solves the problem of assigning 11 labels to the compartments segmented by the first V-Net. The 11 classes are: patella, trochlea, medial and lateral tibia, and medial and lateral femur cartilage, 4 menisci, and BG. After applying the segmentation to the entire dataset, bounding boxes around the 6 cartilage compartments were extracted, resulting in 8,610 cartilage volumes of interest (cVOIs) (Figure 1C). cVOIs were randomly divided with a 65/20/15% split into training, validation, and testing/holdout datasets, preserving the distribution of lesion severity per compartment. Three class labels were generated as follows: (1) No Lesion (NL; WORMS 0 and 1), (2) Partial Thickness Lesion (PT; WORMS 2, 3, and 4), and (3) Full Thickness Lesion (FT; WORMS 2.5, 5, and 6). Randomly generated 3-axis rotational (±25 degrees) and zooming (±20%) image augmentations were performed (Figure 1C). The distribution of lesions in each compartment, and how the class imbalance was addressed, can be seen in Figure 2.
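The grade grouping and the stratified 65/20/15% split described above can be illustrated with a minimal sketch. It is not the authors' code: the function name `stratified_split`, the `(compartment, grade)` sample format, the random seed, and the rounding rule are assumptions made for illustration; only the WORMS-to-class grouping and the split fractions come from the text.

```python
import random
from collections import defaultdict

# WORMS cartilage grade -> 3-class label, as grouped in the text.
WORMS_TO_CLASS = {
    0: "NL", 1: "NL",            # No Lesion
    2: "PT", 3: "PT", 4: "PT",   # Partial Thickness Lesion
    2.5: "FT", 5: "FT", 6: "FT", # Full Thickness Lesion
}


def stratified_split(samples, fractions=(0.65, 0.20, 0.15), seed=0):
    """Split sample indices into train/val/test lists.

    `samples` is a list of (compartment, worms_grade) pairs (a hypothetical
    format). Each (compartment, class) stratum is shuffled and split on its
    own, so the class mix per compartment is roughly preserved.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for idx, (compartment, grade) in enumerate(samples):
        strata[(compartment, WORMS_TO_CLASS[grade])].append(idx)

    train, val, test = [], [], []
    for key in sorted(strata):
        members = strata[key]
        rng.shuffle(members)
        n_train = round(fractions[0] * len(members))
        n_val = round(fractions[1] * len(members))
        train += members[:n_train]
        val += members[n_train:n_train + n_val]
        test += members[n_train + n_val:]  # remainder goes to the holdout set
    return train, val, test
```

Splitting within each stratum, rather than over the pooled cVOIs, is one simple way to keep rare classes such as full thickness lesions represented in all three subsets.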
The lesion classification problem was divided into 3 steps: (I) automatic 3-class classification of cartilage lesion severity (Figure 3A shows the proposed 3D DL architecture); (II) automatic 2-class classification of Bone Marrow Edema Lesions (BMELs) (Figure 3B shows the 2D DL architecture used); and (III) combination of the outputs of both DL networks with demographic data as input to an XGBoost⁴ classifier, which produced the final lesion severity staging and was applied to a holdout set.
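Step (III) amounts to late fusion: the class probabilities of the two networks and the demographic variables are concatenated into one feature row per cVOI. The sketch below shows only that assembly step. The function name `build_fusion_features`, the choice of age, BMI, and sex as demographic variables, and the numeric encoding of sex are assumptions; the text does not list which demographic variables were used.

```python
# Hypothetical demographic variables; the source does not specify them.
DEMOGRAPHIC_KEYS = ("age", "bmi", "sex")


def build_fusion_features(cartilage_probs, bmel_probs, demographics):
    """Concatenate model outputs and demographics into one feature row.

    cartilage_probs: 3 class probabilities (NL, PT, FT) from the 3D CNN.
    bmel_probs:      2 class probabilities (no BMEL, BMEL) from the 2D CNN.
    demographics:    dict with numeric values for DEMOGRAPHIC_KEYS.
    """
    if len(cartilage_probs) != 3:
        raise ValueError("expected 3 cartilage class probabilities")
    if len(bmel_probs) != 2:
        raise ValueError("expected 2 BMEL class probabilities")
    demo = [demographics[key] for key in DEMOGRAPHIC_KEYS]
    return list(cartilage_probs) + list(bmel_probs) + demo
```

Rows built this way would then be stacked into a feature matrix and passed to a gradient-boosted tree classifier such as XGBoost; the sketch stops before that step to stay dependency-free.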

Results

The first step in cartilage lesion classification was to automatically classify lesion severity using only 3D volumetric image data. Overall accuracy for that classifier was 79% on the holdout set. Based on three-channel MIPS, the accuracy of the BMEL classifier was >80%. For the shallow-classifier ensemble three-class WORMS model, an overall accuracy of 82% was achieved when combining the 3D-CNN with demographic data. The count confusion matrices can be viewed in Figure 4, along with results for the combinations of the 3 classifiers used in our pipeline. A fourth option was also considered, in which the radiologist-graded BMEL labels were used as input to the shallow classifier; this boosted performance to an overall accuracy of 86%. This shows the potential of improving the BMEL classifier and combining it with cartilage lesion classification in a multi-task learning approach. To better interpret our results, misclassified cases were further inspected by experts; Figure 5 shows an example.
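For reference, overall accuracy and per-class recall follow directly from a count confusion matrix such as those in Figure 4. The sketch below assumes rows are true classes and columns are predicted classes; the function names are illustrative, and any counts used with it are the reader's own, not values from this study.

```python
def overall_accuracy(cm):
    """Fraction of all cases on the diagonal of a square count matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def per_class_recall(cm):
    """Recall for each true class (row); 0.0 for a class with no cases."""
    recalls = []
    for i, row in enumerate(cm):
        row_total = sum(row)
        recalls.append(row[i] / row_total if row_total else 0.0)
    return recalls
```

Reporting per-class recall alongside overall accuracy is useful here because the lesion classes are imbalanced, so a high overall accuracy can hide poor performance on the rarer classes.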

Discussion and Conclusion

By combining different anatomical structures (distinct cartilage compartments) and lesion grading for both cartilage and BMEL, we are moving towards multi-task machine learning for lesion detection. The proposed approach is weakly supervised in the sense that it learns features using only image-level labels (i.e., all that is known is the presence or absence of a lesion somewhere in the 3D volume). With the proposed approach, we were able to boost the performance of our final classifiers by not simply focusing on what the fine-tuning of a single-purpose model could offer, but rather by broadly considering related tasks that could bring additional information to our classification problem.

Acknowledgements

This work was funded by GE Healthcare and the National Institutes of Health - National Institute of Arthritis and Musculoskeletal and Skin Diseases (NIH-NIAMS), grant number R61AR073552 (SM/VP).

References

1. Peterfy CG, Guermazi A, Zaim S, et al. Whole-Organ Magnetic Resonance Imaging Score (WORMS) of the knee in osteoarthritis. Osteoarthritis Cartilage. 2004;12(3):177-90.

2. Milletari F, Navab N, Ahmadi SA. V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation. arXiv:1606.04797. 2016.

3. Norman B, Pedoia V, Majumdar S. Deep Learning Convolutional Neural Networks for Knee Multi-Tissue Automatic Morphometry and Relaxometry. Radiology. 2018 Jul;288(1):177-185.

4. Chen T, Guestrin C. XGBoost: A Scalable Tree Boosting System. 22nd SIGKDD Conference on Knowledge Discovery and Data Mining. 2016. arXiv:1603.02754.

Figures

Figure 1: Fully Automated Multi-Task DL Pipeline: (A) 5-class cartilage compartment segmentation V-Net. (B) The original image and its 5-class segmentations are used as input to another V-Net, responsible for labeling the segmentations according to 11 classes. (C) Pre-processing pipeline including data splitting, bounding-box extraction, and augmentation. (D) Volumes and the respective gradings are used as input and labels, respectively, in order to train: (E) a 3D-CNN DL classifier to assess the presence and/or severity of a cartilage lesion, and (F) a 2D Dense+CNN classifier to detect the presence of BMEL. (G) XGBoost classifier, outputting (H) lesion severity assessment (0: No Lesion, 1: Partial Thickness, and 2: Full Thickness).

Figure 2: Dataset Distribution Summary: This figure shows the distribution of the lesion classes per cartilage compartment. Patella is the compartment with the most balanced distribution across the lesion severity classes. Nonetheless, during preprocessing (Figure 1C), augmentation and upsampling were used to mitigate the class imbalance. Moreover, class weights computed after augmentation were applied to the loss functions during training to further address the imbalance.
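As an illustration of the class weighting mentioned in the Figure 2 caption, the sketch below computes inverse-frequency weights with the common "balanced" heuristic, n_samples / (n_classes × class_count). This particular formula is an assumption; the source states only that class weights computed after augmentation were applied to the loss functions.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency class weights: n_samples / (n_classes * count).

    Rare classes get weights above 1 and frequent classes below 1, so each
    class contributes about equally to a weighted loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}
```

The resulting dictionary can be supplied as per-class loss weights in most DL frameworks, computed on the training labels as they stand after augmentation and upsampling.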

Figure 3: Multi-Task DL architectures for cartilage lesion classification: All the cVOIs were resized to the average cropped cartilage region of 38-by-83-by-64 voxels. Cartilage volumes were then fed into (A) a 3D CNN containing 3 convolutions (2 of which were stacked), 2 max pooling layers, and 2 densely connected layers. (B) The BMEL 2D CNN classifier uses pre-trained DenseNet-121 weights, flattens the network output, and processes it through two densely connected layers.

Figure 4: Accuracy assessment: (A) Confusion matrix (CF) showing the accuracy of the 3D-CNN predictions. (B) CF showing the accuracy of the 3D-CNN class probability outputs, together with patient demographic data, after passing through the XGBoost classifier. This automated pipeline configuration gave the best overall accuracy of 82%. (C) CF for the 3D-CNN probability outputs, together with patient demographic data and the 2-class probability outputs from the 2D-CNN. (D) CF of 3D-CNN outputs + demographics + true BMEL labels, showing the potential of improving BMEL predictions and using them in multi-task DL cartilage lesion detection. Of the partial thickness lesion cases that were misclassified, only 9% were mistaken for more severe partial thickness lesions (WORMS 4).

Figure 5: Discordance between radiologist and deep learning grading for patellar cartilage: Confidence-level analysis showed uncertainty between the normal class (65.11%) and the Full Thickness class (25.49%). The radiologist's visual inspection confirmed mixed findings across different slices, which may be the source of confusion for the deep learning model. (A) Sagittal view: the superior fourth of the patella has complete loss of cartilage (>1 cm), while the remaining inferior patellar cartilage has an intact cartilage lining. (B) Axial view, superior section: high-grade patellar cartilage defect seen with subjacent bone marrow edema. (C) Axial view, middle section: patellar cartilage intact with signal abnormality. (White arrow = "intact cartilage", blue arrow = "high grade cartilage defect").

Proc. Intl. Soc. Mag. Reson. Med. 27 (2019)
