Manual identification of bone and cartilage abnormalities in MR images can be laborious and time-consuming. The goal of this study was to develop a fully automated deep learning pipeline that identifies morphological and degenerative changes in patients with hip osteoarthritis (OA).
A total of 133 subjects with radiographic or symptomatic hip OA (as graded by SHOMRI) were recruited for this study (age 42.88±12.75 years, BMI 22.94±3.029 kg/m², 75 males, 58 females). T2-weighted, fat-saturated coronal hip images (TE = 60 ms, TR = 2.4 s, slice thickness = 4 mm, matrix = 288×224, FOV = 14-20 cm) were acquired on a 3T Discovery 750 MR scanner (GE Healthcare, Waukesha, WI).
An end-to-end automated pipeline was built to evaluate morphological and degenerative changes (Fig. 1). It comprised an object detection deep convolutional neural network (DCNN) that generated cropped images of the hip joint and a classification DCNN that identified the presence of morphological bone and cartilage abnormalities.
The object detection network was implemented in Python with the TensorFlow Object Detection API (Google, Mountain View, CA), using a single-shot detector with a ResNet-50 feature pyramid network backbone (RetinaNet) [3]. The model was pre-trained on the ImageNet classification and COCO object detection datasets and trained on an NVIDIA Titan X GPU with a dataset of 70 hip MR images with bounding boxes around the femoral head (90%/10% train/validation split, random flip augmentation, batch size = 8, 25,000 iterations). The network's output was the central slice and the bounding box over the femoral head, which were used to extract a set of 5 slices cropped around the weight-bearing region of the joint (Fig. 1). These cropped images were labeled by two trained radiologists for the presence of bone marrow edema and cartilage lesions.
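This detect-then-crop step can be sketched as follows. This is a minimal sketch, not the study's code: it assumes the trained RetinaNet has been exported as a TensorFlow 2 SavedModel, and the model path, padding, and pre-identified central slice index are illustrative placeholders.

```python
import numpy as np
import tensorflow as tf

# Hypothetical path to the exported detector (TF2 Object Detection API SavedModel).
detect_fn = tf.saved_model.load("exported_model/saved_model")

def crop_weight_bearing_slices(volume, central_idx, n_slices=5, pad=10):
    """Detect the femoral head on the central slice, then crop the same
    bounding box from a stack of `n_slices` slices around it.
    `volume` is a uint8 array of shape [slices, H, W]."""
    slice_rgb = np.stack([volume[central_idx]] * 3, axis=-1)       # detector expects 3 channels
    detections = detect_fn(tf.convert_to_tensor(slice_rgb[None]))  # batch of one
    boxes = detections["detection_boxes"][0].numpy()    # normalized [ymin, xmin, ymax, xmax]
    scores = detections["detection_scores"][0].numpy()
    best = boxes[int(np.argmax(scores))]                # highest-confidence femoral head box

    h, w = volume.shape[1:]
    y1, x1 = int(best[0] * h) - pad, int(best[1] * w) - pad
    y2, x2 = int(best[2] * h) + pad, int(best[3] * w) + pad
    half = n_slices // 2
    return [volume[i, max(y1, 0):y2, max(x1, 0):x2]
            for i in range(central_idx - half, central_idx + half + 1)]
```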
The classification DCNN was implemented in the PyTorch framework with a DenseNet-100 architecture [5,6], pretrained on the grayscale CIFAR-32 dataset. The model used an SGD optimizer with an initial learning rate of 0.005 (reduced to 0.001 after 20 epochs), momentum = 0.9, weight decay = 5e-4, and random horizontal flip augmentation. It was trained on datasets of 539 (edema) and 647 (cartilage lesion) images with a batch size of 60 for 170 epochs, using a 65-15-10% train-validation-test split. Saliency maps were generated with the Grad-CAM algorithm [7,8].
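A minimal sketch of this training configuration is shown below. It assumes torchvision's DenseNet-121 as a stand-in for the DenseNet-100 variant used in the study (which is not shipped with torchvision), and random tensors as placeholders for the labeled hip crops.

```python
import torch
from torch import nn, optim
from torchvision import models, transforms

# Stand-in backbone: torchvision's DenseNet-121 with a 2-class head
# (lesion present / absent); the study used a DenseNet-100 variant.
model = models.densenet121(weights=None)
model.classifier = nn.Linear(model.classifier.in_features, 2)

optimizer = optim.SGD(model.parameters(), lr=0.005, momentum=0.9, weight_decay=5e-4)
# Drop the learning rate from 0.005 to 0.001 after 20 epochs (factor 0.2).
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[20], gamma=0.2)
criterion = nn.CrossEntropyLoss()
flip = transforms.RandomHorizontalFlip()  # augmentation used in the study

# Placeholder data standing in for the labeled hip crops.
images = torch.randn(120, 3, 224, 224)
labels = torch.randint(0, 2, (120,))
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(images, labels), batch_size=60, shuffle=True)

for epoch in range(170):
    for batch, targets in loader:
        batch = flip(batch)  # batch-wise flip here; per-sample in a real pipeline
        optimizer.zero_grad()
        loss = criterion(model(batch), targets)
        loss.backward()
        optimizer.step()
    scheduler.step()
```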
The detection network showed excellent convergence, without overfitting at 25,000 iterations (Fig. 2). Object detection achieved a mean intersection over union (IoU) of 0.92±0.04 on the validation dataset.
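For reference, IoU relates the overlap of a predicted and a ground-truth bounding box to their combined area; the reported value is this quantity averaged over the validation boxes. A minimal implementation:

```python
def iou(box_a, box_b):
    """IoU of two boxes in (y1, x1, y2, x2) pixel coordinates."""
    y1, x1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    y2, x2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, y2 - y1) * max(0.0, x2 - x1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```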
Figure 3 shows the cross-entropy loss and precision/recall curves of the bone marrow edema detection network, with the corresponding confusion matrix shown in Figure 4. The network achieved a sensitivity of 0.73 and a specificity of 0.92 on the validation dataset. The cartilage lesion detection network reached a sensitivity of 0.67 and a specificity of 0.68 after 13 epochs. An example of a saliency map generated during inference with the bone marrow edema classification network is shown in Figure 5.
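Grad-CAM weights the activations of a chosen convolutional layer by the spatially pooled gradients of the class score [7,8]. A minimal hook-based sketch in PyTorch (a generic implementation, not necessarily the study's code):

```python
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_layer, class_idx):
    """Grad-CAM: weight the target layer's activations by globally pooled
    gradients of the class score, then ReLU and upsample to image size."""
    activations, gradients = [], []
    h_fwd = target_layer.register_forward_hook(
        lambda m, i, o: activations.append(o))
    h_bwd = target_layer.register_full_backward_hook(
        lambda m, gi, go: gradients.append(go[0]))

    model.eval()
    score = model(image.unsqueeze(0))[0, class_idx]  # class score for one image
    model.zero_grad()
    score.backward()
    h_fwd.remove(); h_bwd.remove()

    acts, grads = activations[0], gradients[0]       # [1, C, h, w]
    weights = grads.mean(dim=(2, 3), keepdim=True)   # pooled gradient per channel
    cam = F.relu((weights * acts).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[-2:],
                        mode="bilinear", align_corners=False)
    return (cam / (cam.max() + 1e-8)).squeeze()      # normalized to [0, 1]
```

For a DenseNet-style classifier, the last convolutional block (e.g., `model.features`) is a natural choice of target layer.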
EO and RT contributed equally to this work.
This project was supported by NIH-NIAMS grants R01AR069006 and P50AR060752.