Automatic segmentation of the knee menisci would facilitate quantitative and morphological evaluation in diseases such as osteoarthritis. We propose a deep convolutional neural network for the segmentation of the meniscus in 3D UTE-Cones Adiabatic T1ρ-weighted volumes. To evaluate the proposed method, we developed models using regions of interest (ROIs) provided by two radiologists. The method produced strong Dice scores and consistent results with respect to meniscus volume measurement. The inter-observer agreement between the models and the radiologists was very similar to the agreement between the radiologists themselves.
Introduction
Osteoarthritis (OA) is the most common form of arthritis in the knee [1]. The meniscus plays a key role in the initiation and progression of OA. However, it has a short T2 and therefore demonstrates low signal on conventional MR sequences. The 3D ultrashort echo time Cones Adiabatic T1ρ (3D UTE-Cones-AdiabT1ρ) sequence provides high signal and quantitative measurement of AdiabT1ρ [2], which may allow more accurate evaluation of meniscus degeneration. Manual segmentation of the menisci, however, is time consuming and requires experienced readers [3]. In this work, we propose a fully automated segmentation algorithm for the menisci based on a U-Net deep convolutional neural network (CNN) [4]. The performance of the proposed approach was evaluated using manual segmentations provided by two radiologists. Additionally, the inter-observer agreement was studied.

Representative 3D UTE-Cones-AdiabT1ρ images of a 23-year-old volunteer are shown in Figure 1. The meniscus is depicted with high resolution and SNR, with excellent fitting demonstrating a T1ρ of 21.5 ± 1.1 ms.
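The exact configuration of the U-Net referenced above (depth, channel counts, loss function, input size) is not specified here, so the following PyTorch sketch is only a minimal illustrative 2D U-Net of the general kind used for this task, not the network behind CNN1 and CNN2; all hyperparameters are assumptions.

```python
# Minimal 2D U-Net sketch (PyTorch). Depth, channel counts, and input size are
# illustrative assumptions, not the configuration used in this work.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # Two 3x3 convolutions with batch norm and ReLU: the standard U-Net building block.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )

class UNet(nn.Module):
    def __init__(self, in_ch=1, n_classes=1, base=32):
        super().__init__()
        self.enc1 = conv_block(in_ch, base)
        self.enc2 = conv_block(base, base * 2)
        self.enc3 = conv_block(base * 2, base * 4)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = conv_block(base * 4, base * 8)
        self.up3 = nn.ConvTranspose2d(base * 8, base * 4, 2, stride=2)
        self.dec3 = conv_block(base * 8, base * 4)
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, 2, stride=2)
        self.dec2 = conv_block(base * 4, base * 2)
        self.up1 = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec1 = conv_block(base * 2, base)
        self.head = nn.Conv2d(base, n_classes, 1)  # per-pixel meniscus logit

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        e3 = self.enc3(self.pool(e2))
        b = self.bottleneck(self.pool(e3))
        d3 = self.dec3(torch.cat([self.up3(b), e3], dim=1))   # skip connection
        d2 = self.dec2(torch.cat([self.up2(d3), e2], dim=1))  # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip connection
        return self.head(d1)  # logits; apply sigmoid + threshold for the ROI mask

# Example: one image slice as a 1-channel 256x256 tensor (size is an assumption).
model = UNet()
logits = model(torch.randn(1, 1, 256, 256))
mask = torch.sigmoid(logits) > 0.5
```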
The average Dice score between the ROIs generated by the radiologists was 0.794, indicating good inter-observer agreement. Both CNN models produced strong Dice coefficients, equal to 0.808 for CNN1 and 0.822 for CNN2. Additionally, the Dice score between the ROIs calculated by the two CNNs was 0.851. The Dice score between the first radiologist and CNN2 (developed using the second radiologist's ROIs) was 0.798; similarly, between the second radiologist and CNN1 it was 0.799. Thus, the agreement between the radiologists was essentially the same as the agreement between each radiologist and the deep learning model developed using the other radiologist's ROIs. Moreover, the significantly higher agreement between the CNN models than between the radiologists suggests that the networks successfully generalized how to segment the menisci based on image pixel intensities. Results are summarized in Table 1. Both deep learning models were excellent at detecting meniscus pixels, with areas under the receiver operating characteristic curve higher than 0.94. Figure 2 shows a comparison between the manual segmentation and the automatic segmentation obtained with the CNNs. While detection of the menisci by the CNNs was robust, achieving perfect overlap between the ROI provided by the radiologist and that calculated by the model was difficult due to the low visibility of the meniscus borders. We found that, for each model, the Dice score was correlated with the size of the meniscus ROI generated by manual segmentation. This indicates that the Dice coefficient may not be a good indicator of segmentation performance when the meniscus area is small.
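As a reference for the metrics reported above, the short Python sketch below computes the Dice coefficient between two binary ROI masks and the per-pixel ROC AUC. The masks and probability map are randomly generated placeholders, not data from this study.

```python
# Dice coefficient and per-pixel ROC AUC for a binary segmentation.
# Variable names (ref_mask, pred_mask, pred_prob) are illustrative only.
import numpy as np
from sklearn.metrics import roc_auc_score

def dice(pred_mask, ref_mask, eps=1e-8):
    # Dice = 2 * |A ∩ B| / (|A| + |B|), computed on boolean masks of equal shape.
    pred_mask = pred_mask.astype(bool)
    ref_mask = ref_mask.astype(bool)
    intersection = np.logical_and(pred_mask, ref_mask).sum()
    return 2.0 * intersection / (pred_mask.sum() + ref_mask.sum() + eps)

# Toy example: random "reference" mask and a noisy network probability map.
rng = np.random.default_rng(0)
ref_mask = rng.random((256, 256)) > 0.9                               # reference ROI (e.g. radiologist)
pred_prob = np.clip(ref_mask + 0.3 * rng.standard_normal((256, 256)), 0, 1)
pred_mask = pred_prob > 0.5                                           # thresholded CNN output

print("Dice:", dice(pred_mask, ref_mask))
print("AUC :", roc_auc_score(ref_mask.ravel(), pred_prob.ravel()))
```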
Figure 3 shows the consistency of meniscus measurements between the radiologists and the models. The Spearman's rank correlation coefficient between the areas determined using the radiologists' ROIs was 0.799. Again, higher agreement (0.883) was obtained for the ROIs calculated by the CNNs. The Bland-Altman plots in Figure 3 indicate that the area estimates produced by the radiologists and the models were consistent.
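For readers reproducing this style of agreement analysis, the sketch below computes Spearman's rank correlation and draws a Bland-Altman plot for two sets of area measurements. The arrays are invented placeholder values, not the measurements behind Figure 3.

```python
# Spearman's rank correlation and a Bland-Altman plot for two sets of area
# measurements. The values below are placeholder data for illustration only.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import spearmanr

area_a = np.array([410., 355., 500., 620., 287., 450., 390., 530.])  # e.g. radiologist ROIs (mm^2)
area_b = np.array([400., 370., 515., 600., 300., 445., 410., 520.])  # e.g. CNN ROIs (mm^2)

rho, p_value = spearmanr(area_a, area_b)
print(f"Spearman rho = {rho:.3f} (p = {p_value:.3f})")

# Bland-Altman: difference vs. mean, with bias and 95% limits of agreement.
mean_ab = (area_a + area_b) / 2
diff_ab = area_a - area_b
bias = diff_ab.mean()
loa = 1.96 * diff_ab.std(ddof=1)

plt.scatter(mean_ab, diff_ab)
plt.axhline(bias, linestyle="-")
plt.axhline(bias + loa, linestyle="--")
plt.axhline(bias - loa, linestyle="--")
plt.xlabel("Mean area (mm$^2$)")
plt.ylabel("Difference (mm$^2$)")
plt.title("Bland-Altman plot (placeholder data)")
plt.show()
```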