0247

Deep learning for the detection and differentiation of vertebral fracture
Yang Zhang1, Lee-Ren Yeh2, Jeon-Hor Chen1,2, Ning Lang3, Xiaoying Xing3, Yongye Chen3, Qizheng Wang3, Peter Chang1, Daniel Chow1, Huishu Yuan3, and Min-Ying Su1
1Department of Radiological Science, University of California, Irvine, CA, United States, 2Department of Radiology, E-Da Hospital and I-Shou University, Kaohsiung, Taiwan, 3Department of Radiology, Peking University Third Hospital, Beijing, China

Synopsis

This study investigated the value of deep learning for the detection and differential diagnosis of vertebral fracture. A model using ResNet50 was developed and tested in a separate dataset. The results were compared with the interpretation of an experienced radiologist. Our study noted that the analysis based on single vertebral body without inclusion of the soft tissue, the posterior elements, and the skipped lesions might be the reason why the radiologist’s reading was better than deep learning approach. For the identification of malignant fracture using whole images from training set, the prediction accuracy was only moderate, with rooms for improvement.

Introduction

The differentiation of benign and malignant vertebral fracture is necessary for a timely therapeutic planning. Clinical accurate diagnosis is difficult in elderly patients with no or minor trauma history. Imaging plays important roles for disease evaluation. Plain radiography has limitations. Although conventional MRI has been extensively used for accessing spinal lesions (1), the accurate differential diagnosis of benign fracture and malignant fracture remains challenging. Recently quantitative imaging methods, such as diffusion weighted imaging (DWI), have been investigated and showed their values (2, 3). Nevertheless, the method has not gained wide popularity due to technical and overlapping ADC values issues. In this study we investigated the potential value of deep learning approaches in the detection and differentiation of benign and malignant vertebral fracture.

Methods

A dataset of 190 patients, 140 benign and 50 malignant fractures, were analyzed. All subjects received MRI of the spine at a GE 1.5T scanner. Imaging sequences included sagittal T1-, axial and sagittal T2-weighted imaging. These images were reconstructed into 512x512 matrix. An experienced radiologist confirmed the lesions and gave the binary score for 17 qualitative features as well as a final diagnostic impression of benign vs. malignant for each patient. All interpretative parameters were combined to develop a classification model using logistic regression (Table 1). Another experienced radiologist identified the abnormal region on sagittal T2W images by manually placing a rectangle region of interest (ROI). The generated smallest bounding box containing the entire abnormal area was used as input, and mapped to T1W using linear registration. The input of network includes both T1W and T2W of the selected slice combined with its two neighboring slices containing the lesion. Therefore, the input channel number is 6. We used Resnet 50 as backbone and a convolutional layer with 1x1 filter was added to the bottom of the network to extract interchange features and transformed 3 channels to 6 channels (4). The classification performance in the training dataset was evaluated using 10-fold cross-validation. The developed model using ResNet50 was applied to a second independent testing dataset of 113 benign and 113 malignant patients. Of these, 78 patients had reconstructed matrix size of 512x512, and 148 had matrix size of 384x384. In the testing dataset, the malignancy probability for each slice was directly calculated based on the model. A prediction probability over 0.5 was considered as malignancy. To explore the ability of deep learning for malignancy detection, the whole images from training set was used to train another ResNet50 to identify the slices containing malignant tumors. A classification activation map (CAM) method was used for the weakly supervised cancer localization (5). The activation map generated before the fully-connected layer could be interpreted as a heatmap that presented the likelihood of tumor lesions across spatial locations. The result was compared to the ground truth and IoU (Inresection over Union) were calculated.

Results

In the training dataset, the radiologist’s diagnostic accuracy was 0.99. When using the scores of 17 features to build a logistic regression model, the accuracy was 0.94. Using ResNet50, the accuracy was 0.84 for per-slice diagnosis, and 0.92 for per-patient diagnosis. For the testing in the second independent dataset, the matrix size of the image was found to have a great influence on the performance. When using images of 512x512 matrix size, the accuracy was 0.80 for per-slice diagnosis and 0.76 for per-patient diagnosis. But, when the matrix size was changed to 384x384, the per-slice accuracy was reduced to 0.71 and the per-patient accuracy became much lower (0.66). Figure 1 shows four examples of correct prediction of malignant and benign vertebral fracture. Figure 2 shows four examples of wrong prediction of malignant and benign vertebral fracture. To adjust the difference between matrix sizes, an architecture of the resolution fitted model was developed, as shown in Figure 3. One additional convolutional layer was added to the bottom of the developed convolutional layer with 3x3 filter size for adaptive pre-processing. We used 1/3 of patients with matrix 384x384 to tune the trained model and used the other 2/3 patients as validation set. By using this strategy to account for different matrix size, the per-slice accuracy was improved from 0.71 to 0.78, and the per-patient accuracy was improved from 0.66 to 0.74. For identification of malignant fracture using whole images from training set, the prediction accuracy is 0.71. We used 0.5 as a threshold to estimate the lesion boundary and obtained an IoU of 0.46. Figures 4 shows two examples of malignant vertebral lesion localized by the CAM method.

Discussion

This study used ResNet50 as the deep learning approach to differentiate malignant and benign vertebral fracture. The results obtained from the training and the testing dataset were all inferior to the subjective reading of the experienced radiologist. The better diagnostic performance of human reading is likely due to the fact that in this study, we only analyzed one level of selected vertebral body. However, the radiologist’s diagnosis of benign or malignant fracture is based on the combined information derived from T1WI and T2WI images in all planes, including the detection of adjacent soft tissue mass, pedicle and posterior element involvement, and other skipped bone marrow lesion, which are more specific for malignancy. These imaging components were not analyzed by deep learning approach. However, the deep learning approach can potentially provide help for the young and inexperienced radiologists. For the equivocal cases difficult to make a definite diagnosis, it is also hoped that deep learning approach can increase the diagnostic specificity. Our study also noted that the imaging matrix had a huge impact of the diagnostic accuracy. Thus, the image matrix size or spatial resolution needs to be considered in designing algorithms to improve the robustness of the diagnostic model. For the detection of malignant fracture using whole images from training set, the prediction accuracy was only moderate, with rooms for improvement.

Acknowledgements

This study was supported by NIH R01 CA127927, the National Natural Science Foundation of China (81971578, 81701648), and the Key Clinical Projects of the Peking University Third Hospital (BYSY2018007).

References

[1] Jung HS, et al. Discrimination of metastatic from acute osteoporotic compression spinal fractures with MR Imaging. RadioGraphics. 2003;23:179–87.

[2] Herneth AM, et al. Vertebral metastases:assessment with apparent diffusion coefficient. Radiology. 2002;225:889–94.

[3] Chan JH, et al. Acute vertebral body compression fractures:discrimination between benign and malignant causes using apparent diffusion coefficients. J Radiol. 2002;75:207–14.

[4] He K, et al. Deep residual learning for image recognition. Proceedings of the IEEE conference on Computer Vision and Pattern Recognition. 2016.

[5] Zhou, B., Khosla, A., Lapedriza, A., Oliva, A. and Torralba, A., 2016. Learning deep features for discriminative localization. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2921-2929).

Figures

Figure 1: Four examples of correct prediction of fracture cases. (A, B) true negative prediction of benign fracture cases. (C, D) true positive prediction of malignant fracture cases.

Figure 2: Four examples of wrong prediction of fracture cases. (A, B) two benign fracture cases misdiagnosed as malignant fractures. (C, D) two malignant fracture cases misdiagnosed as benign fractures

Figure 3: Architecture of the resolution fitted model

Figure 4: (A-B): A case example from a 90-year-old man with malignant fracture localized by the classification activation map (CAM) method. (A) Original image, (B) a heatmap that presented the likelihood of malignant lesions across spatial locations. (C-D): A case example from a 86-year-old woman with malignant fracture localized by the classification activation map (CAM) method. (C) Original image, (D) a heatmap that presented the likelihood of tumor lesions across spatial locations. The red areas on the heatmaps shows higher probability. Although both cases were diagnosed as malignant fracture, the red areas in (B) has higher probability of malignancy than (D).

Table 1: Qualitative Features Evaluated by Radiologist in Visual Reading

Proc. Intl. Soc. Mag. Reson. Med. 28 (2020)
0247