3200

Automated differentiation between benign and malignant vertebral compression fracture using a deep convolutional neural network on MRI

Takafumi Yoda¹, Satoshi Maki², Koji Matsumoto¹, Hajime Yokota³, Yoshitada Masuda¹, and Takashi Uno³
¹Department of Radiology, Chiba University Hospital, Chiba, Japan, ²Department of Orthopedic Surgery, Graduate School of Medicine, Chiba University, Chiba, Japan, ³Diagnostic Radiology and Radiation Oncology, Graduate School of Medicine, Chiba University, Chiba, Japan

Synopsis

Differentiating between osteoporotic vertebral fractures (OVFs) and malignant vertebral compression fractures (MVFs) due to spinal metastasis is challenging for spine surgeons and radiologists. We evaluated the performance of our convolutional neural network (CNN) model in differentiating between OVFs and MVFs on short-TI inversion recovery (STIR) and T1-weighted (T1WI) images, and compared it with the performance of three spine surgeons. The sensitivity, specificity, and accuracy of the CNN on both STIR and T1WI were equal to or better than those of the three spine surgeons.

Introduction

Differentiating between osteoporotic vertebral fractures (OVFs) and malignant vertebral compression fractures (MVFs) due to spinal metastasis is challenging for spine surgeons and radiologists. The problem is especially difficult in elderly patients, who are at high risk of both OVFs and MVFs. Accurate differentiation of OVFs from MVFs is nevertheless crucial for appropriate clinical staging and treatment planning. OVFs and MVFs are discriminated mainly by imaging, including X-ray, computed tomography, and magnetic resonance imaging (MRI). Although MRI has been used successfully for this purpose, accurate diagnosis on MRI sometimes remains difficult.¹ Deep learning based on convolutional neural networks (CNNs) is gaining popularity across a variety of fields, including medical imaging. However, few studies using CNNs have been reported in the spine region. The purpose of this study was to evaluate the performance of our CNN model in differentiating between OVFs and MVFs on short-TI inversion recovery (STIR) and T1-weighted (T1WI) images, and to compare it with the performance of three spine surgeons.

Materials and Methods

We retrospectively reviewed the medical records of patients who underwent thoracolumbar MRI scans. The final diagnosis was based on biopsy results, histological findings at surgery, and clinical and radiologic follow-up for at least 6 months. Sagittal STIR and sagittal T1WI were used for CNN training and validation. From 50 patients with OVFs, 59 MRI examinations with 368 slices for STIR and T1WI were obtained. From 47 patients with MVFs, 53 MRI examinations with 329 slices for STIR and T1WI were obtained. MRI examinations were performed on either 1.5 T or 3.0 T MR systems, and the acquisition protocol was not uniform. STIR was acquired with the following parameters: TR = 3,800-5,300 ms; TE = 40-70 ms; FOV = 260-340 mm; slice thickness = 3.0-4.0 mm; slice gap = 0.5-1.0 mm; inversion time = 160-200 ms. T1WI was acquired with the following parameters: TR = 400-600 ms; TE = 10-25 ms; FOV = 260-340 mm; slice thickness = 3.0-4.0 mm; slice gap = 0.5-1.0 mm. The deep learning framework TensorFlow was used to construct the CNN architecture. We fine-tuned the Xception architecture, which had been pretrained on ImageNet images (Fig. 1).² The probabilities for OVFs and MVFs were determined for each slice, and the probabilities for MVFs were then averaged over all slices. Fig. 2 summarizes the decision-making process used to classify MRI examinations of patients with either OVFs or MVFs. The performance of our CNN was evaluated with five-fold cross-validation: we plotted the receiver operating characteristic (ROC) curve and calculated the area under the curve (AUC). We calculated the sensitivity, specificity, and accuracy of the diagnoses made by the CNN and by three spine surgeons, and compared them.
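The slice-averaging decision step and the AUC calculation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the example probabilities, the grouping of slices by examination, and the rule that a mean probability at or above the cutoff is classified as MVF are all assumptions. The AUC is computed here with the pairwise (Mann-Whitney) formulation, which may differ from the software used in the study.

```python
from statistics import mean


def classify_examination(slice_mvf_probs, cutoff):
    """Average per-slice MVF probabilities and apply a cutoff.

    Returns (mean_probability, label). Calling a mean at or above the
    cutoff "MVF" is an assumption; the abstract does not state the rule.
    """
    if not slice_mvf_probs:
        raise ValueError("at least one slice probability is required")
    score = mean(slice_mvf_probs)
    label = "MVF" if score >= cutoff else "OVF"
    return score, label


def auc(mvf_scores, ovf_scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (MVF, OVF) pairs in which the MVF examination
    receives the higher score; tied pairs count as one half.
    """
    if not mvf_scores or not ovf_scores:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for m in mvf_scores:
        for o in ovf_scores:
            if m > o:
                wins += 1.0
            elif m == o:
                wins += 0.5
    return wins / (len(mvf_scores) * len(ovf_scores))


if __name__ == "__main__":
    # Hypothetical per-slice MVF probabilities for two examinations.
    examinations = {
        "exam_a": [0.05, 0.10, 0.20],
        "exam_b": [0.60, 0.80, 0.70],
    }
    # 0.377 is the STIR cutoff reported in the Results.
    for name, probs in examinations.items():
        score, label = classify_examination(probs, cutoff=0.377)
        print(f"{name}: mean MVF probability {score:.3f}, classified as {label}")
```

Averaging before thresholding means that a single slice with a high MVF probability does not decide the classification on its own.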

Results

The patient characteristics are shown in Table 1. The AUCs of the CNN based on STIR and T1WI were 0.967 (95% confidence interval (CI), 0.908–0.989) and 0.984 (95% CI, 0.910–0.997), respectively. The ROC curves of the CNN's prediction probability, together with the performance of each spine surgeon, are shown in Fig. 3. The sensitivity, specificity, and accuracy of the CNN at the optimal cutoff point and of the three spine surgeons are shown in Table 2(A) for STIR and in Table 2(B) for T1WI. The cutoff value of the CNN was 0.377 for STIR and 0.312 for T1WI. The CNN model based on STIR achieved 93.8% accuracy, 92.5% sensitivity, and 94.9% specificity. The CNN model based on T1WI achieved 96.4% accuracy, 98.1% sensitivity, and 94.9% specificity. The accuracy and specificity of the CNN on both STIR and T1WI were equal to or better than those of the three spine surgeons. There were no significant differences in sensitivity between the CNN and the spine surgeons on either STIR or T1WI.
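As a worked check of how the reported percentages relate to one another, the sketch below computes sensitivity, specificity, and accuracy from confusion-matrix counts. The counts are not reported in the abstract: they are back-calculated here from the percentages above together with the 53 MVF and 59 OVF examinations given in the Methods, assuming MVF is the positive class and that the metrics are per examination. The function and variable names are illustrative.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts.

    tp/fn: MVF examinations classified as MVF/OVF (MVF assumed positive).
    tn/fp: OVF examinations classified as OVF/MVF.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Examination counts stated in the Methods.
N_MVF, N_OVF = 53, 59

# Correctly classified examinations, inferred from the reported
# percentages (an assumption; the abstract does not give these counts).
INFERRED_CORRECT = {
    "STIR": {"tp": 49, "tn": 56},
    "T1WI": {"tp": 52, "tn": 56},
}

if __name__ == "__main__":
    for sequence, counts in INFERRED_CORRECT.items():
        tp, tn = counts["tp"], counts["tn"]
        result = diagnostic_metrics(tp, N_MVF - tp, tn, N_OVF - tn)
        summary = ", ".join(f"{k} {v:.1%}" for k, v in result.items())
        print(f"{sequence}: {summary}")
```

Under these inferred counts, 49/53 and 56/59 correspond to the reported STIR sensitivity and specificity, and 52/53 to the reported T1WI sensitivity, so the reported figures are mutually consistent with 112 examinations in total.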

Discussion

The ability of the CNNs to distinguish OVFs from MVFs on MRI in this study was comparable or superior to that of the spine surgeons. High accuracy was achieved by both CNN models, trained with STIR and T1WI respectively. To our knowledge, this is the first study to distinguish between OVFs and MVFs using deep learning on MR images. Many previous reports have described imaging characteristics that distinguish OVFs from MVFs. However, there are exceptional cases in clinical practice, and it can be difficult to distinguish between OVFs and MVFs using these MRI characteristics.³ The distinction between OVFs and MVFs on MR imaging is therefore not always reliable. An earlier study attempted to create a scoring system to distinguish MVFs from OVFs using discriminant analysis.⁴ However, the interpretation of radiographic features tends to be subjective and depends on the expertise of the reader, especially in cases that are difficult to classify. CNNs have an advantage over such scoring systems because they extract features and classify images automatically and do not require subjective judgement.

Conclusion

We have successfully differentiated OVFs and MVFs using a CNN with high diagnostic accuracy comparable to that of spine surgeons.

Acknowledgements

No acknowledgement found.

References

1. Jung HS, Jee WH, McCauley TR, et al. Discrimination of Metastatic from Acute Osteoporotic Compression Spinal Fractures with MR Imaging. Radiographics 2003;23:179–87.
2. Chollet F. Xception: Deep learning with depthwise separable convolutions. Proc 30th IEEE Conf Comput Vis Pattern Recognition (CVPR) 2017:1800–7.
3. Mauch JT, Carr CM, Cloft H, et al. Review of the imaging features of benign osteoporotic and malignant vertebral compression fractures. Am J Neuroradiol 2018;39:1584–92.
4. Li Z, Guan M, Sun D, et al. A novel MRI- and CT-based scoring system to differentiate malignant from osteoporotic vertebral fractures in Chinese patients. BMC Musculoskelet Disord 2018;19:1–7.

Figures

Figure 1. The deep learning framework TensorFlow was used to construct the CNN architecture. In the present work, we used the Xception architecture, which had been pretrained on ImageNet images.

Figure 2. The probabilities of OVFs and MVFs were computed for each slice. The final decision was based on the average MVF probability over all slices and the optimal cutoff point of that probability score.

Table 1. Baseline characteristics of the patients. Patients with OVFs were significantly older than those with MVFs. Since OVFs rarely occur in the cervical spine, cervical spinal lesions were omitted from the data. There were no significant differences in the gender proportions of patients between the two groups.

Figure 3. Receiver operating characteristic (ROC) curves of the convolutional neural network (CNN) model based on short-TI inversion recovery (STIR) sagittal magnetic resonance (MR) images (A) and T1-weighted (T1WI) sagittal MR images (B). The three plotted points indicate the performance of the three spine surgeons. The AUCs of the CNN based on STIR and T1WI were 0.967 (95% confidence interval (CI), 0.908–0.989) and 0.984 (95% CI, 0.910–0.997), respectively.

Table 2. Comparison of accuracy, sensitivity, and specificity between the CNN model and the three spine surgeons based on STIR sagittal MR images (A) and T1-weighted sagittal MR images (B). Data in parentheses are 95% confidence intervals. The accuracy and specificity of the CNN on both STIR and T1WI were equal to or better than those of the three spine surgeons. There were no significant differences in sensitivity between the CNN and the spine surgeons on either STIR or T1WI.

Proc. Intl. Soc. Mag. Reson. Med. 29 (2021)
