4049

Differentiation of Benign and Malignant Vertebral Fractures on Spine MRI Using ResNet Deep Learning Compared to Radiologists’ Reading
Lee-Ren Yeh1, Yang Zhang2, Jeon-Hor Chen2, An-Chi Wang3, JieYu Yang3, Peter Chang2, Daniel Chow2, and Min-Ying Su2
1Radiology, E-Da Hospital, Kaohsiung, Taiwan, 2University of California Irvine, Irvine, CA, United States, 3Radiology, Chi-Mei Medical Center, Tainan, Taiwan

Synopsis

This study compared the readings of three radiologists with different levels of experience and investigated the potential of deep learning to differentiate between benign and malignant vertebral fractures based on T1W and T2W MRI. Deep learning using ResNet50 achieved a satisfactory diagnostic accuracy of 92%, which was lower than the 98% achieved by a senior MSK radiologist and the 96% achieved by an R4 resident, but much higher than the 66% achieved by an R1 resident. The inferior performance of ResNet50 might be partly explained by the very limited information available when only a small bounding box is considered.

Background: Imaging plays an important role in the evaluation of spinal diseases and is essential for therapy planning. Benign and malignant vertebral fractures may present similar features and can be difficult to differentiate [1]. For diagnosis of spinal lesions, MRI is the most helpful imaging modality. However, even after combining information from images acquired using all the sequences, accurate differentiation of benign and malignant abnormalities remains challenging in patients with ambiguous features [2]. Recently, artificial intelligence (AI) based imaging analysis has attracted significant attention due to its potential to provide a comprehensive evaluation of imaging features, which can be used to aid in the diagnosis of many diseases. The purpose of this study is to apply an automatic deep learning algorithm, Residual Network-50 (ResNet50) [3], to distinguish between benign and malignant fractures on MRI. The results were compared to the diagnoses made by three radiologists with various levels of training.

Methods: A total of 190 patients were included (mean age 66.5, range 23-95 years old), 140 with benign fractures (mean age 68.8) and 50 with malignant fractures (mean age 61.7). All subjects received MR imaging of the spine on a 1.5T scanner. An experienced MSK radiologist read the images and gave a binary score to each of 15 qualitative features, and a final diagnostic impression of benign versus malignant fracture for each patient. To compare with the diagnosis made by less experienced radiologists, two residents, one in the 4th year of training and the other in the first year of training, were given the dataset to read. For each patient, they also gave a final diagnostic impression of benign or malignant. Deep learning was performed using the most prominent abnormal vertebra in each patient as the input, marked by another experienced body radiologist. The abnormal region was first identified on sagittal T2W images. A square box containing the entire abnormal vertebra was generated and used as the input. The defined box was mapped onto T1W images using linear registration. The network input included both T1W and T2W images of the identified slice and its two neighboring slices that also contained the lesion. Therefore, the total number of input channels was six. The ResNet50 architecture (Figure 1) was applied to differentiate between benign and malignant groups. Because ResNet is pre-trained on photographs with RGB colors, it accepts only three input channels. Thus, a convolutional layer with a 1x1 filter was added to extract interchannel features and transform the six channels into three channels. To compensate for the small case number and the imbalance between benign and malignant cases in the dataset, the benign dataset was augmented 20 times using random affine transformations including translation, scaling, and rotation. To offset the smaller number of malignant cases, the malignant dataset was augmented 40 times.
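The six-to-three channel reduction described above can be illustrated with a minimal sketch. A 1x1 convolution is a per-pixel linear combination across channels; in a deep learning framework it would typically be a single layer such as PyTorch's `torch.nn.Conv2d(6, 3, kernel_size=1)` with learned weights. The function name `conv1x1`, the toy 2 x 2 images, and the averaging weights below are assumptions for illustration only and are not taken from the study.

```python
# Hypothetical sketch: a 1x1 convolution maps C_in channels to C_out channels
# by taking a weighted sum across channels independently at every pixel.
# Weights and bias are illustrative values, not trained parameters.


def conv1x1(image, weights, bias):
    """Apply a 1x1 convolution.

    image:   list of C_in planes, each an H x W list of lists
    weights: C_out rows, each holding C_in coefficients
    bias:    C_out offsets
    returns: list of C_out planes, each H x W
    """
    c_in = len(image)
    height, width = len(image[0]), len(image[0][0])
    output = []
    for out_idx, coeffs in enumerate(weights):
        plane = [
            [
                bias[out_idx]
                + sum(coeffs[c] * image[c][y][x] for c in range(c_in))
                for x in range(width)
            ]
            for y in range(height)
        ]
        output.append(plane)
    return output


# Six input channels stand in for T1W and T2W of the identified slice and its
# two neighbors. Each toy 2 x 2 plane is filled with (channel index + 1).
six_channel = [[[float(c + 1)] * 2 for _ in range(2)] for c in range(6)]

# Illustrative weights: each output channel averages one pair of inputs.
weights = [
    [0.5, 0.5, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.5, 0.5],
]
bias = [0.0, 0.0, 0.0]

three_channel = conv1x1(six_channel, weights, bias)
```

In the study the weights of this layer would be learned together with the rest of the network, so the channel pairing shown here is only a placeholder.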
The classification performance of ResNet50 was evaluated using 10-fold cross-validation. Because prediction was based on 2D slices, each slice had its own diagnostic probability. For the per-patient diagnosis, the highest probability of malignancy among all slices of each patient was assigned to that patient. The malignancy probability obtained for each case was used to make the final diagnosis using a threshold of 0.5.
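The per-patient rule described above (take the highest slice-level malignancy probability, then threshold at 0.5) can be sketched as follows. The function name, the patient identifiers, and the probabilities are hypothetical, and treating a probability exactly equal to 0.5 as malignant is an assumption, since the text does not specify how ties are handled.

```python
from collections import defaultdict


def per_patient_diagnosis(slice_probs, threshold=0.5):
    """Aggregate slice-level malignancy probabilities to one label per patient.

    slice_probs: iterable of (patient_id, malignancy_probability) pairs
    returns:     dict mapping patient_id -> (label, highest_probability)
    """
    highest = defaultdict(float)  # probabilities are non-negative, so 0.0 is a safe start
    for patient_id, prob in slice_probs:
        highest[patient_id] = max(highest[patient_id], prob)
    # Assumption: a probability equal to the threshold counts as malignant.
    return {
        patient_id: ("malignant" if prob >= threshold else "benign", prob)
        for patient_id, prob in highest.items()
    }


# Hypothetical slice-level outputs for two patients (three slices each).
example = [
    ("patient_A", 0.20), ("patient_A", 0.75), ("patient_A", 0.40),
    ("patient_B", 0.10), ("patient_B", 0.30), ("patient_B", 0.25),
]
diagnoses = per_patient_diagnosis(example)
```

Taking the maximum means a single high-probability slice is enough to label a patient malignant, which favors sensitivity over specificity at the patient level.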

Results: The senior MSK radiologist’s accuracy was 0.98. The 4th year resident also had a very high accuracy of 0.96, and, not surprisingly, the first year resident performed poorly with an accuracy of 0.66. When individual scores of the 15 features were used to build a logistic regression model, the diagnostic accuracy was 0.94. Diffuse signal changes occurred more frequently in the malignant group (88%). Intravertebral dark lines or bands were present only in benign fractures (26%). When deep learning using ResNet50 was applied, the accuracy was 0.84 for per-slice diagnosis, and 0.92 for per-patient diagnosis. There were 3 false negative and 12 false positive diagnoses. Figure 2 shows two malignant cases correctly diagnosed as true positives. Figure 3 shows two benign cases correctly diagnosed as true negatives. Figure 4 shows two malignant cases misdiagnosed as benign, and Figure 5 shows two benign cases misdiagnosed as malignant. The cases misdiagnosed by deep learning were all correctly diagnosed by the senior MSK radiologist, and the important features are described in the figure legends.
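As a consistency check, the reported per-patient accuracy can be recomputed from the case counts. The sketch below assumes that the 3 false negatives and 12 false positives refer to the per-patient diagnosis over all 190 patients (50 malignant, 140 benign). The sensitivity and specificity it computes are derived from those counts under that assumption and are not values reported in this abstract. The function name is hypothetical.

```python
def confusion_summary(n_malignant, n_benign, false_negatives, false_positives):
    """Derive accuracy, sensitivity, and specificity from case counts."""
    true_positives = n_malignant - false_negatives
    true_negatives = n_benign - false_positives
    total = n_malignant + n_benign
    return {
        "accuracy": (true_positives + true_negatives) / total,
        "sensitivity": true_positives / n_malignant,
        "specificity": true_negatives / n_benign,
    }


# Counts taken from the text; their per-patient interpretation is an assumption.
summary = confusion_summary(
    n_malignant=50, n_benign=140, false_negatives=3, false_positives=12
)
accuracy_rounded = round(summary["accuracy"], 2)  # 175 / 190, rounds to 0.92
```

Under this assumption the recomputed accuracy agrees with the reported value of 0.92.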

Conclusions: This study investigated the application of deep learning for the differential diagnosis of benign and malignant vertebral fractures on MRI. The results suggest that deep learning using ResNet50 provides a feasible method for establishing a diagnosis from T1-weighted and T2-weighted MR images. The input used in deep learning was a square box covering a single abnormal vertebral body, without the inclusion of soft tissue, posterior elements, or skip lesions. The per-patient diagnostic accuracy was 0.92, which was inferior to the reading of radiologists who had sufficient training, but much higher than that of an inexperienced radiologist. The results suggest that the developed ResNet50 model may have good clinical value in facilities that lack well-trained medical staff. With specific refinement in each clinical setting, this AI-based method has the potential to serve as a clinical tool to help less experienced readers and to improve workflow.

Acknowledgements

This study was supported by E-Da Hospital intramural seed grant EDAHM108003 and NIH R01 CA127927.

References

1. Avellino AM, Mann FA, Grady MS, et al. The misdiagnosis of acute cervical spine injuries and fractures in infants and children: the 12-year experience of a level I pediatric and adult trauma center. Child's Nervous System. 2005;21:122-127.

2. Diacinti D, Vitali C, Gussoni G, et al. Misdiagnosis of vertebral fractures on local radiographic readings of the multicentre POINT (Prevalence of Osteoporosis in INTernal medicine) study. Bone. 2017;101:230-235.

3. Bengio Y. Learning deep architectures for AI. Foundations and Trends® in Machine Learning. 2009;2:1-127.


Figures

Figure 1. Architecture of ResNet50, containing 16 residual blocks. Each residual block begins with one 1x1 convolutional layer, followed by one 3x3 convolutional layer, and ends with another 1x1 convolutional layer. The output is then added to the input via a residual connection. The total number of input channels is 6: T1W and T2W images of the identified slice and its two neighboring slices. One convolutional layer with a 1x1 filter is therefore added before ResNet to extract interchannel features and transform the 6 channels into 3 channels as input.

Figure 2. Two true positive malignant cases. The image in the left panel shows diffuse tumor infiltration at the 7th cervical (C7) vertebral body with posterior cortical destruction and no apparent collapse. The image in the right panel shows diffuse tumor infiltration at the third thoracic (T3) vertebra with anterior wedge deformity. The fatty change of the other cervical vertebrae in the left panel and of the T2/T4 vertebrae in the right panel is a post-radiation effect.

Figure 3. Two true negative benign cases. The left case is a chronic benign osteoporotic fracture with resolution of bone marrow edema. Despite the severe collapse, the height of the posterior vertebral body is still preserved. The right case is a chronic osteoporotic fracture with prior vertebroplasty. The irregular dark patch in the vertebra represents the cement material from vertebroplasty. Both cases show fractures in several other vertebrae.

Figure 4. Two false negative cases, malignant fractures misdiagnosed as benign. The image in the left panel shows diffuse signal change and a paravertebral soft tissue mass at the L2 vertebra. Coexisting metastatic masses at the L3 and S2 vertebrae are also noted. The right case shows diffuse tumor infiltration, a necrotic cleft, central concave collapse, and a paravertebral soft tissue mass.

Figure 5. Two false positive cases, benign fractures misdiagnosed as malignant. The left case is a recent benign fracture with typical band-pattern marrow edema. The right case is a benign fracture after cement vertebroplasty.

Proc. Intl. Soc. Mag. Reson. Med. 29 (2021)