Lee-Ren Yeh1, Yang Zhang2, Jeon-Hor Chen2, An-Chi Wang3, JieYu Yang3, Peter Chang2, Daniel Chow2, and Min-Ying Su2
1Radiology, E-Da Hospital, Kaohsiung, Taiwan, 2University of California Irvine, Irvine, CA, United States, 3Radiology, Chi-Mei Medical Center, Tainan, Taiwan
Synopsis
This study compared the readings of three radiologists with different levels of
experience and investigated the potential of deep learning to differentiate
between benign and malignant vertebral fractures on T1W and T2W MRI. Deep
learning using ResNet50 achieved a per-patient diagnostic accuracy of 92%,
which was lower than the 98% of a senior MSK radiologist and the 96% of an R4
resident but much higher than the 66% of an R1 resident. The inferior
performance of ResNet50 relative to the trained readers might be partly
explained by the very limited information available when only a small bounding
box is considered.
Background: Imaging plays an important role in the evaluation of
spinal diseases and is essential for therapy planning. Benign and malignant
vertebral fractures may present similar features and can be difficult to
differentiate [1]. For the diagnosis of spinal lesions, MRI is the most helpful
imaging modality. However, even after combining information from images
acquired with all sequences, accurate differentiation of benign from malignant
abnormalities remains challenging in patients with ambiguous features [2]. Recently,
artificial intelligence (AI) based imaging analysis has attracted significant attention
due to its potential to provide a comprehensive evaluation of imaging features,
which can be used to aid in the diagnosis of many diseases. The purpose of this
study was to apply an automatic deep learning method using the Residual
Network-50 (ResNet50) algorithm [3] to distinguish between benign and malignant
vertebral fractures on MRI. The results were compared with the diagnoses made by
three radiologists with various levels of training.
Methods: A total of 190 patients were included (mean age 66.5,
range 23-95 years), of whom 140 had benign fractures (mean age 68.8 years) and
50 had malignant fractures (mean age 61.7 years). All subjects underwent MR
imaging of the spine on a 1.5T scanner. An experienced MSK radiologist read the
images, gave a binary score to each of 15 qualitative features, and recorded a
final diagnostic impression of benign versus malignant fracture for each
patient. For comparison with less experienced readers, two residents, one in
the fourth year of training and the other in the first year, were given the
same dataset and also gave a final diagnostic impression of benign or malignant
for each patient. Deep learning was performed using the
most prominent abnormal vertebra in each patient, marked by another experienced
body radiologist, as the input. The abnormal region was first identified on
sagittal T2W images, and a square box containing the entire abnormal vertebra
was generated. The box was then mapped onto the T1W images using linear
registration. The network input included both the T1W and T2W images of the
identified slice and its two neighboring slices that also contained the lesion,
giving six input channels in total. The ResNet50 architecture (Figure 1) was
applied to differentiate between the benign and malignant groups. Because
ResNet50 was pre-trained on RGB photographs, it accepts only three input
channels. A convolutional layer with a 1x1 filter was therefore added to
extract inter-channel features and to reduce the input from six
channels to three channels. To compensate for the small number of cases and the
imbalance between benign and malignant cases, the benign dataset was augmented
20 times using random affine transformations including translation, scaling,
and rotation, and the malignant dataset, which had fewer cases, was augmented
40 times. The classification performance of ResNet50 was evaluated using
10-fold cross-validation. Because prediction was based on 2D slices, each slice
had its own diagnostic probability. For the per-patient diagnosis, the highest
probability of malignancy among all slices of a patient was assigned to that
patient, and the final diagnosis was made using a threshold of 0.5.
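The six-channel input, the 1x1 channel reduction, and the per-patient decision rule described in the Methods can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the T1W-then-T2W channel order, the example weights, and the choice of `>=` at the 0.5 threshold are all assumptions made for illustration, and the ResNet50 network itself is not reproduced.

```python
from typing import List, Sequence, Tuple

Image = List[List[float]]  # one 2D channel, indexed [row][col]


def stack_channels(t1w: Sequence[Image], t2w: Sequence[Image]) -> List[Image]:
    """Stack three T1W and three T2W slices into one six-channel input.

    The T1W-first ordering is an assumption; the abstract does not state it.
    """
    if len(t1w) != 3 or len(t2w) != 3:
        raise ValueError("expected three slices per sequence")
    return list(t1w) + list(t2w)


def conv1x1(channels: Sequence[Image],
            weights: Sequence[Sequence[float]],
            bias: Sequence[float]) -> List[Image]:
    """Apply a 1x1 convolution: a per-pixel linear mix of the input channels.

    weights[o][c] is the weight from input channel c to output channel o.
    In a trained network these weights are learned; here they are hypothetical.
    """
    height, width = len(channels[0]), len(channels[0][0])
    return [
        [[b + sum(w * channels[c][y][x] for c, w in enumerate(row))
          for x in range(width)]
         for y in range(height)]
        for row, b in zip(weights, bias)
    ]


def patient_diagnosis(slice_probs: Sequence[float],
                      threshold: float = 0.5) -> Tuple[float, str]:
    """Assign the highest slice-level malignancy probability to the patient.

    Treating a probability equal to the threshold as malignant is an assumption.
    """
    if not slice_probs:
        raise ValueError("at least one slice probability is required")
    patient_prob = max(slice_probs)
    label = "malignant" if patient_prob >= threshold else "benign"
    return patient_prob, label


# Toy 1x1-pixel example with made-up intensities and weights.
six_channel = stack_channels([[[1.0]]] * 3, [[[2.0]]] * 3)
three_channel = conv1x1(
    six_channel,
    weights=[[1, 0, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0], [0.5] * 6],
    bias=[0.0, 0.0, 0.0],
)
print(len(six_channel), len(three_channel))
print(patient_diagnosis([0.1, 0.7, 0.3]))
```

Taking the maximum over slices means a single suspicious slice is enough to label a patient malignant, which favors sensitivity over specificity.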
Results: The senior MSK radiologist’s accuracy was 0.98. The fourth-year
resident also had a very high accuracy of 0.96, and, not surprisingly, the
first-year resident performed poorly, with an accuracy of 0.66. When the individual scores of the 15 features were used to build a
logistic regression model, the diagnostic accuracy was 0.94. Diffuse signal changes occurred more frequently in the
malignant group (88%). Intravertebral dark lines or bands were present only in
benign fractures (26%). When deep learning
using ResNet50 was applied, the accuracy was 0.84 for per-slice diagnosis, and
0.92 for per-patient diagnosis. There were 3 false negative and 12 false
positive diagnoses. Figure 2 shows two malignant cases correctly
diagnosed as true positives. Figure 3 shows two benign cases correctly
diagnosed as true negatives. Figure 4 shows two malignant cases misdiagnosed
as benign, and Figure 5 shows two
benign cases misdiagnosed as malignant. All cases misdiagnosed by deep
learning were correctly diagnosed by the senior MSK radiologist; the
important features are described in the figure legends.
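As a consistency check, the reported error counts can be converted into accuracy, sensitivity, and specificity. The sketch below assumes that the 3 false negatives and 12 false positives refer to the per-patient diagnosis of the 50 malignant and 140 benign cases; under that assumption the counts reproduce the reported per-patient accuracy of 0.92. The sensitivity and specificity it prints follow arithmetically from the same counts and are not values reported in the abstract. The function name is hypothetical.

```python
def confusion_metrics(n_benign, n_malignant, false_neg, false_pos):
    """Derive accuracy, sensitivity, and specificity from case counts."""
    true_pos = n_malignant - false_neg  # malignant cases called malignant
    true_neg = n_benign - false_pos     # benign cases called benign
    total = n_benign + n_malignant
    return {
        "accuracy": (true_pos + true_neg) / total,
        "sensitivity": true_pos / n_malignant,
        "specificity": true_neg / n_benign,
    }


# Counts taken from the abstract; per-patient interpretation is an assumption.
metrics = confusion_metrics(n_benign=140, n_malignant=50,
                            false_neg=3, false_pos=12)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
# accuracy: 0.92
# sensitivity: 0.94
# specificity: 0.91
```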
Conclusions: This study investigated the application of deep learning to the differential
diagnosis of benign and malignant vertebral fractures on MRI. The results
suggest that deep learning using ResNet50 is a feasible method for establishing
a diagnosis from T1-weighted and T2-weighted MR images. The input
used in deep learning was a square box covering a single abnormal vertebral
body, without the inclusion of soft tissue, posterior elements, or skip
lesions. The per-patient diagnostic accuracy was 0.92, which was inferior to the
readings of radiologists with sufficient training but much higher than that of an
inexperienced radiologist. The results suggest that the developed ResNet50 model may have
good clinical value in facilities that lack well-trained medical staff. With
specific refinement in each clinical setting, this AI-based method has the
potential to serve as a clinical tool to help less experienced readers and to
improve workflow.
Acknowledgements
This study was supported by E-Da Hospital intramural seed
grant EDAHM108003 and NIH R01 CA127927.
References
1. Avellino AM, Mann FA, Grady MS, et al. The
misdiagnosis of acute cervical spine injuries and fractures in infants and
children: the 12-year experience of a level I pediatric and adult trauma
center. Child's Nervous System. 2005;21:122-127.
2. Diacinti D, Vitali C, Gussoni G, et al.
Misdiagnosis of vertebral fractures on local radiographic readings of the
multicentre POINT (Prevalence of Osteoporosis in INTernal medicine) study.
Bone. 2017;101:230-235.
3. Bengio Y. Learning deep architectures for AI.
Foundations and Trends® in Machine Learning. 2009;2:1-127.