Yinqiao Yi1, Zhenwei Ding2, Guoquan Huang2, Dongmei Wu1, Yang Song3, and Guang Yang1
1Shanghai Key Laboratory of Magnetic Resonance, East China Normal University, Shanghai, China, 2Department of Medical Imaging, the Second People's Hospital of Wuhu, Wuhu, Anhui Province, China, 3Siemens Healthineers Ltd., Shanghai, China
Synopsis
Keywords: Prostate, BPH, PCa
Motivation: Accurate interpretation of prostate MRI demands a high level of expertise, and deep learning models for prostate cancer (PCa) detection often suffer from low specificity.
Goal(s): To explore the value of benign prostatic hyperplasia (BPH) annotations for PCa detection.
Approach: We retrospectively collected 96 patients with PCa and 92 patients with BPH, all scanned with a PI-RADS protocol. Two deep learning models were built: Model1 detected only PCa, while Model2 simultaneously detected BPH and PCa.
Results: Model2 achieved a test AUC of 0.995, outperforming Model1, whose test AUC was 0.770.
Impact: Explicitly using the BPH label significantly improved the performance of PCa detection, implying that multi-task deep learning models targeting multiple diseases are not only more in line with the needs of clinical applications but can also improve performance.
Introduction
Artificial intelligence has shown the potential to enhance the radiological evaluation of prostate MRI by offering fully automated detection and segmentation of suspicious lesions [1-2]. However, the diagnosis still requires the expertise of the physician, and deep learning models suffer from low specificity [3]. Some of the false positives produced by previous deep learning models [3] were observed to be benign prostatic hyperplasia (BPH) incorrectly identified as prostate cancer (PCa). We therefore hypothesized that forcing the model to explicitly identify BPH lesions might improve the performance of PCa detection. In this study, we built deep learning models for PCa detection with and without a BPH identification task and compared their performance to evaluate the value of BPH annotations for PCa detection.
Methods
We retrospectively collected 96 patients with pathologically confirmed PCa and 92 patients with pathologically confirmed BPH, all scanned on one 3T scanner (Philips Achieva). Scan protocols suggested by PI-RADS v2.1 were used to acquire T2-weighted images (T2W), diffusion-weighted images (DWI) with a b-value of 2000 s/mm², and apparent diffusion coefficient (ADC) maps. ROIs were outlined by one radiologist with 2 years of experience in prostate MRI and reviewed by a senior radiologist. The study cohort was randomly split into a training cohort (N=148) and an independent testing cohort (N=40).
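A random patient-level split of this kind can be sketched as follows. This is only an illustration: the function name `split_cohort`, the fixed seed, and the use of integer patient IDs are assumptions, and the source does not state whether the split was stratified by diagnosis.

```python
import random


def split_cohort(patient_ids, n_test=40, seed=0):
    """Randomly split patient IDs into (training, testing) lists.

    The seed and the helper name are hypothetical; the abstract only
    states that the cohort was split randomly into 148 and 40 patients.
    """
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # reproducible shuffle
    return ids[n_test:], ids[:n_test]


# 96 PCa + 92 BPH patients = 188 patients in total.
train_ids, test_ids = split_cohort(range(188), n_test=40)
print(len(train_ids), len(test_ids))
```

Splitting by patient rather than by image keeps all images of one patient in the same cohort.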
All images were resampled to an in-plane resolution of 0.5 × 0.5 mm². The DWI and ADC images were aligned to the T2W images with Elastix. Quantitatively normalized ADC maps and z-scored T2W images were then input into a fully automated deep learning pipeline [4] based on the U-Net architecture [5] and implemented using the nnU-Net framework [6] (batch size of two, six downsampling blocks). Two models were built in this study. Model1 outputs a three-dimensional softmax map that assigns each voxel a value between 0 and 1, indicating the predicted probability of the voxel being a PCa lesion. To obtain predictions at the patient level, the highest value in each patient was used. For lesion segmentation overlap evaluation, the three-dimensional softmax map was binarized with a threshold of 0.5. Subsequent postprocessing steps included connected component analysis and the removal of speckles. Model2 was similar to Model1 but had an extra output channel for the BPH probability map.
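The postprocessing of the softmax map described above can be illustrated with a minimal sketch. Several details are assumptions not given in the text: plain nested lists stand in for image arrays, voxels exactly at the threshold count as lesion, components are formed with 6-connectivity, and the speckle size limit `min_voxels` is a hypothetical parameter. In practice an array library would normally be used instead.

```python
from collections import deque


def patient_score(prob_map):
    """Patient-level score: the highest voxel probability in the 3D map."""
    return max(p for plane in prob_map for row in plane for p in row)


def binarize(prob_map, threshold=0.5):
    """Threshold the map; treating p == threshold as lesion is an assumption."""
    return [[[p >= threshold for p in row] for row in plane] for plane in prob_map]


def remove_speckles(mask, min_voxels=2):
    """Keep only 6-connected components with at least min_voxels voxels."""
    nz, ny, nx = len(mask), len(mask[0]), len(mask[0][0])
    seen = [[[False] * nx for _ in range(ny)] for _ in range(nz)]
    cleaned = [[[False] * nx for _ in range(ny)] for _ in range(nz)]
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                if not mask[z][y][x] or seen[z][y][x]:
                    continue
                # Breadth-first search collects one connected component.
                component, queue = [], deque([(z, y, x)])
                seen[z][y][x] = True
                while queue:
                    cz, cy, cx = queue.popleft()
                    component.append((cz, cy, cx))
                    for dz, dy, dx in steps:
                        az, ay, ax = cz + dz, cy + dy, cx + dx
                        if (0 <= az < nz and 0 <= ay < ny and 0 <= ax < nx
                                and mask[az][ay][ax] and not seen[az][ay][ax]):
                            seen[az][ay][ax] = True
                            queue.append((az, ay, ax))
                if len(component) >= min_voxels:
                    for cz, cy, cx in component:
                        cleaned[cz][cy][cx] = True
    return cleaned


# Toy 2 x 3 x 4 probability map: one 3-voxel lesion and one isolated voxel.
toy_map = [
    [[0.1, 0.9, 0.8, 0.1], [0.1, 0.7, 0.1, 0.1], [0.1, 0.1, 0.1, 0.6]],
    [[0.1, 0.1, 0.1, 0.1], [0.1, 0.1, 0.1, 0.1], [0.1, 0.1, 0.1, 0.1]],
]
lesion_mask = remove_speckles(binarize(toy_map), min_voxels=2)
print(patient_score(toy_map), sum(v for pl in lesion_mask for r in pl for v in r))
```

Using the maximum voxel probability as the patient-level score means a single confident voxel is enough to flag a patient, which is why false-positive voxels in BPH tissue directly affect patient-level specificity.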
The patient-level receiver operating characteristic (ROC) curve was used to evaluate the performance of the two models. The difference in model performance was assessed using DeLong’s test [7] with a significance level of p < 0.05. The specificity, sensitivity, and accuracy for case-level assessment were calculated for both models. Statistical analyses were performed with Python version 3.8.16.
Results
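As a rough illustration of the patient-level evaluation described in the Methods, the sketch below computes the AUC as a Mann-Whitney statistic, compares two models scored on the same patients with DeLong's test [7], and derives sensitivity, specificity, and accuracy. It is not the authors' code: the function names are hypothetical, labels are assumed to be 1 for PCa and 0 for BPH, and the case-level cutoff of 0.5 is an assumption, since the abstract does not state the cutoff used.

```python
import math


def _kernel(pos_score, neg_score):
    """1 if the positive case outranks the negative one, 0.5 on ties, else 0."""
    return 1.0 if pos_score > neg_score else 0.5 if pos_score == neg_score else 0.0


def auc_components(scores, labels):
    """Return the AUC and DeLong's structural components (V10, V01)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    v10 = [sum(_kernel(p, n) for n in neg) / len(neg) for p in pos]
    v01 = [sum(_kernel(p, n) for p in pos) / len(pos) for n in neg]
    return sum(v10) / len(v10), v10, v01


def _cov(a, b):
    """Sample covariance (needs at least two values per list)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(scores_a, scores_b, labels):
    """Two-sided DeLong test for two correlated AUCs; returns (auc_a, auc_b, p)."""
    auc_a, v10_a, v01_a = auc_components(scores_a, labels)
    auc_b, v10_b, v01_b = auc_components(scores_b, labels)
    m, n = len(v10_a), len(v01_a)
    var_a = _cov(v10_a, v10_a) / m + _cov(v01_a, v01_a) / n
    var_b = _cov(v10_b, v10_b) / m + _cov(v01_b, v01_b) / n
    cov_ab = _cov(v10_a, v10_b) / m + _cov(v01_a, v01_b) / n
    var_diff = var_a + var_b - 2.0 * cov_ab
    if var_diff <= 0.0:
        return auc_a, auc_b, float("nan")  # degenerate case, test undefined
    z = (auc_a - auc_b) / math.sqrt(var_diff)
    return auc_a, auc_b, math.erfc(abs(z) / math.sqrt(2.0))


def case_level_metrics(scores, labels, cutoff=0.5):
    """Sensitivity, specificity, accuracy; the 0.5 cutoff is an assumption."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    return tp / n_pos, tn / n_neg, (tp + tn) / len(labels)


# Toy data: 1 = PCa, 0 = BPH; both models are scored on the same six patients.
labels = [1, 1, 1, 0, 0, 0]
model_a = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
model_b = [0.9, 0.4, 0.7, 0.5, 0.2, 0.1]
print(delong_test(model_a, model_b, labels))
print(case_level_metrics(model_b, labels))
```

DeLong's test accounts for the correlation between the two AUCs that arises because both models are evaluated on the same patients, which an unpaired comparison would ignore.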
The detailed results are shown in Table 1. Model2 achieved AUC values of 1.000 and 0.995 on the training and testing cohorts, respectively, outperforming Model1, whose training and testing AUCs were 0.815 and 0.770, respectively. DeLong’s test showed a significant difference between the two models, with p<0.001 in the training cohort and p=0.002 in the testing cohort. The ROC curves are shown in Figure 1. The results of manual and automatic segmentation are shown in Figure 2.
Discussion
The multi-task model, Model2, which simultaneously detects PCa and BPH, achieved significantly better performance in PCa detection. This is understandable, since the additional task imposes more constraints on model training, which helps lower the risk of overfitting. In addition, forcing the model to identify BPH lesions reduced the chance of PCa false positives, since BPH lesions were less likely to be identified as PCa. Simultaneously identifying lesions of different diseases brings the model more in line with the needs of clinical applications, where patients with unknown diseases present for diagnosis. Furthermore, multi-tasking can also help improve the performance of each task, in this case the identification of PCa. This suggests that more attention should be paid to multi-task diagnosis models, especially for closely related diseases. Since a multi-task network requires extra manual annotations, algorithms for weakly supervised or semi-supervised learning may also be exploited.
Limitations of our study include the limited dataset size and the lack of an external test cohort. The idea and the model of this work should be validated with larger datasets from more diverse sources.
Conclusion
In conclusion, explicitly using BPH annotations via a multi-task deep learning model can significantly improve the performance of PCa detection on MRI images.
Acknowledgements
No acknowledgement found.
References
[1] Schelb P, Kohl S, Radtke JP et al (2019) Classification of cancer at prostate MRI: deep learning versus clinical PI-RADS assessment. Radiology, 293:607–617.
[2] Zhong X, Cao R, Shakeri S et al (2019) Deep transfer learning based prostate cancer classification using 3 Tesla multi-parametric MRI. Abdom Radiol (NY), 44:2030–2039.
[3] Jiang KW, Song Y, Hou Y et al. Performance of artificial intelligence-aided diagnosis system for clinically significant prostate cancer with MRI: a diagnostic comparison study. Journal of Magnetic Resonance Imaging, 57(5):1352–1364.
[4] Netzer N, Weisser C, Schelb P et al (2021) Fully automatic deep learning in bi-institutional prostate Magnetic Resonance Imaging: Effects of Cohort Size and Heterogeneity. Invest Radiol, 56(12):799–808.
[5] Ronneberger O, Fischer P, Brox T (2015) U-Net: Convolutional Networks for Biomedical Image Segmentation.
[6] Isensee F, Jaeger PF, Kohl SAA, Petersen J, Maier-Hein KH (2021) nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat Methods, 18:203–211.
[7] DeLong ER, DeLong DM, Clarke-Pearson DL (1988) Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics, 44:837–845.