3518

Annotation of Benign Prostatic Hyperplasia Lesions Can Improve the Detection of Prostate Cancer
Yinqiao Yi1, Zhenwei Ding2, Guoquan Huang2, Dongmei Wu1, Yang Song3, and Guang Yang1
1Shanghai Key Laboratory of Magnetic Resonance, East China Normal University, Shanghai, China, 2Department of Medical Imaging, the Second People's Hospital of Wuhu, Wuhu, Anhui Province, China, 3Siemens Healthineers Ltd., Shanghai, China

Synopsis

Keywords: Prostate, BPH, PCa

Motivation: Accurate interpretation of prostate MRI demands a high level of expertise, and deep learning models for prostate cancer (PCa) detection often suffer from low specificity.

Goal(s): To explore the value of benign prostatic hyperplasia (BPH) annotations for prostate cancer (PCa) detection.

Approach: We retrospectively collected 96 patients with PCa and 92 patients with BPH, all scanned with a PI-RADS protocol. Two deep learning models were built: Model1 detected only PCa, while Model2 simultaneously detected BPH and PCa.

Results: Model2 achieved a test AUC of 0.995, outperforming Model1, whose test AUC was 0.770.

Impact: Explicitly using the BPH label significantly improved PCa detection performance, implying that multi-task deep learning models targeting multiple diseases are not only more in line with the needs of clinical applications but can also improve performance.

Introduction

Artificial intelligence has exhibited the potential to enhance the radiological evaluation of prostate MRI by offering fully automated detection and segmentation of potentially suspicious lesions [1-2]. However, the diagnosis still requires the expertise of the physician and suffers from low specificity [3]. It was observed that some of the false positives produced by previous deep learning models [3] were benign prostatic hyperplasia (BPH) lesions incorrectly identified as prostate cancer (PCa). We therefore hypothesized that forcing the model to explicitly identify BPH lesions might help to improve the performance of PCa detection. In this study, we built deep learning models for PCa detection with and without a BPH identification task and compared their performance to evaluate the value of BPH annotations for PCa detection.

Methods

We retrospectively collected 96 patients with pathologically confirmed PCa and 92 patients with pathologically confirmed BPH, all scanned on a single 3T scanner (Philips Achieva). Scan protocols suggested by PI-RADS v2.1 were used to acquire T2-weighted images (T2W), diffusion-weighted images (DWI) with a b-value of 2000 s/mm², and apparent diffusion coefficient (ADC) maps. ROIs were outlined by one radiologist with 2 years of experience in prostate MRI and reviewed by a senior radiologist. The study cohort was split randomly into a training cohort (N=148) and an independent testing cohort (N=40).

All images were resampled to an in-plane resolution of 0.5 × 0.5 mm². The DWI and ADC images were registered to the T2W images with Elastix. Quantitatively normalized ADC maps and z-scored T2W images were then input into a fully automated deep learning pipeline [4] based on the U-Net architecture [5] and implemented using the nnU-Net framework [6] (batch size of two, six downsampling blocks).

Two models were built. Model1 output a three-dimensional softmax map that assigned each voxel a value between 0 and 1, indicating the predicted probability of the voxel belonging to a PCa lesion. To obtain a patient-level prediction, the highest voxel value in each patient was used. For lesion segmentation overlap evaluation, the softmax map was binarized with a threshold of 0.5, followed by connected component analysis and removal of speckles. Model2 was similar to Model1 but had an extra output channel for the BPH probability map.

Patient-level receiver operating characteristic (ROC) curves were used to evaluate the performance of the two models. Differences in model performance were assessed using DeLong's test [7], with p < 0.05 considered significant. Specificity, sensitivity, and accuracy for case-level assessment were calculated for both models. Statistical analyses were performed with Python version 3.8.16.
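The post-processing of the softmax map described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the authors' implementation: the function names, the 6-connectivity, the `>=` comparison at the 0.5 threshold, the minimum component size used to define a "speckle", and the toy probability values are all hypothetical, since the abstract does not specify them. Plain Python lists are used so that the sketch runs without dependencies; a real pipeline would more likely operate on array data.

```python
from collections import deque
from itertools import product

# Face neighbours only (6-connectivity). Assumption: the abstract does not
# state which connectivity was used.
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def patient_score(prob_map):
    """Patient-level score: the highest voxel probability in the 3D map."""
    return max(v for plane in prob_map for row in plane for v in row)


def lesion_components(prob_map, threshold=0.5, min_voxels=2):
    """Binarize at `threshold`, then return the connected components that
    contain at least `min_voxels` voxels; smaller ones are dropped as speckles.
    `min_voxels` is a hypothetical parameter chosen for illustration."""
    shape = (len(prob_map), len(prob_map[0]), len(prob_map[0][0]))
    foreground = {
        (z, y, x)
        for z, y, x in product(*(range(n) for n in shape))
        if prob_map[z][y][x] >= threshold
    }
    components, visited = [], set()
    for seed in sorted(foreground):
        if seed in visited:
            continue
        # Breadth-first flood fill starting from an unvisited foreground voxel.
        visited.add(seed)
        component, queue = [seed], deque([seed])
        while queue:
            z, y, x = queue.popleft()
            for dz, dy, dx in NEIGHBOURS:
                nb = (z + dz, y + dy, x + dx)
                if nb in foreground and nb not in visited:
                    visited.add(nb)
                    component.append(nb)
                    queue.append(nb)
        if len(component) >= min_voxels:
            components.append(sorted(component))
    return components


# Toy 2 x 3 x 3 probability map (hypothetical values, indexed [z][y][x]).
toy_map = [
    [[0.1, 0.2, 0.1],
     [0.1, 0.8, 0.9],
     [0.0, 0.1, 0.1]],
    [[0.6, 0.1, 0.0],
     [0.1, 0.7, 0.2],
     [0.0, 0.1, 0.1]],
]

score = patient_score(toy_map)
lesions = lesion_components(toy_map)
print("patient-level score:", score)
print("lesion components kept:", lesions)
```

In this toy map, three touching voxels above the threshold form one retained component, while a single isolated voxel above the threshold is discarded as a speckle; the patient-level score is simply the maximum over all voxels, regardless of the speckle filter.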

Results

The detailed results are shown in Table 1. Model2 achieved AUC values of 1.000 and 0.995 on the training and testing cohorts, respectively, outperforming Model1, whose training and testing AUCs were 0.815 and 0.770, respectively. DeLong's test showed a significant difference between the two models, with p<0.001 in the training cohort and p=0.002 in the testing cohort. The ROC curves are shown in Figure 1. The results of manual and automatic segmentation are shown in Figure 2.
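To make the AUC comparison concrete, the following is a minimal pure-Python sketch of DeLong's test [7] for two models scored on the same cases. It is an illustration only, not the authors' code: the function names are hypothetical, the labels and scores are invented toy values (not study data), and a two-sided p-value from the normal approximation is assumed.

```python
import math


def _psi(pos_score, neg_score):
    """Mann-Whitney kernel: 1 if the positive case scores higher,
    0.5 for a tie, 0 otherwise."""
    if pos_score > neg_score:
        return 1.0
    if pos_score == neg_score:
        return 0.5
    return 0.0


def _auc_components(labels, scores):
    """Return the AUC and the per-case structural components (V10, V01)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    v10 = [sum(_psi(p, n) for n in neg) / len(neg) for p in pos]
    v01 = [sum(_psi(p, n) for p in pos) / len(pos) for n in neg]
    return sum(v10) / len(pos), v10, v01


def _cov(a, mean_a, b, mean_b):
    """Sample covariance of two equally long sequences with known means."""
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(labels, scores_a, scores_b):
    """Compare two correlated AUCs; returns (auc_a, auc_b, z, two-sided p).
    Needs at least two positive and two negative cases."""
    auc_a, v10_a, v01_a = _auc_components(labels, scores_a)
    auc_b, v10_b, v01_b = _auc_components(labels, scores_b)
    m, n = len(v10_a), len(v01_a)  # numbers of positive and negative cases

    def s(v10_x, auc_x, v01_x, v10_y, auc_y, v01_y):
        # One entry of the covariance matrix S = S10 / m + S01 / n.
        return (_cov(v10_x, auc_x, v10_y, auc_y) / m
                + _cov(v01_x, auc_x, v01_y, auc_y) / n)

    var_a = s(v10_a, auc_a, v01_a, v10_a, auc_a, v01_a)
    var_b = s(v10_b, auc_b, v01_b, v10_b, auc_b, v01_b)
    cov_ab = s(v10_a, auc_a, v01_a, v10_b, auc_b, v01_b)
    var_diff = var_a + var_b - 2.0 * cov_ab
    if var_diff <= 0.0:
        raise ValueError("AUC difference has zero variance; test is undefined")
    z = (auc_a - auc_b) / math.sqrt(var_diff)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return auc_a, auc_b, z, p


# Toy example: 1 = PCa patient, 0 = BPH patient (hypothetical scores).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores_a = [0.9, 0.8, 0.4, 0.3, 0.7, 0.6, 0.2, 0.1]
scores_b = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]

auc_a, auc_b, z, p = delong_test(labels, scores_a, scores_b)
print(f"AUC A = {auc_a:.3f}, AUC B = {auc_b:.3f}, z = {z:.3f}, p = {p:.3f}")
```

Because both models are evaluated on the same patients, the covariance term between the two AUC estimates matters; treating the AUCs as independent would generally misstate the variance of their difference.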

Discussion

The multi-task model, Model2, which simultaneously detects PCa and BPH, achieved significantly better performance in PCa detection. This is understandable, since multi-task learning imposes more constraints on model training, which helps to lower the risk of overfitting. In addition, forcing the model to identify BPH lesions lowered the chance of PCa false positives, since BPH lesions were less likely to be identified as PCa. Simultaneously identifying lesions of different diseases also brings the model more in line with the needs of clinical applications, where patients with unknown diseases present for diagnosis. Furthermore, multi-tasking can help to improve the performance of each individual task, in this case the identification of PCa. This suggests that more attention should be paid to multi-task diagnosis models, especially for closely related diseases. Since a multi-task network requires extra manual annotations, weakly supervised or semi-supervised learning algorithms may also be exploited. Limitations of our study include the limited dataset size and the lack of an external test cohort. The idea and the model of this work should be validated with larger datasets from more diverse sources.
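One way to picture why an explicit BPH output might reduce PCa false positives is the softmax itself: with an extra channel, probability mass at a BPH-like voxel can go to the BPH class instead of being split only between background and PCa. The toy calculation below is purely illustrative and rests on strong assumptions. The logit values are hypothetical, the same logits are reused for both settings for simplicity, and a trained network's actual outputs are not reported in this abstract.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of channel logits."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for a single voxel inside a BPH nodule.
background_logit, pca_logit, bph_logit = 0.0, 1.0, 3.0

# Two-channel output (background, PCa), as in a PCa-only model.
p_pca_single = softmax([background_logit, pca_logit])[1]

# Three-channel output (background, PCa, BPH), as in a multi-task model.
p_pca_multi = softmax([background_logit, pca_logit, bph_logit])[1]

print(f"PCa probability without BPH channel: {p_pca_single:.3f}")
print(f"PCa probability with BPH channel:    {p_pca_multi:.3f}")
```

With these made-up numbers, the voxel would exceed a 0.5 PCa threshold in the two-channel case but not in the three-channel case. Whether this mechanism accounts for the observed improvement cannot be concluded from a toy example.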

Conclusion

In conclusion, explicitly using BPH annotations via a multi-task deep learning model can significantly improve the performance of PCa detection on MRI images.

Acknowledgements

No acknowledgement found.

References

[1] Schelb P, Kohl S, Radtke JP et al (2019) Classification of cancer at prostate MRI: deep learning versus clinical PI-RADS assessment. Radiology, 293:607–617.

[2] Zhong X, Cao R, Shakeri S et al (2019) Deep transfer learning based prostate cancer classification using 3 Tesla multi-parametric MRI. Abdom Radiol (NY), 44:2030–2039.

[3] Jiang KW, Song Y, Hou Y et al (2023) Performance of artificial intelligence-aided diagnosis system for clinically significant prostate cancer with MRI: a diagnostic comparison study. J Magn Reson Imaging, 57(5):1352–1364.

[4] Netzer N, Weisser C, Schelb P et al (2021) Fully automatic deep learning in bi-institutional prostate magnetic resonance imaging: effects of cohort size and heterogeneity. Invest Radiol, 56(12):799–808.

[5] Ronneberger O, Fischer P, Brox T (2015) U-Net: Convolutional Networks for Biomedical Image Segmentation.

[6] Isensee F, Jaeger PF, Kohl SAA, Petersen J, Maier-Hein KH (2021) nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat Methods, 18:203–211.

[7] DeLong ER, DeLong DM, Clarke-Pearson DL (1988) Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics, 44:837–845.

Figures

Table 1. Performance of models

Figure 1. ROC curves showing the performance of PCa detection of Model1 (a) and Model2 (b).

Figure 2. Results of manual label and automatic segmentation of the PCa and BPH.

Proc. Intl. Soc. Mag. Reson. Med. 32 (2024)
DOI: https://doi.org/10.58530/2024/3518