Rong Wei¹, Yu Xia¹, Yi Zhu², Jinyu Yang¹, Ge Gao³, Xiaoying Wang³, Jue Zhang¹, and Jianxiu Lian²
¹Peking University, Beijing, China; ²Philips Healthcare, Beijing, China; ³Peking University First Hospital, Beijing, China
Synopsis
Keywords: Prostate
Motivation: Improving prostate cancer diagnosis requires a better understanding of lesion characteristics and fewer false positives.
Goal(s): To develop an integrated deep learning system that classifies prostate MRI images as benign or malignant, reduces labeling costs, and makes the classifications more reliable.
Approach: We trained a convolutional network on multiparametric MRI images and incorporated credibility analysis to provide visually interpretable prostate cancer predictions and to reject low-credibility predictions.
Results: By discarding low-credibility predictions, the model improved reliability and efficacy (AUC 0.93 vs. 0.87 for the original VGG-16 classifier), mitigating the risks associated with prediction failures.
Impact: This study enables clinicians to understand the decision-making process of the CAD system and to manage its outputs through an intuitive display. These capabilities may improve diagnostic accuracy and thereby influence clinical decision-making and patient outcomes.
Introduction
Prostate cancer (PCa) is the second most common malignant tumor in men worldwide, with an estimated 1.4 million new cases diagnosed in 2020 [1]. Deep learning models have made significant progress in prostate magnetic resonance (MR) computer-aided diagnosis (CAD) systems [2], [3]. However, image-level classification makes it difficult to understand the general characteristics of a lesion, which contributes to false positives. More crucially, bottom-up detection algorithms require segmentation labels that precisely delineate the cancer target area, which substantially increases labeling costs.
In this study, we introduce a system that classifies prostate multiparametric MRI (mp-MRI) images as benign or malignant as a single integrated unit, reducing labeling costs because only image-level labels are required. More importantly, the model provides a visually interpretable basis for each decision and a credibility analysis of its results, improving the reliability of the classification.
Methods
A cohort of 163 patients with and without PCa, collected between 2013 and 2016, was selected for the study. T2-weighted imaging (T2WI) and diffusion-weighted imaging (DWI) were performed on a 3.0 T MR scanner (Ingenia; Philips Healthcare) following standardized protocols, and apparent diffusion coefficient (ADC) maps were obtained from the DWI data.
The images were then preprocessed, including segmentation and normalization, and the preprocessed DWI, T2WI, and ADC images were stacked into three-channel image groups. These groups were used to train a VGG-16 network [4] with transfer learning, as shown in Figure 1. Gradient-weighted Class Activation Mapping (Grad-CAM) [5] was used to interpret the output of the classification network, and Monte Carlo (MC) Dropout [6] was used to assess the credibility of each prediction. Outputs whose credibility index fell below a predetermined threshold were rejected.
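MC Dropout [6] keeps dropout active at inference time and treats repeated stochastic forward passes as samples from the model's predictive distribution; the spread of those samples is what the credibility index summarizes. The sketch below illustrates only that sampling step and is not the authors' implementation: a hypothetical toy logistic classifier with inverted dropout on its inputs stands in for the fine-tuned VGG-16, and the weights, features, dropout rate `p`, and number of passes `T` are illustrative assumptions.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward_with_dropout(features, weights, bias, p, rng):
    """One stochastic forward pass of a hypothetical toy logistic classifier.

    Each input feature is dropped with probability p; kept features are
    scaled by 1 / (1 - p) (inverted dropout), so dropout stays active at
    inference time as MC Dropout requires.
    """
    z = bias
    for x, w in zip(features, weights):
        if rng.random() >= p:  # this feature survives the dropout mask
            z += w * x / (1.0 - p)
    return sigmoid(z)


def mc_dropout_predict(features, weights, bias, p=0.5, T=50, seed=0):
    """Run T stochastic passes; return (mean, population variance, samples)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("dropout rate p must be in [0, 1)")
    rng = random.Random(seed)
    samples = [forward_with_dropout(features, weights, bias, p, rng) for _ in range(T)]
    mean = sum(samples) / T
    var = sum((s - mean) ** 2 for s in samples) / T  # divide by T, not T - 1
    return mean, var, samples


if __name__ == "__main__":
    mean, var, samples = mc_dropout_predict([0.8, 0.2, 0.5], [2.0, -1.0, 1.5], bias=-0.5)
    print(f"mean={mean:.3f} variance={var:.4f} passes={len(samples)}")
```

In a deep learning framework the same effect is usually obtained by leaving the dropout layers in training mode during prediction; the toy version avoids any framework dependency.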
In the most extreme case for a binary classifier, half of the sub-networks sampled by MC Dropout predict 0 while the other half predict 1; the mean prediction is then 0.5 and every prediction deviates from it by 0.5, giving a variance of 0.25. Because the predictions lie in [0, 1], 0.25 is the largest possible variance, so we express the credibility index C as: $$$C=1-\left(\frac{\mathrm{D}}{0.25}\right)$$$, where D signifies the variance of the model's predictions. C therefore ranges from 0 (maximal disagreement) to 1 (complete agreement). The mean value is then computed pixel by pixel for the activation map produced by the model, creating the final credibility estimation map. To enhance the model's reliability, predictions whose credibility index falls below the established threshold are discarded, so that only high-credibility predictions are reported.
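The credibility index and the rejection rule can be written directly from the formula above. This is a minimal sketch, assuming D is the population variance (dividing by T) of the T MC Dropout outputs for one image, which is the definition under which the half-0/half-1 case gives exactly 0.25; the function names are hypothetical, and the default threshold of 0.80 is the value the Results report for the validation stage.

```python
def credibility_index(predictions):
    """C = 1 - D / 0.25, where D is the population variance of the MC outputs.

    Outputs must lie in [0, 1], which bounds D by 0.25 and hence C by [0, 1].
    """
    T = len(predictions)
    if T == 0:
        raise ValueError("need at least one MC Dropout prediction")
    if any(p < 0.0 or p > 1.0 for p in predictions):
        raise ValueError("predictions must lie in [0, 1]")
    mean = sum(predictions) / T
    D = sum((p - mean) ** 2 for p in predictions) / T
    return 1.0 - D / 0.25


def accept(predictions, threshold=0.80):
    """Keep a prediction only if its credibility index reaches the threshold."""
    return credibility_index(predictions) >= threshold


if __name__ == "__main__":
    print(credibility_index([0, 1] * 25))  # 0.0: the extreme half/half case
    print(credibility_index([1] * 50))     # 1.0: all passes agree
    print(credibility_index([1, 1, 1, 0]), accept([1, 1, 1, 0]))  # 0.25 False
```

Whether an image-level output is accepted depends only on agreement between passes, not on which class is predicted, so a confident benign call and a confident malignant call are treated alike.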
To evaluate the model's performance, receiver operating characteristic (ROC) curves were plotted, and the area under the curve (AUC), false positive rate (FPR), and negative predictive value (NPV) were computed.
Results
A representative set of high-credibility predictions is shown in Figure 2, and a set of low-credibility visualizations is shown in Figure 3. In Figure 2, the activation regions of the credibility estimation network (CEN) match the lesion regions annotated by the radiologists. This suggests that the model can learn the main lesion features from image-level prostate cancer classification labels alone, substantially reducing the radiologists' labeling burden. Although the model shows some areas of high feature activation in Figure 3, the credibility map is not activated in these regions, indicating that the model is uncertain whether they contain PCa. Such results therefore need to be rejected. In fact, this image shows a healthy prostate that was misclassified, and such errors can be effectively eliminated by rejecting low-credibility results.
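The quantitative comparison that follows rests on computing the metrics listed in Methods only over accepted predictions. This is a minimal sketch, assuming a 0.5 decision cutoff on the predicted malignancy probability and computing AUC with the Mann-Whitney pairwise formulation (equal to the area under the empirical ROC curve); the helper names and the toy numbers in the usage example are hypothetical and are not data from this study.

```python
import math


def auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return math.nan
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def evaluate_with_rejection(labels, scores, credibilities, threshold=0.80, cutoff=0.5):
    """FPR, NPV and AUC over the predictions whose credibility reaches the threshold."""
    kept = [i for i, c in enumerate(credibilities) if c >= threshold]
    y = [labels[i] for i in kept]
    s = [scores[i] for i in kept]
    pred = [1 if v >= cutoff else 0 for v in s]  # hypothetical 0.5 cutoff
    tn = sum(1 for t, p in zip(y, pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y, pred) if t == 1 and p == 0)
    return {
        "n_kept": len(kept),
        "n_rejected": len(labels) - len(kept),
        "fpr": fp / (fp + tn) if fp + tn else math.nan,
        "npv": tn / (tn + fn) if tn + fn else math.nan,
        "auc": auc(y, s),
    }


if __name__ == "__main__":
    labels = [1, 1, 0, 0, 1, 0]                # toy ground truth
    scores = [0.9, 0.8, 0.3, 0.7, 0.6, 0.2]    # toy malignancy probabilities
    creds = [0.95, 0.90, 0.90, 0.50, 0.85, 0.99]
    print(evaluate_with_rejection(labels, scores, creds, threshold=0.0))  # fpr 1/3, auc 8/9
    print(evaluate_with_rejection(labels, scores, creds))  # fpr 0.0, auc 1.0
```

In the toy data the single rejected item is the would-be false positive, which mirrors the pattern described qualitatively in the text without reproducing any of its numbers.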
In the validation stage, the model performed best with the credibility index threshold set at 0.80. In the test stage, 280 images from 50 patients in the test dataset were classified, and each output was accepted or rejected by comparing its credibility index with this threshold. As shown in Table 1, the images rejected by the model were mainly those that would otherwise have been false positives. More notably, the rejected set contained no false negative samples, which is of great significance for clinical diagnosis. Overall, the classification network with the rejection function achieved a notably higher AUC than the original VGG-16 classification network (0.93 vs. 0.87, P<0.05), highlighting the improved reliability and efficacy of the proposed method.
Conclusions
In conclusion, our explainable credibility estimation network with a rejection option gives physicians insight into the model's decision-making process and the ability to regulate its output. By using credibility analysis to discard uncertain predictions, the proposed method mitigates the potential risks associated with prediction failures and aims to provide reliable, stable predictions that meet the rigorous safety requirements of clinical settings.
References
[1] Rawla, P. (2019). Epidemiology of prostate cancer. World Journal of Oncology, 10(2), 63.
[2] Reda, I., Khalil, A., Elmogy, M., Abou El-Fetouh, A., Shalaby, A., Abou El-Ghar, M., ... & El-Baz, A. (2018). Deep learning role in early diagnosis of prostate cancer. Technology in Cancer Research & Treatment, 17, 1533034618775530.
[3] Abraham, B., & Nair, M. S. (2019). Automated grading of prostate cancer using convolutional neural network and ordinal class classifier. Informatics in Medicine Unlocked, 17, 100256.
[4] Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
[5] Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., & Batra, D. (2017). Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision (pp. 618-626).
[6] Gal, Y., & Ghahramani, Z. (2016, June). Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In International Conference on Machine Learning (pp. 1050-1059). PMLR.