Keywords: Diagnosis/Prediction, Radiomics, Explainability, Analysis/Processing, Cancer, Machine Learning/Artificial Intelligence, Prostate, Software Tools
Motivation: Clinical use of computer-aided diagnosis systems for prostate cancer is currently hindered by their internal complexity. Explainability tools can provide insight into how the machine learning (ML) models underlying these systems work.
Goal(s): Our goal was to supplement the predictions of an MRI radiomics-based ML model for prostate cancer detection with explanations based on clinical concepts currently used in radiological assessment.
Approach: We clustered correlated MRI radiomics features into groups representing the clinical concepts underlying the PI-RADS system. We then used SHAP analysis to quantify the contribution of each concept to the model's prediction for each lesion.
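The abstract does not state how SHAP values were attributed to concept groups. One possible reading treats each clinical concept group as a single Shapley "player." All coalitions of groups are enumerated, and features outside a coalition are held at one baseline vector (assumed here, e.g. training-set means). The function `group_shapley`, the group names and the column indices are hypothetical. This is not the SHAP library's API and not necessarily the authors' method. Libraries such as SHAP estimate Shapley values against a background dataset and usually report per-feature values. Summing those per group is a common alternative, though the sum generally differs from a group-level Shapley value when features interact.

```python
from itertools import combinations
from math import factorial

import numpy as np


def group_shapley(predict, x, baseline, groups):
    """Exact Shapley values with each concept group acting as one player.

    Hypothetical helper (not part of the SHAP library).
    predict  : callable, 2D array (n_samples, n_features) -> 1D array of scores
    x        : 1D feature vector being explained
    baseline : 1D reference feature vector (assumption, e.g. training-set means)
    groups   : dict {concept_name: list of feature column indices}
    """
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    names = list(groups)
    n = len(names)

    def value(coalition):
        # Groups in the coalition take their values from x; all others stay at baseline.
        z = baseline.copy()
        for name in coalition:
            idx = groups[name]
            z[idx] = x[idx]
        return float(predict(z[None, :])[0])

    phi = {}
    for name in names:
        others = [g for g in names if g != name]
        contribution = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                contribution += weight * (value(coalition + (name,)) - value(coalition))
        phi[name] = contribution
    return phi
```

For a linear model, each group's value reduces to the sum of w_j * (x_j - baseline_j) over its features. The group values always add up to f(x) - f(baseline). Exact enumeration costs 2^n model evaluations per group, which is practical only for a handful of concepts.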
Results: Explanations based on clinical concepts gave insight into the ML model's predictions, both for individual lesions and across all lesions in the test set.
Impact: Our machine learning pipeline combines accurate prostate cancer detection on MRI with intrinsic explainability, potentially easing its integration into clinical practice.
1 Sunoqrot, M. R. S., Saha, A., Hosseinzadeh, M., Elschot, M. & Huisman, H. Artificial intelligence for prostate MRI: open datasets, available applications, and grand challenges. Eur Radiol Exp 6, 1-13 (2022). https://doi.org/10.1186/s41747-022-00288-8
2 Barentsz, J. O. et al. ESUR prostate MR guidelines 2012. Eur Radiol 22, 746-757 (2012). https://doi.org/10.1007/s00330-011-2377-y
3 Barentsz, J. O. et al. Synopsis of the PI-RADS v2 Guidelines for Multiparametric Prostate Magnetic Resonance Imaging and Recommendations for Use. Eur Urol 69, 41-49 (2015). https://doi.org/10.1016/j.eururo.2015.08.038
4 Sun, Y. et al. Multiparametric MRI and radiomics in prostate cancer: a review. Australas Phys Eng Sci Med 42, 3-25 (2019). https://doi.org/10.1007/s13246-019-00730-z
5 Cuocolo, R. et al. Machine learning applications in prostate cancer magnetic resonance imaging. Eur Radiol Exp 3, 35-38 (2019). https://doi.org/10.1186/s41747-019-0109-2
6 Yasaka, K., Akai, H., Kunimatsu, A., Kiryu, S. & Abe, O. Deep learning with convolutional neural network in radiology. Jpn J Radiol 36, 257-272 (2018). https://doi.org/10.1007/s11604-018-0726-3
7 Nketiah, G. A. et al. Radiomics-based Machine Learning for Predicting Clinically Significant Cancer in Multicenter Cohort: Comparison to PI-RADS Reading. ISMRM & ISMRT Annual Meeting and Exhibition (Toronto (CA), 2023).
8 Lundberg, S. & Lee, S.-I. A Unified Approach to Interpreting Model Predictions. Advances in Neural Information Processing Systems 30 (2017).
9 PyRadiomics documentation. https://pyradiomics.readthedocs.io/en/latest/index.html
10 Ghassemi, M., Oakden-Rayner, L. & Beam, A. L. The false hope of current approaches to explainable artificial intelligence in health care. Lancet Digit Health 3, e745-e750 (2021). https://doi.org/10.1016/S2589-7500(21)00208-9
11 Kostick-Quenet, K. M. & Gerke, S. AI in the hands of imperfect users. NPJ Digit Med 5, 197 (2022). https://doi.org/10.1038/s41746-022-00737-z
Figure 1: Datasets overview. The 415 cases in the training set were used to train the in-house ML model [7] and to construct the clinical concept groups. The 200 cases in the test set were used to evaluate the performance of the ML model [7] and to investigate the explainability of the clinical concepts in this study.
Figure 2: Radiomics features extracted for the ML model. All features except Intensity (the raw voxel intensity) and Position features (calculated with custom code [7]) were extracted using the PyRadiomics toolkit [9] with a 2D 5 × 5 sliding-window approach.
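A NumPy-only sketch can illustrate the 2D 5 × 5 sliding-window idea. It computes a few voxel-wise first-order statistics (mean, variance and range) over each in-plane neighborhood, treating every slice independently. This is not PyRadiomics. The function name, the reflective edge padding and the chosen statistics are hypothetical. PyRadiomics' own voxel-based extraction (see its documentation) may define features, intensity discretization and edge handling differently.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def local_first_order(volume, kernel_size=5):
    """Voxel-wise first-order statistics from an in-plane (2D) sliding window.

    Hypothetical illustration, not the PyRadiomics implementation.
    volume : array of shape (slices, rows, cols)
    Returns a dict of arrays with the same shape as `volume`.
    """
    volume = np.asarray(volume, dtype=float)
    r = kernel_size // 2
    # Pad only the in-plane axes so neighborhoods never cross slices (2D windows).
    # Reflective padding at image borders is an assumption.
    padded = np.pad(volume, ((0, 0), (r, r), (r, r)), mode="reflect")
    # windows has shape (slices, rows, cols, kernel_size, kernel_size)
    windows = sliding_window_view(padded, (kernel_size, kernel_size), axis=(1, 2))
    return {
        "mean": windows.mean(axis=(-2, -1)),
        "variance": windows.var(axis=(-2, -1)),
        "range": windows.max(axis=(-2, -1)) - windows.min(axis=(-2, -1)),
    }
```

Each returned map has one value per voxel, so it can serve directly as a voxel-wise feature for a classifier.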
Figure 3: Absolute correlation matrix of the T2W features, computed on the training set. For the HBV and ADC images, only first-order features were computed and correlated. For all image types, the same first-order features were grouped into the intensity and heterogeneity concepts. The position features were regarded as a separate group.
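Figure 3 suggests that the concept groups were derived from absolute feature correlations on the training set. The clustering method and cut-off are not stated. The sketch therefore assumes single-linkage grouping: two features share a group whenever a chain of feature pairs with |r| ≥ threshold connects them, implemented with union-find. The function name and the 0.8 threshold are hypothetical. Mapping the resulting clusters onto named clinical concepts (e.g. intensity, heterogeneity) would remain a manual step.

```python
import numpy as np


def correlation_groups(features, names, threshold=0.8):
    """Group features linked by chains of high absolute Pearson correlation.

    Hypothetical helper: equivalent to single-linkage clustering on the
    distance 1 - |r|, cut at 1 - threshold (threshold is an assumption).
    features : array (n_samples, n_features), e.g. voxel-wise training features
    names    : list of n_features feature names
    """
    corr = np.abs(np.corrcoef(features, rowvar=False))
    n = corr.shape[0]
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            # NaN correlations (constant features) compare False and stay unlinked.
            if corr[i, j] >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(names[i])
    return list(groups.values()), corr
```

Single linkage is the simplest choice because it needs no cluster count. Average or complete linkage would give tighter groups when correlations chain loosely.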
Figure 4: SHAP plots of true-positive (TP; first row), false-positive (FP; second row) and false-negative (FN; third row) predictions, for single lesions (first column) and aggregated over all lesions (second column). SHAP values were calculated within the detected lesion mask for TP and FP lesions and within the ground-truth mask for FN lesions.
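Figure 4 describes lesion-level SHAP values computed within a lesion mask. Assume, hypothetically, that the model yields voxel-wise SHAP values per concept group. A lesion-level explanation can then be formed by averaging those values over the mask voxels. An overall summary can be formed as the mean absolute value across lesions. Both aggregation rules and the function names are assumptions, not details given in the abstract.

```python
import numpy as np


def lesion_concept_shap(voxel_shap, mask):
    """Average voxel-wise concept SHAP values inside one lesion mask.

    Hypothetical helper; averaging over the mask is an assumed aggregation rule.
    voxel_shap : array (slices, rows, cols, n_concepts) of per-voxel SHAP values
    mask       : array (slices, rows, cols); detected mask for TP/FP lesions,
                 ground-truth mask for FN lesions
    Returns a 1D array of length n_concepts.
    """
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        raise ValueError("lesion mask is empty")
    # Boolean indexing on the spatial axes gives shape (n_voxels, n_concepts).
    return voxel_shap[mask].mean(axis=0)


def overall_importance(lesion_values):
    """Mean absolute concept contribution across lesions, input (n_lesions, n_concepts)."""
    return np.abs(np.asarray(lesion_values)).mean(axis=0)
```

Taking absolute values in the overall summary keeps concepts that push some lesions toward cancer and others away from it from cancelling out.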