3503

The Value of Combined Clinical-Radiomics-Deep Learning Models for Predicting Gleason Grade Group

Xiaomeng Qiao¹, Chenhan Hu¹, Jie Bao¹, Ximing Wang¹, and Yang Song²
¹The First Affiliated Hospital of Soochow University, Suzhou, China, ²Siemens Healthineers Ltd., Suzhou, China

Synopsis

Keywords: Prostate, radiomics, deep learning, Gleason score

Motivation: The Gleason score (GS) can currently be obtained only through biopsy or radical prostatectomy (RP), both of which can cause complications, added financial burden, and emotional strain.

Goal(s): To explore the value of a mixed model combining clinical, radiomics, and deep learning features for predicting GS.

Approach: A mixed model was constructed to classify cases as grade group 0 (GG0, benign), GG1, GG2, GG3, GG4, or GG5. DenseNet was used to establish the model.

Results: The mixed model had the best predictive ability, with a quadratic weighted Kappa (Kw) of 0.74 and a relative accuracy (prediction error of at most one grade group) of 0.76.

Impact: Clinicians could estimate GS without biopsy or surgery, which could avoid many complications and reduce financial burden. Future studies could integrate an automated VOI segmentation algorithm to optimize the AI model.

Introduction

The Gleason score (GS) can be obtained only through biopsy or radical prostatectomy (RP), both invasive procedures that carry a range of complications for patients. These procedures also bring extra costs and emotional stress. Furthermore, there is a significant waiting period for pathological results, and the score varies among pathologists depending on their experience. Our aim was to explore the predictive value of a clinical model, a radiomics model, and a deep learning model for Gleason grade group, and to build a mixed model combining clinical, radiomics, and deep learning features.

Methods

A total of 885 patients who underwent 3.0 T magnetic resonance imaging (MRI) and had pathologically confirmed prostate disease between January 2016 and March 2022 at the First Affiliated Hospital of Soochow University were retrospectively collected. The patients were randomly divided into a training group (n=708) and a testing group (n=177), an 8:2 ratio. Multiple models were constructed to classify cases as grade group 0 (GG0, benign), GG1, GG2, GG3, GG4, or GG5. Multivariate logistic regression was used to establish the clinical model, a support vector machine was used to build the radiomics model, and DenseNet was used to establish the deep learning model. Two mixed models were established with a deep learning network: the DR model, combining radiomics and deep learning features, and the DRC model, combining clinical, radiomics, and deep learning features. The predictive performance of each model for Gleason grade group was evaluated, as was the diagnostic performance of the mixed model for clinically significant prostate cancer (csPCa).
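As a minimal sketch of the random patient-level split described above, the following Python snippet divides 885 hypothetical patient identifiers into groups of 708 and 177, which correspond to 80% and 20% of the cohort. The function name, the seed, and the use of integer identifiers are assumptions for illustration; the abstract does not state how the randomization was implemented or whether it was stratified by grade group.

```python
import random


def split_patients(patient_ids, train_fraction=0.8, seed=0):
    """Randomly split patient IDs into training and testing lists.

    Hypothetical helper: the split is done per patient, with a fixed
    seed so that the same partition can be reproduced.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


# Integer IDs 0..884 stand in for the 885 patients (illustrative only).
train_ids, test_ids = split_patients(range(885))
print(len(train_ids), len(test_ids))  # 708 177
```

Splitting by patient rather than by image or lesion keeps all data from one patient in a single group, which is the usual way to avoid leakage between training and testing.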

Results

The clinical, radiomics, and deep learning models each demonstrated good predictive performance for Gleason grade group. In the testing group, the quadratic weighted Kappa values (Kw) were 0.64, 0.65, and 0.63, respectively, and the relative accuracies (predictions within one grade group of the pathological result) were 0.67, 0.66, and 0.72, respectively. The predictive performance of the radiomics model was comparable to that of the deep learning model. The DR model showed improved predictive performance, with a Kw of 0.71 and a relative accuracy of 0.76. The DRC model had the best predictive ability, with a Kw of 0.74 and a relative accuracy of 0.76. Moreover, the area under the curve (AUC) of the DRC model for diagnosing csPCa was comparable to that of the Prostate Imaging Reporting and Data System (PI-RADS) as scored by experienced radiologists (0.90 vs. 0.86, P=0.23), and its specificity was significantly higher than that of the radiologists (79% vs. 54%, P<0.001).
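The two grade-group metrics reported above can be illustrated with a short Python sketch. It computes the quadratic weighted Kappa using its standard definition and the proportion of predictions within one grade group of the reference. The function names and the small label lists are hypothetical, and grade groups are assumed to be coded as integers 0 to 5; this is not the authors' evaluation code.

```python
def quadratic_weighted_kappa(y_true, y_pred, n_classes=6):
    """Quadratic weighted Kappa for ordinal labels coded 0..n_classes-1."""
    n = len(y_true)
    # observed[i][j] counts cases with reference i and prediction j.
    observed = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    row = [sum(r) for r in observed]
    col = [sum(observed[i][j] for i in range(n_classes))
           for j in range(n_classes)]
    num = 0.0  # weighted observed disagreement
    den = 0.0  # weighted disagreement expected by chance
    for i in range(n_classes):
        for j in range(n_classes):
            weight = (i - j) ** 2 / (n_classes - 1) ** 2
            num += weight * observed[i][j]
            den += weight * row[i] * col[j] / n
    if den == 0:
        # Degenerate case (all labels in one class); treated as perfect
        # agreement here, a convention chosen for this sketch.
        return 1.0
    return 1.0 - num / den


def accuracy_within(y_true, y_pred, tolerance=1):
    """Fraction of predictions at most `tolerance` groups from the reference."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= tolerance)
    return hits / len(y_true)


# Hypothetical labels for six cases, one per grade group.
reference = [0, 1, 2, 3, 4, 5]
predicted = [1, 1, 2, 3, 4, 4]
kw = quadratic_weighted_kappa(reference, predicted)
rel_acc = accuracy_within(reference, predicted, tolerance=1)
exact_acc = accuracy_within(reference, predicted, tolerance=0)
```

Because the weights grow with the square of the distance between grade groups, a prediction that misses by one group is penalized far less than one that misses by three, which suits an ordinal scale such as the grade group.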

Discussion

This study established a mixed model integrating clinical variables, radiomics features, and deep learning features. The Kw of the mixed model reached 0.74 in internal testing, surpassing the results reported in previous studies by Vente et al. and Hu et al. The average AUC in the testing group was 0.77, similar to the results of the model by Bao et al. The mixed model outperformed the individual clinical, radiomics, and deep learning models. In addition, compared with the radiomics-deep learning (DR) model, the DRC model showed a further improvement in Kw. Clinical, radiomics, and deep learning features explain the heterogeneity of PCa from different dimensions: clinical features lean towards tumor biology and macroscopic characteristics, radiomics features focus on microscopic morphology and grayscale characteristics, and deep learning features supplement information on the tumor microenvironment. These three aspects have different yet complementary value in predicting tumor invasiveness and in delineating differences among the grade groups. Furthermore, the AUC of the DRC fusion model for diagnosing csPCa was significantly higher than that of PSA (P<0.05) and numerically, though not significantly, higher than that of experienced senior physicians in prostate MRI diagnosis (P=0.23). Its specificity was also markedly higher than that of PSA and of the senior physicians. This suggests that, at initial diagnosis, the DRC model could help more patients avoid unnecessary biopsies. The DRC model can also predict the Gleason grade group, providing clinicians with more detailed guidance for treatment decisions and offering patients additional information on long-term prognosis.
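The AUC and specificity comparisons discussed above rest on two simple quantities, sketched here in Python. The AUC is computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, and specificity as the fraction of negative cases correctly called negative. The labels, scores, and the 0.5 threshold are hypothetical; the abstract does not give the csPCa definition, the operating threshold, or the statistical test behind the reported P values, so none of those are reproduced.

```python
def auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores.

    labels: 1 for positive (e.g. csPCa), 0 for negative.
    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def specificity(labels, predictions):
    """True negatives divided by all actual negatives."""
    tn = sum(1 for l, p in zip(labels, predictions) if l == 0 and p == 0)
    fp = sum(1 for l, p in zip(labels, predictions) if l == 0 and p == 1)
    return tn / (tn + fp)


# Hypothetical data: three negative and three positive cases.
labels = [0, 0, 0, 1, 1, 1]
scores = [0.10, 0.40, 0.60, 0.35, 0.80, 0.90]
threshold = 0.5  # illustrative operating point, not from the study
predictions = [1 if s >= threshold else 0 for s in scores]

model_auc = auc(labels, scores)
model_spec = specificity(labels, predictions)
```

A higher specificity at comparable AUC means fewer negative cases are flagged as suspicious, which is the basis for the statement that the model could reduce unnecessary biopsies.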

Conclusion

The clinical, radiomics, and deep learning models each had good predictive value for Gleason grade group. The mixed model based on clinical, radiomics, and deep learning features had the best predictive performance and might offer a non-invasive alternative for predicting Gleason grade group.

Acknowledgements

The authors thank all those who helped during the writing of this work. We also thank the Departments of Ultrasound, Urology, and Pathology of our hospital for their valuable help and feedback.

References

[1] Siegel R L, Miller K D, Fuchs H E, et al. Cancer statistics, 2021[J]. CA Cancer J Clin, 2021, 71(1): 7-33.

[2] Sung H, Ferlay J, Siegel R L, et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries[J]. CA Cancer J Clin, 2021, 71(3): 209-249.

[3] Epstein J I. An update of the Gleason grading system[J]. J Urol, 2010, 183(2): 433-440.

[4] Epstein J I, Egevad L, Amin M B, et al. The 2014 International Society of Urological Pathology (ISUP) consensus conference on Gleason grading of prostatic carcinoma: definition of grading patterns and proposal for a new grading system[J]. Am J Surg Pathol, 2016, 40(2): 244-252.

[5] Loeb S, Bjurlin M A, Nicholson J, et al. Overdiagnosis and overtreatment of prostate cancer[J]. Eur Urol, 2014, 65(6): 1046-1055.

Figures

Figure 1: Model Development Process for Predicting Gleason Grade Grouping. PSA: Prostate-Specific Antigen; PI-RADS: Prostate Imaging Reporting and Data System; LASSO: Least Absolute Shrinkage and Selection Operator; SVM: Support Vector Machine; DR Model: Fusion Model of Deep Learning Features and Radiomics Features; DRC Model: Fusion Model of Deep Learning Features, Radiomics Features, and Clinical Features.

Figure 2: Confusion Matrix and ROC Curve of the DRC Model in the Testing Group. The correct prediction rate is 45%, the rate of predictions that differ from the reference by one grade group is 31%, the upgrade rate is 15%, the downgrade rate is 9%, and the proportion of predictions with an error ≤ 1 is 76%.

Figure 3: ROC curve comparisons of DRC Model, PSA, and PI-RADS for csPCa. PSA: Prostate-Specific Antigen; PI-RADS: Prostate Imaging Reporting and Data System; DRC Model: Fusion Model of Clinical Features, Radiomics Features, and Deep Learning Features.

Proc. Intl. Soc. Mag. Reson. Med. 32 (2024)

DOI: https://doi.org/10.58530/2024/3503