2574

Comparison of Radiomics and Quantitative ADC Measurements of Prostate PI-RADS v2 Lesions to Prospective Radiologist Performance
David Bonekamp¹, Simon Kohl¹, Manuel Wiesenfarth¹, Patrick Schelb¹, Jan-Philipp Radtke², Michael Götz¹, Philipp Kickingereder², Kaneschka Yaqubi¹, Bertram Hitthaler², Nils Gählert¹, Tristan Anselm Kuder¹, Fenja Deister¹, Martin Freitag¹, Markus Hohenfellner², Boris Hadaschik³, Heinz-Peter Schlemmer¹, and Klaus Maier-Hein¹

¹German Cancer Research Center, Heidelberg, Germany, ²University Hospital Heidelberg, Heidelberg, Germany, ³University Hospital Essen, Essen, Germany

Synopsis

Multiparametric MRI (mpMRI) has recently been further standardized with the introduction of the PI-RADS version 2 system. mpMRI/transrectal ultrasound (TRUS)-guided fusion biopsies have been shown to closely match the histopathology seen after radical prostatectomy. Radiomics is a novel approach for extracting a large number of quantitative features from medical images, and its combination with machine learning has shown potential for classifying prostate mpMRI. Here, we aim to compare state-of-the-art radiomics and machine learning with ADC measurements and with prospective radiologist assessment using PI-RADS version 2 (PI-RADSv2) in the evaluation of cancer-suspicious prostate lesions.

Purpose

Multiparametric MRI (mpMRI) has recently been further standardized with the introduction of the PI-RADS version 2 system [1]. mpMRI/transrectal ultrasound (TRUS)-guided fusion biopsies can be closely matched with histopathology after radical prostatectomy [2]. Radiomics is a novel approach for extracting a large number of quantitative features from medical images, and its combination with machine learning has shown potential for classifying prostate mpMRI [3]. Here, we aim to compare state-of-the-art radiomics and machine learning with ADC measurements and with prospective radiologist assessments using PI-RADS version 2 (PI-RADSv2) in the evaluation of cancer-suspicious prostate lesions.

Materials and Methods

The institutional review board approved prospective data collection, and informed consent was obtained from all patients. A total of 316 consecutive men with suspected prostate cancer (PC) were examined with a standard mpMRI protocol on a single 3T scanner prior to MR-targeted and extended systematic biopsy. A biparametric subset (T2-weighted images, DWI at b = 1500 s/mm², and the corresponding ADC map) was extracted from the mpMRI data. All lesions mentioned in the PI-RADSv2 clinical reports by board-certified radiologists (both general body radiologists and radiologists specialized in prostate MRI) were manually segmented. Radiomic random forest machine learning (RRFML) models were trained and validated for classification of significant PC (sPC, Gleason score ≥ 3+4) on a per-lesion and per-patient basis, and their performance was compared with that of the single parameter mean ADC (mADC). The first 183 examinations were used for training, and the subsequent 133 examinations served as an independent validation set. Models were compared using bootstrapped receiver operating characteristic (ROC) analysis in the training set and standard ROC curves in the validation set. Decision thresholds for mean ADC and the radiomic models were adjusted in the training set to achieve lesion detection sensitivity equal to that of the radiologists.
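The threshold calibration step can be illustrated with a minimal sketch. It is not the authors' code: it assumes one mean ADC value per segmented lesion with a binary sPC label, treats lower ADC as more suspicious (restricted diffusion), and picks the largest-sensitivity-neutral choice, i.e. the smallest cut-off whose training-set sensitivity reaches a target such as the radiologists' sensitivity. The function names `mean_adc`, `calibrate_adc_threshold`, and `sens_spec` are hypothetical.

```python
import math

import numpy as np


def mean_adc(adc_map: np.ndarray, lesion_mask: np.ndarray) -> float:
    """Hypothetical helper: mean ADC over the voxels of one segmented lesion."""
    return float(adc_map[lesion_mask.astype(bool)].mean())


def calibrate_adc_threshold(adc_values, labels, target_sensitivity: float) -> float:
    """Smallest ADC cut-off whose sensitivity on the given cases reaches the target.

    A case is called positive when its mean ADC is <= threshold,
    because lower ADC is assumed to indicate higher suspicion.
    """
    adc_values = np.asarray(adc_values, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    positives = np.sort(adc_values[labels])
    if positives.size == 0:
        raise ValueError("need at least one positive case")
    # Number of positives that must fall at or below the cut-off;
    # the small epsilon guards against float products like 7.000000000000001.
    k = math.ceil(target_sensitivity * positives.size - 1e-9)
    k = min(max(k, 1), positives.size)
    return float(positives[k - 1])


def sens_spec(adc_values, labels, threshold: float) -> tuple[float, float]:
    """Sensitivity and specificity of the rule 'positive if ADC <= threshold'."""
    adc_values = np.asarray(adc_values, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    called = adc_values <= threshold
    sensitivity = (called & labels).sum() / labels.sum()
    specificity = (~called & ~labels).sum() / (~labels).sum()
    return float(sensitivity), float(specificity)
```

For the RRFML model the same idea applies to the predicted probability, with the inequality reversed (positive if the score is at or above the cut-off). The threshold is fixed on the training set and then applied unchanged to the validation set.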

Results

The training set included 80 sPC and 163 negative lesions in 157 patients; the validation set included 60 sPC and 159 negative lesions in 121 patients. On a per-patient basis, radiologist sensitivity/specificity was 85%/57% in the training set and 93%/44% in the validation set. Mean ADC models achieved 88%/67% in the training set and 97%/52% in the validation set. RRFML achieved 85%/62% in the training set and 100%/51% in the validation set. Compared with PI-RADS, the mADC model reduced false-positive (FP) patient exams by seven and added two true-positive (TP) patient exams in the validation set. The RRFML model reduced FP exams by six and added three TP exams. In the validation set, the ROC AUC was 0.84 for mean ADC and 0.88 for the ensemble classifier (not significant by DeLong's test, p = 0.15).
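DeLong's test compares two AUCs that were measured on the same cases, so it must account for their correlation. The sketch below is an illustrative O(m·n) implementation of the DeLong variance estimator for paired AUCs; it is not the authors' analysis code, and the function names are hypothetical. It assumes higher scores mean "more suspicious", so mean ADC would be passed as its negative.

```python
import math

import numpy as np


def _placements(pos_scores, neg_scores):
    """Per-case structural components of the Mann-Whitney AUC."""
    x = np.asarray(pos_scores, dtype=float)[:, None]
    y = np.asarray(neg_scores, dtype=float)[None, :]
    # psi = 1 if positive outranks negative, 0.5 for ties, 0 otherwise
    psi = (x > y).astype(float) + 0.5 * (x == y)
    v10 = psi.mean(axis=1)  # one value per positive case
    v01 = psi.mean(axis=0)  # one value per negative case
    return v10, v01


def delong_paired(scores_a, scores_b, labels):
    """Two-sided DeLong test for two correlated AUCs on the same cases.

    Returns (auc_a, auc_b, p_value). Requires >= 2 positives and >= 2 negatives.
    """
    labels = np.asarray(labels, dtype=bool)
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    v10_a, v01_a = _placements(a[labels], a[~labels])
    v10_b, v01_b = _placements(b[labels], b[~labels])
    auc_a, auc_b = v10_a.mean(), v10_b.mean()
    m, n = labels.sum(), (~labels).sum()
    # 2x2 covariance matrices of the components (np.cov uses ddof=1)
    s10 = np.cov(np.vstack([v10_a, v10_b]))
    s01 = np.cov(np.vstack([v01_a, v01_b]))
    s = s10 / m + s01 / n
    var_diff = s[0, 0] + s[1, 1] - 2.0 * s[0, 1]
    if var_diff <= 0:
        return float(auc_a), float(auc_b), 1.0
    z = (auc_a - auc_b) / math.sqrt(var_diff)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return float(auc_a), float(auc_b), float(p)
```

Usage, under these assumptions: `delong_paired(rrfml_probabilities, -mean_adc_values, spc_labels)` with one entry per validation case.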

Conclusion

In the validation set, both mADC and the RRFML model had higher sensitivity and specificity than PI-RADS and reduced the number of misclassified patient exams by 9 of 133 (6.8%). In the training set, both models showed higher specificity than PI-RADS, while sensitivity was calibrated to be similar. For the decision-making task of categorizing lesions based on their biparametric appearance, mADC and the RRFML model performed comparably. Our data support mean ADC as an excellent choice of a single, highly discriminative parameter for interpreting prostate MRI, especially when it is derived from a single scanner system. Our exploratory analysis challenges the capability of state-of-the-art ML methods to select strong multiparametric signatures, and our findings motivate the development of refined ML methods and their application in larger cohorts.
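The 6.8% figure follows from the per-patient changes reported in the Results: each avoided FP exam and each added TP exam removes one misclassification. The small sketch below reproduces that arithmetic; it assumes the reported FP and TP changes are net changes with no exams switching in the opposite direction, and `net_reduction` is a hypothetical helper.

```python
def net_reduction(fp_avoided: int, tp_added: int, n_exams: int) -> tuple[int, float]:
    """Net drop in misclassified exams, assuming every reported change is a
    correction (FP -> TN or FN -> TP) and none go the other way."""
    corrected = fp_avoided + tp_added
    return corrected, corrected / n_exams


# Per-patient changes versus PI-RADS in the 133-exam validation set (from the Results).
for model, fp_avoided, tp_added in [("mADC", 7, 2), ("RRFML", 6, 3)]:
    corrected, fraction = net_reduction(fp_avoided, tp_added, 133)
    print(f"{model}: {corrected}/133 = {fraction:.1%}")
```

Both models arrive at 9/133, which is why a single figure is quoted for mADC and RRFML.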

Acknowledgements

PK is a fellow of the Medical Faculty Heidelberg Postdoc-Program. The study was supported by Stiftung Krebsforschung Europa.

References

1. Weinreb JC, Barentsz JO, Choyke PL, et al. PI-RADS Prostate Imaging - Reporting and Data System: 2015, Version 2. Eur Urol. 2016;69(1):16-40.

2. Radtke JP, Schwab C, Wolf MB, et al. Multiparametric Magnetic Resonance Imaging (MRI) and MRI-Transrectal Ultrasound Fusion Biopsy for Index Tumor Detection: Correlation with Radical Prostatectomy Specimen. Eur Urol. 2016;70(5):846-53.

3. Fehr D, Veeraraghavan H, Wibmer A, et al. Automatic classification of prostate cancer Gleason scores from multiparametric magnetic resonance images. Proc Natl Acad Sci U S A. 2015;112(46):E6265-73.

Proc. Intl. Soc. Mag. Reson. Med. 26 (2018)