David Bonekamp1, Simon Kohl1, Manuel Wiesenfarth1, Patrick Schelb1, Jan-Philipp Radtke2, Michael Götz1, Philipp Kickingereder2, Kaneschka Yaqubi1, Bertram Hitthaler2, Nils Gählert1, Tristan Anselm Kuder1, Fenja Deister1, Martin Freitag1, Markus Hohenfellner2, Boris Hadaschik3, Heinz-Peter Schlemmer1, and Klaus Maier-Hein1
1German Cancer Research Center, Heidelberg, Germany, 2University Hospital Heidelberg, Heidelberg, Germany, 3University Hospital Essen, Essen, Germany
Synopsis
Multiparametric MRI (mpMRI) has recently been further standardized by the
introduction of the PI-RADS version 2 system. mpMRI/transrectal ultrasound
(TRUS)-guided fusion biopsies have been shown to closely match the
histopathology seen after radical prostatectomy. Radiomics is a novel approach
that extracts a large number of quantitative features from medical images, and
its combination with machine learning has demonstrated potential in the
classification of mpMRI of the prostate. Here, we aim to compare
state-of-the-art radiomics and machine learning with ADC measurements and
prospective radiologist assessment using PI-RADS version 2 (PI-RADSv2) in the
evaluation of cancer-suspicious lesions of the prostate.
Purpose
Multiparametric MRI (mpMRI) has recently been further standardized by the
introduction of the PI-RADS version 2 system [1]. mpMRI/transrectal ultrasound
(TRUS)-guided fusion biopsies can be closely matched with histopathology after
radical prostatectomy [2]. Radiomics is a novel approach that extracts a large
number of quantitative features from medical images, and its combination with
machine learning has demonstrated potential in the classification of mpMRI of
the prostate [3]. Here, we aim to compare state-of-the-art radiomics and
machine learning with ADC measurements and prospective radiologist assessments
using PI-RADS version 2 (PI-RADSv2) in the evaluation of cancer-suspicious
lesions of the prostate.
Materials and Methods
The
institutional review board approved prospective data collection and informed
consent was obtained from all patients. 316 consecutive men with suspected
prostate cancer (PC) were examined with a standard multiparametric MRI (mpMRI)
protocol on a single scanner at 3T prior to MR-targeted and extended systematic
biopsy. A biparametric protocol (T2w, DWI at b = 1500 s/mm² and corresponding ADC
map) was extracted from mpMRI. All lesions mentioned in the PI-RADSv2 clinical
reports by board-certified radiologists (both general body radiologists and
radiologists with specialization in prostate MRI) were manually segmented. Radiomic
random forest machine learning (RRFML) models were trained and validated for
classification of significant PC (sPC, Gleason score ≥ 3+4) on a per-lesion
and per-patient basis, and their performance was compared to that of the
monoparameter mean ADC (mADC). The first 183 examinations were used for
training, while the subsequent 133
scans were used as an independent validation set. Models were compared based on
bootstrapped receiver operating characteristic (ROC) curves in the training set
and standard ROC curves in the validation set. Thresholds of mean ADC and of
the radiomic models were adjusted so that lesion detection was as sensitive as
that of the radiologists in the training set.
Results
The
training set included 80 sPC lesions and 163 negative lesions in 157 patients;
the validation set included 60 sPC lesions and 159 negative lesions in 121 patients.
Radiologist sensitivity and specificity were 85%/57% in the training set and 93%/44%
in the validation set on a per-patient basis. Mean ADC models achieved 88%/67%
in the training and 97%/52% in the validation set. RRFML achieved 85%/62% in
the training set and 100%/51% in the validation set. Compared to PI-RADS, the
mADC model reduced false positive (FP) patient exams by 7 and yielded 2
additional true positive (TP) patient exams in the validation set. The RRFML
model reduced FP exams by 6 and yielded 3 additional TP exams. In the
validation set, the ROC AUC was 0.84 for mean ADC and 0.88 for the ensemble
classifier (difference not significant by DeLong’s test, p = 0.15).
Conclusion
In the
validation set, both mADC and the RRFML model had a higher sensitivity and
specificity compared to PI-RADS and reduced misclassification by 6.8% (9/133). In
the training set, both models showed increased specificity compared to PI-RADS
while sensitivity was calibrated to be similar.
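The calibration referred to here, in which a model threshold is chosen so that training-set sensitivity matches that of the radiologists, could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy mADC values and labels, and the target sensitivity of 0.8 are all hypothetical, and a case is assumed to be called positive when its mean ADC is at or below the threshold.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity when score <= threshold is called positive.

    Assumption: low mean ADC is treated as suspicious. Both classes must be
    present in `labels` (1 = sPC, 0 = negative).
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s <= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s > threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s > threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def calibrate_threshold(scores, labels, target_sensitivity):
    """Return the smallest observed score whose sensitivity reaches the target.

    Raising the threshold can only add positive calls, so the smallest
    qualifying threshold keeps specificity as high as possible.
    """
    for t in sorted(set(scores)):
        sens, _ = sensitivity_specificity(scores, labels, t)
        if sens >= target_sensitivity:
            return t
    raise ValueError("target sensitivity cannot be reached")


# Hypothetical training data: mean ADC per case (arbitrary units) and labels.
train_madc = [620, 700, 760, 810, 900, 950, 1010, 1100, 1180, 1250]
train_label = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]

# Hypothetical reader sensitivity to match on the training set.
threshold = calibrate_threshold(train_madc, train_label, 0.8)
sens, spec = sensitivity_specificity(train_madc, train_label, threshold)
print(threshold, sens, spec)
```

The threshold fixed on the training data would then be applied unchanged to the validation set, which is why validation sensitivity can differ from the calibrated training value.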
For the decision-making task of categorizing lesions based on biparametric
appearance, the performance of mADC and the RRFML model was comparable. Our
data support mean ADC as an excellent choice for a highly decisive
monoparameter in the interpretation of prostate MRI, especially when derived
from a single scanner system. The capability of state-of-the-art ML methods to
select strong multiparametric signatures is challenged by our exploratory
analysis. Our findings encourage the development of refined ML methods and
their application in larger cohorts.
Acknowledgements
PK is a fellow of the Medical
Faculty Heidelberg Postdoc-Program. The study was supported by Stiftung
Krebsforschung Europa.
References
1. Weinreb JC, Barentsz JO, Choyke PL, et al. PI-RADS Prostate Imaging - Reporting and Data System: 2015, Version 2. Eur Urol. 2016;69(1):16-40.
2. Radtke JP, Schwab C, Wolf MB, et al. Multiparametric Magnetic Resonance Imaging (MRI) and MRI-Transrectal Ultrasound Fusion Biopsy for Index Tumor Detection: Correlation with Radical Prostatectomy Specimen. Eur Urol. 2016;70(5):846-53.
3. Fehr D, Veeraraghavan H, Wibmer A, et al. Automatic classification of prostate cancer Gleason scores from multiparametric magnetic resonance images. Proc Natl Acad Sci U S A. 2015;112(46):E6265-73.