2724

Repeatability of Selected Multiparametric Prostate MRI Radiomics Features

Michael Schwier^1,2, Joost van Griethuysen³, Mark G Vangel^2,4, Steve Pieper⁵, Sharon Peled^1,2, Clare M Tempany^1,2, Hugo Aerts^2,6, Ron Kikinis^1,2, Fiona M Fennessy^1,2,6, and Andrey Fedorov^1,2

¹Brigham and Women's Hospital, Boston, MA, United States, ²Harvard Medical School, Boston, MA, United States, ³Netherlands Cancer Institute / Maastricht University, Amsterdam, Netherlands, ⁴Massachusetts General Hospital, Charlestown, MA, United States, ⁵Isomics, Inc., Cambridge, MA, United States, ⁶Dana-Farber Cancer Institute, Boston, MA, United States

Synopsis

In this study we assess the repeatability of selected radiomics features for small prostate tumors in ADC and T2-weighted images. We used a prostate mpMRI test-retest dataset for our evaluation and compared different preprocessing configurations. The intraclass correlation coefficient was employed as a measure of repeatability. Our results show that several of the selected features have good repeatability, but only when specific preprocessing is applied. Based on our data, texture computation should be done in 2D. Normalization improves repeatability for ADC features, but not for features in T2-weighted images.

Introduction

The prognostic and discriminative power of radiomics features has been explored in the analysis of cancer imaging.¹ To reliably derive conclusions based on radiomics features, their values must remain stable between two scans when the patient's condition is unchanged. In this study we assess the repeatability of radiomics features in small tumors, specifically those features recommended in recent literature^2,3,4 for quantitative analysis of multiparametric prostate MR images (mpMRI).

Various means of preprocessing the images prior to feature calculation were suggested in the above-mentioned literature.^2,3,4 Image normalization/rescaling is applied only for texture features,³ for all features,² or not at all.⁴ 3D computation of texture features is mentioned in only one study,³ while the others^2,4 do not specify whether their computations are in 2D or 3D. Overall, the description of the preprocessing often lacks the detail needed for exact reproduction of the calculations. Hence, we also evaluate various factors influencing the reliability of feature calculation.

Methods

The study used a previously published prostate mpMRI test-retest dataset composed of fifteen treatment-naïve men with biopsy-confirmed (n=11) or suspected (n=4) prostate cancer (PCa). Patients underwent a second MR examination within two weeks of the first, without any interim treatment.⁵ A radiologist with 10+ years of experience in prostate mpMRI segmented the tumor regions of interest (ROIs) in the baseline and follow-up T2-weighted axial (T2w) images (TR 3350-5109 ms, TE 84-107 ms, FoV 140-200 mm) and Apparent Diffusion Coefficient (ADC) maps derived from Diffusion-weighted MRI (b-values 0-1400 s/mm², TR 2500-8150 ms, TE 76-80 ms, FoV 160-280 mm). All tumor ROIs used for calculating the features were smaller than 0.8 ml.

We evaluated the following radiomics features in T2w and ADC images, previously shown as informative for PCa classification^2,3,4: First-order intensity features (Mean, Median, 10th Percentile, Skewness, Standard Deviation, Kurtosis) and Haralick texture features (Energy, Entropy, Correlation, Homogeneity, Contrast).
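
The Haralick texture features listed above are derived from a gray-level co-occurrence matrix (GLCM). The following minimal sketch illustrates the idea and what separates 2D from 3D computation: 2D uses the 4 unique in-plane neighbor directions, while 3D uses 13. It is not the implementation used in this study. The function names, the pooling of all directions into one matrix, the textbook feature definitions (which may differ between libraries), and the tiny example volume are assumptions for illustration; Correlation is omitted for brevity.

```python
from itertools import product
from math import log2


def unique_offsets(dims):
    """Neighbour offsets (dz, dy, dx) at distance 1, one per direction pair."""
    z_steps = (-1, 0, 1) if dims == 3 else (0,)
    # Keeping only offsets greater than the origin drops each mirror direction.
    return [o for o in product(z_steps, (-1, 0, 1), (-1, 0, 1)) if o > (0, 0, 0)]


def glcm(volume, offsets):
    """Symmetric, normalized co-occurrence matrix as {(i, j): probability}.

    `volume` is a nested list indexed [z][y][x] of discretized gray levels.
    All directions are pooled into one matrix (a simplifying assumption).
    """
    counts = {}
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    for z, y, x in product(range(nz), range(ny), range(nx)):
        for dz, dy, dx in offsets:
            z2, y2, x2 = z + dz, y + dy, x + dx
            if 0 <= z2 < nz and 0 <= y2 < ny and 0 <= x2 < nx:
                i, j = volume[z][y][x], volume[z2][y2][x2]
                counts[(i, j)] = counts.get((i, j), 0) + 1
                counts[(j, i)] = counts.get((j, i), 0) + 1
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}


def texture_features(p):
    """Textbook Haralick-style features from a normalized GLCM."""
    return {
        "energy": sum(v * v for v in p.values()),
        "entropy": -sum(v * log2(v) for v in p.values()),
        "contrast": sum(v * (i - j) ** 2 for (i, j), v in p.items()),
        "homogeneity": sum(v / (1 + (i - j) ** 2) for (i, j), v in p.items()),
    }


# Hypothetical two-slice volume of already discretized gray levels.
volume = [[[1, 1], [2, 3]],
          [[1, 2], [3, 3]]]
features_2d = texture_features(glcm(volume, unique_offsets(2)))
features_3d = texture_features(glcm(volume, unique_offsets(3)))
```

With in-plane offsets only, voxel pairs never cross slice boundaries, so a 2D computation is insensitive to differences between neighboring slices.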

Features were extracted for all ROIs using pyradiomics v1.3.0.⁶ Computations were done for all combinations of normalized (mean=300, standard deviation=100) and original (no normalization applied) images, 2D/3D texture feature computation, as well as bin widths 5, 10, 15, 20 for texture features (bin widths were selected to match recommendations in literature^7,8). No filtering or resampling was applied to the original images.
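
The preprocessing grid described above can be sketched as follows. This is an illustration under stated assumptions, not the pyradiomics code path: the function and key names are hypothetical, the use of the population standard deviation is an assumption, and the discretization shown is one common fixed-bin-width scheme.

```python
from itertools import product
from statistics import mean, pstdev


def normalize(values, target_mean=300.0, target_sd=100.0):
    """Linearly rescale intensities to the given mean and standard deviation."""
    mu = mean(values)
    sd = pstdev(values)  # population SD; the choice of estimator is an assumption
    if sd == 0:
        raise ValueError("cannot normalize a constant image")
    return [(v - mu) / sd * target_sd + target_mean for v in values]


def discretize(values, bin_width):
    """Fixed-bin-width discretization; bin indices start at 1."""
    lowest = min(values)
    return [int(v // bin_width) - int(lowest // bin_width) + 1 for v in values]


# 2 (normalization) x 2 (dimensionality) x 4 (bin width) = 16 configurations
configurations = [
    {"normalized": n, "texture_dim": d, "bin_width": w}
    for n, d, w in product((False, True), ("2D", "3D"), (5, 10, 15, 20))
]
```

Because the bin width is fixed in intensity units, normalization changes how many gray levels an ROI spans, which is one reason the two settings interact.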

As a measure of repeatability, we report the intraclass correlation coefficient ICC(1,1),⁹ as recommended in the literature.¹⁰ We use the ICC of tumor volume as a reference, which allows the repeatability of the radiomics features to be compared with that of a quantitative measure already accepted in the community.
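
For reference, ICC(1,1) in the Shrout and Fleiss formulation⁹ follows from a one-way random-effects ANOVA as (BMS - WMS) / (BMS + (k - 1) WMS), where BMS and WMS are the between-subject and within-subject mean squares and k is the number of repeated measurements per subject (k=2 for test-retest). A minimal sketch, with a hypothetical function name and input layout:

```python
def icc_1_1(measurements):
    """ICC(1,1) for a list of per-subject lists, each holding k repeated values."""
    n = len(measurements)
    k = len(measurements[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in measurements):
        raise ValueError("need at least 2 subjects with the same k >= 2 repeats")

    grand_mean = sum(sum(row) for row in measurements) / (n * k)
    subject_means = [sum(row) / k for row in measurements]

    # Between-subject and within-subject mean squares (one-way ANOVA).
    bms = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    wms = sum(
        (x - m) ** 2 for row, m in zip(measurements, subject_means) for x in row
    ) / (n * (k - 1))

    denominator = bms + (k - 1) * wms
    if denominator == 0:
        raise ValueError("ICC is undefined when all measurements are identical")
    return (bms - wms) / denominator
```

Each inner list would hold the baseline and follow-up value of one feature for one patient; the estimate can become negative when within-subject variation exceeds between-subject variation.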

Results

Entropy, Homogeneity, Correlation, Standard Deviation, Median, and Mean in normalized ADC reach ICCs around 0.7 or higher, performing as well as or better than Volume (ADC Volume ICC=0.7). Normalization leads to improved ICC in ADC images in most cases (see Fig.1). Exceptions include Energy (ICC reduced from 0.5 to 0.2 after normalization), and Skewness and Kurtosis, which by definition are not influenced by normalization (ICC<0.1). In contrast, normalization leads to lower ICC in T2w images (see Fig.2). In original T2w images, Entropy, Energy, Homogeneity, Median, Mean and 10th Percentile reach ICC>0.84 (reference T2w Volume ICC=0.86).
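
Skewness and Kurtosis are unaffected by normalization because both are ratios of central moments, so any shift cancels in the centering and any positive scale factor cancels between numerator and denominator. The short check below illustrates this; the intensity values and the linear map are hypothetical and chosen only for demonstration.

```python
from statistics import mean


def central_moment(values, order):
    """Population central moment of the given order."""
    mu = mean(values)
    return sum((v - mu) ** order for v in values) / len(values)


def skewness(values):
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def kurtosis(values):
    return central_moment(values, 4) / central_moment(values, 2) ** 2


# Hypothetical ROI intensities and an arbitrary linear rescaling (scale > 0).
roi = [812.0, 950.0, 1010.0, 1100.0, 1340.0, 1875.0]
rescaled = [0.25 * v + 40.0 for v in roi]

skew_difference = abs(skewness(roi) - skewness(rescaled))
kurt_difference = abs(kurtosis(roi) - kurtosis(rescaled))
```

Both differences are zero up to floating-point rounding, whereas Mean, Median, 10th Percentile and Standard Deviation all change under the same mapping.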

A comparison of 2D versus 3D texture computations (Figs.3-4) shows that for Homogeneity and Correlation the 2D ICCs are higher than the 3D ICCs. For Entropy and Energy, the repeatability of 3D and 2D computations lies in a similar range. Only Contrast has improved ICC under 3D computations: 0.94 vs. 0.51 in T2w, and 0.47 vs. 0.21 in ADC.

Varying the bin width changes the ICC of texture features by 0.1-0.2. Exceptions are Correlation on normalized ADC images and Energy on normalized T2w images, for both of which the ICC spreads over a range of 0.4 (see Figs.1-4).

Discussion and Conclusion

We observed good repeatability (ICC≥Volume ICC) for some of the literature-recommended features for prostate cancer (Entropy, Homogeneity, Standard Deviation, Median, Mean), but not for all of them (notable examples include Kurtosis, Contrast, and ADC Energy). However, even features with good repeatability showed it only under specific preprocessing configurations. Based on our data, we recommend 2D texture computation, normalization for ADC, and no normalization for T2w.

Our study is limited to small tumors, but already shows that many factors influence the reliability of radiomics features on mpMRI. Since recommendations among recent studies are not consistent, we suggest caution when adopting reported features, and encourage further investigations. We especially advocate reporting all details regarding the preprocessing in radiomics studies, following the consensus definitions of features,¹¹ and strongly recommend making the implementation available. Regarding the latter, our study utilized a publicly available open source radiomics library. We are in the process of making public the dataset used in this study and the calculated radiomics features.

Acknowledgements

Funding support: NIH U01 CA151261, U24 CA180918, P41 EB015898, R01 CA111288, R01 CA160902, U24 CA194354, and U01 CA190234.

References

1. Aerts HJWL, Velazquez ER, Leijenaar RTH, et al. Decoding tumour phenotype by noninvasive imaging using a quantitative radiomics approach. Nat Commun. 2014;5:1-8. doi:10.1038/ncomms5006.

2. Fehr D, Veeraraghavan H, Wibmer A, et al. Automatic classification of prostate cancer Gleason scores from multiparametric magnetic resonance images. Proc Natl Acad Sci. 2015;112(46):E6265-E6273. doi:10.1073/pnas.1505935112.

3. Wibmer A, Hricak H, Gondo T, et al. Haralick texture analysis of prostate MRI: utility for differentiating non-cancerous prostate from prostate cancer and differentiating prostate cancers with different Gleason scores. Eur Radiol. 2015;25(10):2840-2850. doi:10.1007/s00330-015-3701-8.

4. Peng Y, Jiang Y, Yang C, et al. Quantitative analysis of multiparametric prostate MR images: differentiation between prostate cancer and normal tissue and correlation with Gleason score--a computer-aided diagnosis development study. Radiology. 2013;267(3):787-796. doi:10.1148/radiol.13121454.

5. Fedorov A, Vangel MG, Tempany CM, Fennessy FM. Multiparametric Magnetic Resonance Imaging of the Prostate: Repeatability of Volume and Apparent Diffusion Coefficient Quantification. Invest Radiol. 2017;52(9):538-546. doi:10.1097/RLI.0000000000000382.

6. van Griethuysen JJM, Fedorov A, Parmar CPG, et al. Computational Radiomics System to Decode the Radiographic Phenotype. Cancer Res. 2017;77(21):104-108. doi:10.1158/0008-5472.CAN-17-0339.

7. Tixier F, Hatt M, Le Rest CC, Le Pogam A, Corcos L, Visvikis D. Reproducibility of Tumor Uptake Heterogeneity Characterization Through Textural Feature Analysis in 18F-FDG PET. J Nucl Med. 2012;53(5):693-700. doi:10.2967/jnumed.111.099127.

8. Leijenaar RTH, Nalbantov G, Carvalho S, et al. The effect of SUV discretization in quantitative FDG-PET Radiomics: the need for standardized methodology in tumor texture analysis. Sci Rep. 2015;5(1):11075. doi:10.1038/srep11075.

9. Shrout PE, Fleiss JL. Intraclass correlations: Uses in assessing rater reliability. Psychol Bull. 1979;86(2):420-428. doi:10.1037/0033-2909.86.2.420.

10. Raunig DL, McShane LM, Pennello G, et al. Quantitative imaging biomarkers: A review of statistical methods for technical performance assessment. Stat Methods Med Res. 2015;24(1):27-67. doi:10.1177/0962280214537344.

11. Zwanenburg A, Leger S, Vallières M, Löck S, for the Image Biomarker Standardisation Initiative. Image biomarker standardisation initiative. 2016;(July). http://arxiv.org/abs/1612.07003.

Figures

Figure 1: ICC for each feature computed in the Tumor ROI on ADC images. Texture features are computed in 2D. Colors represent the bin width for the texture computations; glyph shape represents whether the image was normalized. The dashed line indicates the reference ICC based on Volume. Note that by definition bin width only influences texture features and that Kurtosis and Skewness are by definition also not influenced by normalization. Normalization strongly improves the ICC for almost all features. The only exception is Energy, which performs markedly better on non-normalized images.

Figure 2: ICC for each feature computed in the Tumor ROI on T2w images. Texture features are computed in 2D. Colors represent the bin width for the texture computations; glyph shape represents whether the image was normalized. The dashed line indicates the reference ICC based on Volume. Note that by definition bin width only influences texture features and that Kurtosis and Skewness are by definition also not influenced by normalization. Omitting normalization improves the ICC for almost all features. The only exception is Correlation, which has a similar ICC on normalized and original images.

Figure 3: ICC for texture features computed in the Tumor ROI on ADC images. Images were normalized. Colors represent the bin width; glyph shape represents whether the features were computed in 2D or 3D. 3D computations result in lower ICCs for Homogeneity and Correlation (except for bin width 20), but in higher ICCs for Contrast. For Entropy and Energy the ICCs for 2D and 3D are in similar ranges. The Volume reference ICC is matched or surpassed by Homogeneity 2D, Entropy 2D and 3D, as well as Correlation 2D with bin widths 5 and 10.

Figure 4: ICC for texture features computed in the Tumor ROI on T2w images. Images were not normalized. Colors represent the bin width; glyph shape represents whether the features were computed in 2D or 3D. 3D computations result in lower ICCs for Homogeneity and Correlation, but higher ICCs for Contrast. For Entropy and Energy the ICCs for 2D and 3D are in similar ranges. The Volume reference ICC is almost matched by Homogeneity 2D and surpassed by Contrast 3D as well as Entropy and Energy 2D and 3D.

Proc. Intl. Soc. Mag. Reson. Med. 26 (2018)
