0268

The impact of radiomic feature reproducibility on a head and neck cancer radiotherapy response model: a comparison of two common analysis packages
James C Korte1,2, Carlos E Cardenas3, Tomas Kron1,4, Nicholas Hardcastle1,5, Jihong Wang3, Houda Bahig6, Baher Elgohari7, Laurence E Court3, Clifton D Fuller7, and Sweet Ping Ng7,8
1Department of Physical Science, Peter MacCallum Cancer Centre, Melbourne, Australia, 2Department of Biomedical Engineering, The University of Melbourne, Melbourne, Australia, 3Department of Radiation Physics, The University of Texas MD Anderson Cancer Center, Houston, TX, United States, 4Sir Peter MacCallum Department of Oncology, The University of Melbourne, Melbourne, Australia, 5Centre for Medical Radiation Physics, University of Wollongong, Wollongong, Australia, 6Radiation Oncology Department, Centre Hospitalier de l'Université de Montréal, Montreal, QC, Canada, 7Department of Radiation Oncology, University of Texas MD Anderson Cancer Center, Houston, TX, United States, 8Department of Radiation Oncology, Peter MacCallum Cancer Centre, Melbourne, Australia

Synopsis

Radiomics is a promising technique for discovering image-based biomarkers of therapy response in cancer. Reproducibility of radiomic features is a known issue that standardisation initiatives are addressing, but previously published radiomic signatures remain difficult to interpret. We investigate the reproducibility of radiomic features calculated with two common software packages and explore the impact of including non-reproducible diffusion features in a head and neck cancer (HNC) radiotherapy response model. Our results demonstrate that equivalent models can be generated with either software package, but only when the model is restricted to reproducible features identified with a correlation threshold method.

Introduction

Radiomics1,2 has shown potential in identifying magnetic resonance image-based biomarkers for clinical outcome prediction in prostate3,4, breast5, brain6 and head and neck cancers7. Radiomics analysis generates hundreds of image-based features, making feature reduction a crucial step to prevent overfitting when developing a radiomics model. The feature space can be reduced by selecting features that are stable under variation in observer contours (segmentation)8 or in image acquisition9,10. Validation studies11,12 select a small set of features based on previously reported radiomic signatures from studies using a range of open-source and in-house radiomics software13. The Image Biomarker Standardisation Initiative (IBSI) has proposed radiomics guidelines14 that should mitigate known feature reproducibility issues15 in future studies, but feature uncertainty remains a problem when interpreting previously reported radiomic signatures. In this work we investigate the reproducibility of features generated with an open-source radiomics software package used in many published studies (IBEX16) and an emerging open-source tool that follows the IBSI guidelines (PyRadiomics17). We then explore the impact of non-reproducible radiomic features on an HNC radiotherapy response model built from apparent diffusion coefficient (ADC) features. Our comparison focuses on diffusion-weighted MRI (DWI) of HNC, but it also indicates more generally which previously reported radiomic features can be reproduced with software that meets the IBSI guidelines14.

Methods

Radiomic features were calculated on 312 ADC maps from the PREDICT-HN prospective trial18 (Figure 1), in which 58 head and neck cancer patients were imaged prior to and throughout radiotherapy. Diffusion-weighted images were acquired on a Siemens 1.5T Aera with the BLADE19,20 sequence (in-plane voxel size = 2 mm, FOV = 256×256 mm, axial slices = 25, slice thickness = 4 mm, TE = 50 ms, TR = 5400 ms, FA = 90°, b = 0, 800 s/mm², ETL = 15, pixel bandwidth = 1220 Hz). The gross tumour volume (GTV) was contoured by a radiation oncologist on pre-treatment T2-weighted turbo-spin-echo (T2w-TSE) images (in-plane voxel size = 0.5 mm, FOV = 256×256 mm, axial slices = 12, slice thickness = 2 mm, TE = 80 ms, TR = 4800 ms, FA = 90°, ETL = 15, pixel bandwidth = 300 Hz). For scans acquired during treatment, the pre-treatment contours were adjusted to anatomical boundaries, propagated onto the ADC maps and visually verified.
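The ADC maps analysed here were derived from the two b-values listed above. The abstract does not state how the maps were computed; the sketch below assumes the standard mono-exponential two-point model, S(b) = S0·exp(−b·ADC), and the function name `adc_two_point` is hypothetical.

```python
import numpy as np


def adc_two_point(s0, sb, b=800.0, eps=1e-6):
    """Two-point ADC map (mm^2/s) from a b=0 image and one diffusion-weighted image.

    Assumes the mono-exponential model S(b) = S0 * exp(-b * ADC), which gives
    ADC = ln(S0 / S(b)) / b. With b in s/mm^2 the result is in mm^2/s.
    """
    s0 = np.asarray(s0, dtype=float)
    sb = np.asarray(sb, dtype=float)
    # Clip both signals to a small positive floor so that background voxels
    # do not produce log(0) or a division by zero.
    ratio = np.clip(s0, eps, None) / np.clip(sb, eps, None)
    return np.log(ratio) / b


# Example: a voxel whose signal falls from 1000 to 1000*exp(-0.8) between
# b=0 and b=800 s/mm^2 has ADC = 1e-3 mm^2/s.
adc = adc_two_point(np.array([1000.0]), np.array([1000.0 * np.exp(-0.8)]))
```

Scanner-generated ADC maps are often stored as scaled integers (for example in units of 10⁻⁶ mm²/s), so the scale of the maps should be checked before fixed bin widths are applied.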

To determine the relationship between features generated with IBEX and PyRadiomics, we performed a linear regression analysis on ADC map features from all patients at all time points. We defined the sub-set of reproducible features as those with a high Pearson correlation coefficient (r ≥ 0.9) between the two packages. To demonstrate the potential impact of incorporating non-reproducible features in a radiomics model, we used unsupervised learning to identify three groups of patients based on the change in radiomic features between the pre-treatment scan and the scan after a single fraction of radiotherapy. Separate radiomic models were generated from PyRadiomics and IBEX features, first using all features and then using only the sub-set of reproducible features. The patient groups (clusters) were identified with Ward's minimum variance clustering method21 on scaled features (z-score standardisation), with the dendrogram cut at the minimum distance threshold that produces no more than three clusters.
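A minimal sketch of the two analysis steps described above, assuming features are held as NumPy arrays keyed by a shared feature name. The helper names are hypothetical, and the change in each feature is assumed to be supplied precomputed, since the abstract does not specify whether absolute or relative change was used. SciPy's `fcluster` with `criterion="maxclust"` finds the minimum cophenetic-distance threshold that forms no more than `t` flat clusters, which matches the cut described in the text.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.stats import pearsonr


def reproducible_features(ibex, pyrad, threshold=0.9):
    """Names of matched features whose IBEX and PyRadiomics values have r >= threshold.

    ibex and pyrad map a common feature name to a 1-D array holding one value per
    ADC map (all patients, all time points), in the same order in both dicts.
    """
    selected = []
    for name in sorted(ibex.keys() & pyrad.keys()):
        r, _ = pearsonr(ibex[name], pyrad[name])
        if r >= threshold:  # NaN (e.g. a constant feature) fails this test
            selected.append(name)
    return selected


def ward_clusters(delta, max_clusters=3):
    """Ward clustering of patients on z-scored feature changes.

    delta has shape (n_patients, n_features): the change in each feature between
    the pre-treatment scan and the scan after one fraction.
    Returns one integer cluster label (starting at 1) per patient.
    """
    delta = np.asarray(delta, dtype=float)
    std = delta.std(axis=0)
    keep = std > 0  # constant features would give NaN z-scores
    scaled = (delta[:, keep] - delta[:, keep].mean(axis=0)) / std[keep]
    tree = linkage(scaled, method="ward")
    # 'maxclust' cuts the dendrogram at the smallest height giving <= max_clusters clusters.
    return fcluster(tree, t=max_clusters, criterion="maxclust")


# Hypothetical example: "mean" agrees linearly between packages, "entropy" does not.
x = np.arange(10.0)
ibex = {"mean": x, "entropy": x}
pyrad = {"mean": 2 * x + 1, "entropy": (-1.0) ** np.arange(10)}
print(reproducible_features(ibex, pyrad))  # ['mean']
```

In this design, the same `ward_clusters` call would be run once on the full feature set and once on the columns returned by `reproducible_features`, for each package separately.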

Common features (n = 47) of the two open-source packages (IBEX, PyRadiomics) were identified by name and calculated with feature extraction settings matched as closely as possible using the available documentation. Intensity histogram (IHIST), shape, grey-level co-occurrence matrix (GLCM22), grey-level run length matrix (GLRLM23) and neighbourhood grey-tone difference matrix (NGTDM24) features were calculated on the original ADC map only. The ADC maps were discretised with 256 bins (bin width = 16) prior to calculation of IHIST and NGTDM features, and with 100 bins over a reduced data range (bin width = 21) for GLCM and GLRLM features. Non-symmetric GLCM features were calculated at three neighbourhoods (1, 4 and 7). Symmetric NGTDM features were calculated with a neighbourhood of three. Texture features (GLCM, GLRLM, NGTDM) were calculated on axial slices because IBEX does not support a volumetric calculation. GLCM and GLRLM features were averaged over all directions because PyRadiomics does not report features for individual directions.
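To make the discretisation and direction-averaging choices concrete, here is a minimal NumPy sketch of fixed-bin-width discretisation and a non-symmetric 2-D GLCM on a single axial slice, with one feature (contrast) computed per in-plane direction and then averaged. It is not the IBEX or PyRadiomics implementation: the placement of the first bin edge (`origin`), the four offsets, the choice of contrast, and all function names are assumptions for illustration, and how per-slice values are combined across slices is not specified in the abstract.

```python
import numpy as np


def discretise_fixed_width(image, bin_width, origin=0.0):
    """Fixed-bin-width discretisation: grey level = floor((x - origin) / bin_width) + 1.

    Where the first bin edge (origin) is placed is an assumption here; packages may
    anchor it differently (e.g. at zero or at the ROI minimum).
    """
    x = np.asarray(image, dtype=float)
    return (np.floor((x - origin) / bin_width) + 1).astype(int)


def in_plane_offsets(d):
    """(row, col) offsets for four in-plane directions at distance d."""
    return [(0, d), (-d, d), (-d, 0), (-d, -d)]


def glcm_2d(levels, mask, n_levels, offset):
    """Normalised, non-symmetric co-occurrence matrix of one slice for one offset.

    Only pairs where both voxels lie inside the boolean mask are counted.
    """
    inside = levels[mask]
    if inside.size and (inside.min() < 1 or inside.max() > n_levels):
        raise ValueError("grey levels inside the mask must lie in 1..n_levels")
    dr, dc = offset
    n_rows, n_cols = levels.shape
    p = np.zeros((n_levels, n_levels))
    for r in range(n_rows):
        for c in range(n_cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < n_rows and 0 <= c2 < n_cols and mask[r, c] and mask[r2, c2]:
                p[levels[r, c] - 1, levels[r2, c2] - 1] += 1  # count (i -> j) only
    total = p.sum()
    return p / total if total > 0 else p


def glcm_contrast(p):
    """Haralick contrast: sum over (i, j) of (i - j)^2 * p(i, j)."""
    i, j = np.indices(p.shape)
    return float(np.sum((i - j) ** 2 * p))


def direction_averaged_contrast(levels, mask, n_levels, d=1):
    """Compute contrast separately for each direction, then take the mean."""
    values = [glcm_contrast(glcm_2d(levels, mask, n_levels, o)) for o in in_plane_offsets(d)]
    return float(np.mean(values))


# Tiny example slice: values 0 and 5 fall in bin 1, values 20 and 30 in bin 2.
levels = discretise_fixed_width(np.array([[0.0, 20.0], [5.0, 30.0]]), bin_width=16)
mask = np.ones((2, 2), dtype=bool)
contrast = direction_averaged_contrast(levels, mask, n_levels=2, d=1)
```

Averaging a feature over directions is not the same as computing it from a direction-merged matrix, so this is one of the settings that should match when comparing packages.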

Results and Discussion

Agreement between IBEX and PyRadiomics features varied by feature class (Figure 2): most intensity histogram and shape features were highly correlated, whereas GLCM, GLRLM and NGTDM features were less correlated. Reproducible features were identified in the intensity histogram (5/7), shape (5/8), GLCM (neighbourhood 1: 3/16; 4: 4/16; 7: 0/16), GLRLM (0/11) and NGTDM (2/5) categories (Figure 3). These results confirm the known issue of radiomic feature reproducibility between software packages15.

Clustering based on all features generated different patient groups from the IBEX and PyRadiomics models (Figure 4), demonstrating how feature reproducibility issues can compromise model reproducibility. The IBEX and PyRadiomics models classified patients into identical groups when clustering was based solely on reproducible features (Figure 5). This suggests that a correlation threshold is an adequate method for identifying reproducible features and reducing uncertainty when interpreting radiomic signatures from existing studies.
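Cluster labels are arbitrary, so "identical groups" means the two partitions agree up to a relabelling of the clusters. A minimal sketch of that check follows; the function name is hypothetical and the abstract does not say how the comparison was made. A graded measure such as the adjusted Rand index (available as `sklearn.metrics.adjusted_rand_score`) could instead quantify partial agreement, for example between the all-feature models.

```python
import numpy as np


def same_partition(labels_a, labels_b):
    """True if two clusterings group the patients identically, ignoring label names."""
    a = np.asarray(labels_a)
    b = np.asarray(labels_b)
    if a.shape != b.shape:
        return False
    forward, backward = {}, {}
    for x, y in zip(a.tolist(), b.tolist()):
        # Each cluster in A must map to exactly one cluster in B, and vice versa.
        if forward.setdefault(x, y) != y or backward.setdefault(y, x) != x:
            return False
    return True


# Same grouping with permuted labels counts as identical; a moved patient does not.
print(same_partition([1, 1, 2, 3], [3, 3, 1, 2]))  # True
print(same_partition([1, 1, 2, 3], [1, 2, 2, 3]))  # False
```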

Conclusion

This work highlights feature and model reproducibility issues arising from the use of different radiomic analysis software. We propose a correlation threshold method to select reproducible features and demonstrate that the features identified by this method generate an equivalent model with either software package. This is relevant to the selection of radiomic features for clinical biomarker validation trials, as it provides a framework for assessing the reproducibility of radiomic signatures from existing studies. We are currently extending this analysis to anatomical imaging (T2w-TSE) and to a comparison with another widely adopted radiomics package (MaZda25).

Acknowledgements

This project is supported by funding from the Peter MacCallum Cancer Foundation.

References

1. Lambin P, Rios-Velazquez E, Leijenaar R, et al. Radiomics: extracting more information from medical images using advanced feature analysis. European Journal of Cancer. 2012;48(4):441-446.

2. Kumar V, Gu Y, Basu S, et al. Radiomics: the process and the challenges. Magnetic Resonance Imaging. 2012;30(9):1234-1248.

3. Wibmer A, Hricak H, Gondo T, et al. Haralick texture analysis of prostate MRI: utility for differentiating non-cancerous prostate from prostate cancer and differentiating prostate cancers with different Gleason scores. European Radiology. 2015;25(10):2840-2850.

4. Gnep K, Fargeas A, Gutiérrez‐Carvajal RE, et al. Haralick textural features on T2‐weighted MRI are associated with biochemical recurrence following radiotherapy for peripheral zone prostate cancer. Journal of Magnetic Resonance Imaging. 2017;45(1):103-117.

5. Li H, Zhu Y, Burnside ES, et al. MR imaging radiomics signatures for predicting the risk of breast cancer recurrence as given by research versions of MammaPrint, Oncotype DX, and PAM50 gene assays. Radiology. 2016;281(2):382-391.

6. Kickingereder P, Burth S, Wick A, et al. Radiomic profiling of glioblastoma: identifying an imaging predictor of patient survival with improved performance over established clinical and radiologic risk models. Radiology. 2016;280(3):880-889.

7. Jethanandani A, Lin T, Volpe S, et al. Exploring applications of radiomics in magnetic resonance imaging of head and neck cancer: a systematic review. Frontiers in Oncology. 2018;8:131.

8. Aerts HJ, Velazquez ER, Leijenaar RT, et al. Decoding tumour phenotype by noninvasive imaging using a quantitative radiomics approach. Nature Communications. 2014;5:4006.

9. Leijenaar RT, Carvalho S, Velazquez E, et al. Stability of FDG-PET Radiomics features: an integrated analysis of test-retest and inter-observer variability. Acta Oncologica. 2013;52(7):1391-1397.

10. Zwanenburg A, Leger S, Agolli L, et al. Assessing robustness of radiomic features by image perturbation. Scientific Reports. 2019;9(1):614.

11. Leijenaar RT, Carvalho S, Hoebers FJ, et al. External validation of a prognostic CT-based radiomic signature in oropharyngeal squamous cell carcinoma. Acta Oncologica. 2015;54(9):1423-1429.

12. Lucia F, Visvikis D, Vallières M, et al. External validation of a combined PET and MRI radiomics model for prediction of recurrence in cervical cancer patients treated with chemoradiotherapy. European Journal of Nuclear Medicine and Molecular Imaging. 2019;46(4):864-877.

13. Nie K, Al-Hallaq H, Li XA, et al. NCTN assessment on current applications of radiomics in oncology. International Journal of Radiation Oncology Biology Physics. 2019;104(2):303-315.

14. Zwanenburg A, Leger S, Vallières M, et al. Image biomarker standardisation initiative. arXiv preprint arXiv:1612.07003. 2016.

15. Foy JJ, Robinson KR, Li H, et al. Variation in algorithm implementation across radiomics software. Journal of Medical Imaging. 2018;5(4):044505.

16. Zhang L, Fried DV, Fave XJ, et al. IBEX: an open infrastructure software platform to facilitate collaborative work in radiomics. Medical Physics. 2015;42(3):1341-1353.

17. Van Griethuysen JJ, Fedorov A, Parmar C, et al. Computational radiomics system to decode the radiographic phenotype. Cancer Research. 2017;77(21):e104-e107.

18. Ng SP, Bahig H, Wang J, et al. Predicting treatment Response based on Dual assessment of magnetic resonance Imaging kinetics and Circulating Tumor cells in patients with Head and Neck cancer (PREDICT-HN): matching ‘liquid biopsy’ and quantitative tumor modeling. BMC Cancer. 2018;18(1):903.

19. Pipe JG. Motion correction with PROPELLER MRI: application to head motion and free‐breathing cardiac imaging. Magnetic Resonance in Medicine. 1999;42(5):963-969.

20. Wintersperger BJ, Runge VM, Biswas J, et al. Brain magnetic resonance imaging at 3 Tesla using BLADE compared with standard rectilinear data sampling. Investigative Radiology. 2006;41(7):586-592.

21. Ward Jr JH. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association. 1963;58(301):236-244.

22. Haralick RM, Shanmugam K, Dinstein IH. Textural features for image classification. IEEE Transactions on Systems, Man, and Cybernetics. 1973;SMC-3(6):610-621.

23. Galloway MM. Texture analysis using gray level run lengths. Computer Graphics and Image Processing. 1975;4:172-179.

24. Amadasun M, King R. Textural features corresponding to textural properties. IEEE Transactions on Systems, Man, and Cybernetics. 1989;19(5):1264-1274.

25. Szczypiński PM, Strzelecki M, Materka A, et al. MaZda: a software package for image texture analysis. Computer Methods and Programs in Biomedicine. 2009;94(1):66-76.

Figures

Figure 1: Apparent diffusion coefficient (ADC) maps of a head and neck cancer patient throughout radiotherapy from the PREDICT-HN prospective clinical trial18. ADC maps (top row) are displayed with the gross tumour volume (GTV) highlighted in colour. ADC maps cropped to the GTV (middle row) show the region of interest for our analysis. The ADC histogram within the GTV (bottom) changes from baseline (TP0) throughout radiotherapy (TP1-TP7); each histogram is coloured to match the corresponding GTV.

Figure 2: Linear correlation of apparent diffusion coefficient (ADC) radiomic features between IBEX and PyRadiomics software. Correlation matrices are grouped by feature class (intensity histogram, shape, NGTDM, GLCM, GLRLM), with colour representing the Pearson correlation coefficient (r). Ideally, diagonal elements (the same feature from each package) would be perfectly correlated (r = 1.0, dark purple) and off-diagonal elements would be uncorrelated (r = 0.0, white).

Figure 3: Summary of linear correlation of apparent diffusion coefficient (ADC) radiomic features between IBEX and PyRadiomics software. The reproducibility threshold (red dashed line) is defined as a Pearson correlation coefficient greater than or equal to 0.9. This analysis identifies a sub-set of reproducible features from the intensity histogram (5/7), shape (5/8), GLCM (neighbourhood 1: 3/16; 4: 4/16; 7: 0/16), GLRLM (0/11) and NGTDM (2/5) categories.

Figure 4: Clustered radiomic signatures (left) of the change in apparent diffusion coefficient (ADC) features after one fraction of radiotherapy in 38 head and neck cancer patients. Analysis performed with PyRadiomics (top) or IBEX (bottom) features. Patient groups were generated using unsupervised hierarchical clustering (right) based on all features. Because non-reproducible features are included in the clustering, the model generated from PyRadiomics features differs from the model generated from IBEX features.

Figure 5: Clustered radiomic signatures (left) of the change in apparent diffusion coefficient (ADC) features after one fraction of radiotherapy in 38 head and neck cancer patients. Analysis performed with PyRadiomics (top) or IBEX (bottom) features. Patient groups were generated using unsupervised hierarchical clustering (right) based only on the identified reproducible features (Figure 3). Because only reproducible features are used in the clustering, equivalent patient groups are generated from PyRadiomics and IBEX features.

Proc. Intl. Soc. Mag. Reson. Med. 28 (2020)