2326

Improvement of Radiomics Prediction by Robustness Preselection 
Renee Cattell1, Shenglan Chen2, Jie Ding2,3, and Chuan Huang2,4,5
1Stony Brook University, Stony Brook, NY, United States, 2Biomedical Engineering, Stony Brook University, Stony Brook, NY, United States, 3Diagnostic Radiology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, Hong Kong, 4Radiology, Stony Brook University, Stony Brook, NY, United States, 5Psychiatry, Stony Brook University, Stony Brook, NY, United States

Synopsis

Radiomic analysis has exponentially increased the amount of quantitative data extractable from a single medical image. However, the effect of image acquisition conditions on the reproducibility or robustness of these features is understudied. In particular, a predictive model intended for use in a multi-institutional setting must be robust to voxel size changes. This study aims to develop a task-specific robustness preselection step for incorporation into a radiomics pipeline, to improve the generalizability of a model applied to a testing set of dissimilar resolution.

Introduction

The amount of quantitative data that can be extracted from a single medical image has exponentially increased with the use of radiomic analysis[1]. Imaging biomarkers can aid in the generation of prediction models for disease diagnosis, characterization and prognosis[2, 3]. However, the generalizability of these models depends on the robustness of the features. This study aims to develop a task-specific robustness preselection step to improve the generalizability of a radiomics model for prediction of sentinel lymph node status in breast cancer patients.

Methods

Phantom Study: A phantom study[4] (Fig. 1) was performed to divide radiomic features (histogram, gray level cooccurrence matrix (GLCM)[5], gray level run length matrix (GLRLM)[6], neighborhood gray level different matrix (NGLDM) [7], gray level zone length matrix (GLZLM)[8], Laws [9]) into robustness groups relative to voxel size differences. Robustness was assessed by intraclass correlation coefficient analysis (ICC values: low (<0.5), moderate (0.5-0.9), and high (>0.9)).
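The robustness grouping described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the abstract does not state which form of the ICC was used, so the sketch assumes the two-way random-effects, absolute-agreement, single-measure form, often written ICC(2,1). The function names `icc_2_1` and `robustness_group` are hypothetical. Each row of the input table is one region of interest and each column is the same feature computed at a different voxel size.

```python
def icc_2_1(table):
    """ICC(2,1) for a table with one row per subject (e.g. phantom ROI)
    and one column per measurement condition (e.g. voxel size).

    Assumption: the source does not specify the ICC form; ICC(2,1) is
    used here purely for illustration.
    """
    n = len(table)        # number of subjects
    k = len(table[0])     # number of conditions
    grand = sum(sum(row) for row in table) / (n * k)
    row_means = [sum(row) / k for row in table]
    col_means = [sum(table[i][j] for i in range(n)) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                # between-subject mean square
    msc = ss_cols / (k - 1)                # between-condition mean square
    mse = ss_err / ((n - 1) * (k - 1))     # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def robustness_group(icc):
    """Map an ICC value to the groups used in this study:
    low (<0.5), moderate (0.5-0.9), high (>0.9)."""
    if icc > 0.9:
        return "high"
    if icc >= 0.5:
        return "moderate"
    return "low"
```

A feature whose values are identical across all voxel sizes yields an ICC of 1 and falls in the high-robustness group; a feature whose values shift systematically with voxel size is penalized by the between-condition term.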

Sentinel Lymph Node Status Prediction: 212 patients with breast cancer undergoing primary surgery were analyzed in this portion of the study. From the standard-of-care dynamic contrast-enhanced sequence, wash-in, wash-out and signal enhancement ratio maps were generated; the in-breast tumor was manually segmented by a radiologist with 11 years of experience. As in the phantom study, radiomic features were calculated for each map. Additionally, clinical data were collected, including location of tumor, multifocality, age, pathological type and grade, molecular subtype and lymphovascular invasion. These patients were separated into a training set (n=109, 37 positive(+) for SLN metastasis), a testing set (n=54, 18+) and an additional testing set (n=48, 13+). The training set and testing set had an in-plane resolution of 0.70mm (the testing set is referred to as Testing0.70), whereas the additional testing set had an in-plane resolution of 0.78-1.0mm (referred to as Testing0.78-1.0). A predictive model was generated from the radiomic and clinical data using the least absolute shrinkage and selection operator (LASSO), with and without a step that removed features of low or moderate robustness to voxel size differences. Predictive performance was evaluated using receiver operating characteristic (ROC) curve analysis. The metrics of most interest for this study were specificity, negative predictive value and accuracy. To assess the stability and reproducibility of the model, the predictive model was run 100 times with different randomization seeds, and the most commonly chosen features are reported.
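The preselection step and the stability tally can be illustrated with a short Python sketch. It is a hedged example under stated assumptions, not the pipeline used in the study: the feature names, the dictionary of ICC values and the helper names `preselect_robust` and `most_common_features` are all hypothetical, and the LASSO fit itself is left out. The sketch only shows how radiomic features could be filtered to the highly robust group (ICC > 0.9) before model fitting, and how the features chosen across repeated runs could be counted.

```python
from collections import Counter


def preselect_robust(radiomic_names, icc_by_feature, threshold=0.9):
    """Keep only radiomic features whose phantom-derived ICC exceeds the
    threshold, i.e. drop the low and moderate robustness groups.

    Assumption: clinical variables are not filtered here and would be
    appended to the kept radiomic features before model fitting.
    """
    return [name for name in radiomic_names if icc_by_feature[name] > threshold]


def most_common_features(selected_per_run, top=5):
    """Count how often each feature is selected across repeated runs
    (e.g. 100 runs with different random seeds) and return the most
    frequently selected ones as (name, count) pairs."""
    counts = Counter(name for run in selected_per_run for name in set(run))
    return counts.most_common(top)


# Hypothetical ICC values, for illustration only (not study results).
example_icc = {"glcm_entropy": 0.97, "hist_skewness": 0.88, "laws_e5l5": 0.42}
kept = preselect_robust(list(example_icc), example_icc)
```

In this toy example only `glcm_entropy` survives preselection, since the other two hypothetical features fall in the moderate and low groups.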

Results

Phantom Study: Fig. 2 and Fig. 3 summarize the results of the ICC analysis. All GLCM, GLRLM and GLZLM features were found to be highly robust. Only one first-order feature and one NGLDM feature were found to be of less than high robustness. Most Laws features were of low or moderate robustness (27 out of 56 and 16 out of 56, respectively).

Sentinel Lymph Node Status Prediction: Prior to the robustness preselection step, the specificity and accuracy of Testing0.78-1.0 were poorer than those of Testing0.70 (specificity: 66% vs 86%; accuracy: 71% vs 82%). However, the negative predictive value was higher for Testing0.78-1.0 than for Testing0.70. Accuracy and specificity of Testing0.78-1.0 improved after preselection based on the robustness assessment: specificity increased from 66% to 71%, and accuracy increased from 71% to 75%. Results are summarized in Fig. 4 and Fig. 5.
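For reference, the three metrics reported here follow directly from the confusion matrix: specificity = TN/(TN+FP), negative predictive value = TN/(TN+FN), and accuracy = (TP+TN)/N. The Python sketch below computes them from binary labels. The function name `binary_metrics` and the toy labels are hypothetical and are not study data.

```python
def binary_metrics(y_true, y_pred):
    """Specificity, negative predictive value (NPV) and accuracy for
    binary labels, where 1 = SLN-positive and 0 = SLN-negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "specificity": ratio(tn, tn + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, len(pairs)),
    }


# Toy labels for illustration only.
toy = binary_metrics([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1])
```

Because NPV depends on the number of false negatives and on class prevalence, it can move in a different direction from specificity when two test sets have different proportions of positive cases.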

Discussion

Voxel size can vary with patient size or across institutions according to their imaging protocols. The effect of small voxel size variations, and its relation to radiomic feature robustness, is understudied. This study demonstrates that a model trained and tested on images of a particular voxel size performs worse on a test set composed of images of a different resolution. Furthermore, it indicates that removing features deemed sensitive to voxel size variation improves performance on that test set.

Conclusion

Removal of radiomic features that are sensitive to voxel size variation results in a more generalizable and accurate predictive model for an additional testing set whose voxel size differs from that of the training set. This finding could aid in the incorporation of radiomic tools for use in multi-institutional settings.

Acknowledgements

This work is funded in part by the National Institutes of Health (R03CA223052), the Walk-for-Beauty Foundation and the Carol M. Baldwin Breast Cancer Research Foundation.

References

1. Rizzo, S., et al., Radiomics: the facts and the challenges of image analysis. Eur Radiol Exp, 2018. 2(1): p. 36.

2. Crivelli, P., et al., A New Challenge for Radiologists: Radiomics in Breast Cancer. Biomed Res Int, 2018. 2018: p. 6120703.

3. Valdora, F., et al., Rapid review: radiomics and breast cancer. Breast Cancer Res Treat, 2018. 169(2): p. 217-229.

4. Cattell, R.F., et al., Robustness of Radiomic Features in MRI: Review and A Phantom Study. Visual Computing for Industry, Biomedicine, and Art, 2019.

5. Haralick, R.M. and Shanmugam, K., Textural features for image classification. IEEE Transactions on systems, man, and cybernetics, 1973(6): p. 610-621.

6. Galloway, M.M., Texture analysis using grey level run lengths. NASA STI/Recon Technical Report N, 1974. 75.

7. Sun, C.J., et al., Neighboring Gray Level Dependence Matrix for Texture Classification. Computer Vision Graphics and Image Processing, 1983. 23(3): p. 341-352.

8. Thibault, G., et al., Shape and texture indexes application to cell nuclei classification. International Journal of Pattern Recognition and Artificial Intelligence, 2013. 27(01): p. 1357002.

9. Laws, K., Rapid texture identification. In SPIE Vol. 238 Image Processing for Missile Guidance. 1980.

Figures

Figure 1 Image of regions of interest used as phantom for this study, namely pineapple core (red), banana (blue), orange (orange) and kiwi (green).

Figure 2 Number of features of high, moderate and low robustness in each feature class, as defined by average of intraclass correlation coefficient over 10 noise realizations in reference to voxel size differences. The denominator in the table signifies the total number of features in the feature class. GLCM: gray level cooccurrence matrix, GLRLM: gray level run length matrix, NGLDM: neighborhood gray level different matrix, GLZLM: gray level zone length matrix, ICC: Intraclass correlation coefficient.

Figure 3 Average intraclass correlation (ICC) over 10 noise realizations of first order, gray level texture and filter-based features.

Figure 4 Summary of predictive performance before and after robustness preselection reduction step. NPV: negative predictive value.

Figure 5 Pictorial representation of performance results from Figure 4 for prediction of sentinel lymph node status for the testing set of the same resolution as training set, and the dissimilar resolution testing set before and after robustness preselection step.

Proc. Intl. Soc. Mag. Reson. Med. 28 (2020)