4118

Feature Engineering for the Subtype Classification of Breast Cancer: A Model Incorporating DCE and DWI Images
Zhe Wang1 and Boyu Zhang2

1Shanghai Center for Mathematical Sciences, Shanghai, China, 2ISTBI, Shanghai, China

Purpose:

To investigate whether feature engineering of multiparametric MR radiomics can help classify the immunohistochemical (IHC) subtypes of breast cancer.

Experimental Design:

One hundred and thirty-four consecutive patients with pathologically proven invasive ductal carcinoma were retrospectively analyzed. A total of 2788 features were extracted from the DCE- and DWI-related images. We proposed a novel two-stage feature selection method combining traditional statistical and machine learning-based methods. Accuracy was assessed for the 4-IHC classification and for the classification of triple-negative (TN) versus non-TN cancers.
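The abstract does not spell out the two-stage selection algorithm, so the following Python sketch shows only one plausible reading, assuming SciPy and scikit-learn. Stage 1 keeps features that fall inside an "accepted domain": a univariate p-value below one threshold and a single-feature cross-validation error below another. The defaults (0.6 and 0.85) are the values reported for the 4-IHC task in the figure caption. Stage 2 is an assumed greedy forward selection by cross-validated LDA accuracy, capped at 25 features. The function names `coarse_filter` and `forward_select`, the use of LDA as the scoring classifier, 5-fold cross-validation, the forward-selection stage itself, and the synthetic data are all hypothetical.

```python
import numpy as np
from scipy.stats import f_oneway
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score


def coarse_filter(X, y, p_max=0.6, err_max=0.85, cv=5):
    """Stage 1 (hypothetical): keep feature j if its one-way ANOVA p-value
    is below p_max AND its single-feature cross-validation error is below
    err_max, i.e. the feature lies inside the 'accepted domain'."""
    classes = np.unique(y)
    keep = []
    for j in range(X.shape[1]):
        p = f_oneway(*[X[y == c, j] for c in classes]).pvalue
        acc = cross_val_score(LinearDiscriminantAnalysis(), X[:, [j]], y, cv=cv).mean()
        if p < p_max and 1.0 - acc < err_max:
            keep.append(j)
    return keep


def forward_select(X, y, candidates, max_features=25, cv=5):
    """Stage 2 (assumed): greedy forward selection over the surviving
    features, each step adding the feature that most improves the
    cross-validated LDA accuracy."""
    selected, remaining, best_acc = [], list(candidates), 0.0
    while remaining and len(selected) < max_features:
        acc, j = max(
            (cross_val_score(LinearDiscriminantAnalysis(), X[:, selected + [k]], y, cv=cv).mean(), k)
            for k in remaining
        )
        if acc <= best_acc:  # no remaining feature improves accuracy: stop
            break
        selected.append(j)
        remaining.remove(j)
        best_acc = acc
    return selected, best_acc


# Synthetic data: 4 classes, 40 features, two of which carry class signal.
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2, 3], 30)
X = rng.normal(size=(120, 40))
X[:, 0] += y
X[:, 1] -= 0.5 * y
kept = coarse_filter(X, y)
sel, acc = forward_select(X, y, kept, max_features=5)
print(f"{len(kept)} features pass stage 1; stage 2 keeps {sel} (CV accuracy {acc:.3f})")
```

With two groups, the one-way ANOVA p-value equals the equal-variance two-sample t-test p-value, so the same filter would cover the TN vs. non-TN task with thresholds 0.54 and 0.76. Because both stages reuse the same data for selection and scoring, the reported accuracy is optimistic. An unbiased estimate would need nested cross-validation.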

Results:

For the 4-IHC classification task, the best accuracy of 78.4% was achieved by either linear discriminant analysis (LDA) or a subspace discriminant ensemble classifier using 25 selected features. Using only the small dependence emphasis of Kendall-tau-b for the sequential features of the DWI images (DWIsequential), the LDA model yielded an accuracy of 53.7%. For TN versus non-TN cancers, the subspace discriminant ensemble using eight features yielded the highest accuracy of 91.8%, and the maximum variance of DWIsequential alone with a linear support vector machine (SVM) model achieved an accuracy of 83.6%.
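The three classifier types named above can be compared with scikit-learn as sketched below. This is an illustrative setup, not the authors' code. The "subspace discriminant" ensemble is approximated by a random-subspace ensemble: `BaggingClassifier` over `LinearDiscriminantAnalysis`, with no sample bootstrapping and half of the features per learner. The number of learners, the standardization step for the SVM, stratified 5-fold cross-validation, and the synthetic TN vs. non-TN data are assumptions.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def make_models(seed=0):
    """Candidate classifiers; hyperparameters are illustrative assumptions."""
    return {
        "LDA": LinearDiscriminantAnalysis(),
        # Random-subspace ensemble of LDA learners: each learner is trained on
        # all samples (bootstrap=False) but on a random half of the features.
        "Subspace discriminant": BaggingClassifier(
            LinearDiscriminantAnalysis(),
            n_estimators=30,
            max_features=0.5,
            bootstrap=False,
            random_state=seed,
        ),
        # Linear-kernel SVM; the scaler is refit inside each training fold.
        "Linear SVM": make_pipeline(StandardScaler(), SVC(kernel="linear")),
    }


def evaluate(X, y, n_splits=5, seed=0):
    """Mean stratified k-fold cross-validated accuracy for each model."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
        for name, model in make_models(seed).items()
    }


# Synthetic stand-in for a TN vs. non-TN problem with eight selected features.
rng = np.random.default_rng(0)
y = (rng.random(134) < 0.25).astype(int)  # 1 = "TN" (synthetic labels)
X = rng.normal(size=(134, 8))
X[:, :3] += 1.5 * y[:, None]  # make three features informative
scores = evaluate(X, y)
for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

A single-feature model, such as the DWIsequential-only LDA or SVM result above, can be evaluated the same way by passing a one-column `X` to the LDA or SVM entries. The subspace ensemble needs at least two features to sample from.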

Conclusions:

Whole-tumor radiomics of multiparametric MR images provides a non-invasive analytical approach for breast cancer subtype classification and TN cancer identification.

Introduction

Breast cancer is a heterogeneous group of diseases with varied clinical behavior, treatment responses, and survival outcomes (1,2). Immunohistochemical (IHC) subtypes, including Luminal A, Luminal B, human epidermal growth factor receptor 2 (HER2)-positive, and triple-negative (TN) cancer, are routinely used to select therapy and predict therapeutic response (3). For example, HER2-positive breast cancers are more likely to achieve a pathologic complete response (pCR) to neoadjuvant chemotherapy, whereas luminal-type breast cancers show lower pCR rates (4,5). Patients with TN breast cancer have poorer clinical outcomes than patients with other subtypes (6-8). Multiparametric MR imaging with dynamic contrast-enhanced (DCE) imaging and diffusion-weighted imaging (DWI) has been shown to provide important information for differentiating breast cancer subtypes (9-11). However, the thousands of images acquired for each tumor contain much quantitative information that is imperceptible to visual assessment by radiologists. Radiomics refers to computational algorithms that extract quantitative imaging features, such as texture features, and use them to evaluate tumors and make predictions (12). Feature engineering, the process of selecting informative features to boost machine learning model performance, can further help machine learning models identify breast cancer subtypes (13-17). Agner et al. proposed a feature selection approach using linear discriminant analysis (LDA) and support vector machine (SVM) classifiers to differentiate TN breast cancer from non-TN lesions on DCE images (18). In addition, Vidic et al. reported that texture analysis of DWI images combined with an SVM has potential for breast cancer subtype classification (19). However, no studies to date have investigated radiomics combining DCE imaging and DWI for the subtype classification of breast cancer.
The purpose of this study was to evaluate the performance of a feature engineering-based radiomics model in differentiating among Luminal A, Luminal B, HER2-positive, and TN breast cancer using DCE imaging and DWI. In addition, we investigated whether the model could differentiate the subtype with the worst clinical outcome (TN breast cancer) from the other subtypes.

Acknowledgements

We thank Dr. Chao You, Dr. Tong Tong, and Dr. Bin Wu for discussions of the study design and research results. This work was supported by the National Natural Science Foundation of China (no. 61731008), the Shanghai Municipal Science and Technology Major Project (no. 2017SHZDZX01), and the Shanghai Natural Science Foundation (no. 17ZR1401600).

References

1. Henderson IC, Patek AJ. The relationship between prognostic and predictive factors in the management of breast cancer. Breast Cancer Res Treat 1998;52(1-3):261-88.

2. Martelotto LG, Ng CK, Piscuoglio S, Weigelt B, Reis-Filho JS. Breast cancer intra-tumor heterogeneity. Breast Cancer Res 2014;16(3):210.

3. Goldhirsch A, Wood WC, Coates AS, Gelber RD, Thurlimann B, Senn HJ, et al. Strategies for subtypes--dealing with the diversity of breast cancer: highlights of the St. Gallen International Expert Consensus on the Primary Therapy of Early Breast Cancer 2011. Ann Oncol 2011;22(8):1736-47.

4. Bhargava R, Beriwal S, Dabbs DJ, Ozbek U, Soran A, Johnson RR, et al. Immunohistochemical surrogate markers of breast cancer molecular classes predicts response to neoadjuvant chemotherapy: a single institutional experience with 359 cases. Cancer 2010;116(6):1431-9.

5. Zambetti M, Mansutti M, Gomez P, Lluch A, Dittrich C, Zamagni C, et al. Pathological complete response rates following different neoadjuvant chemotherapy regimens for operable breast cancer according to ER status, in two parallel, randomized phase II trials with an adaptive study design (ECTO II). Breast Cancer Res Treat 2012;132(3):843-51.

6. Cleator S, Heller W, Coombes RC. Triple-negative breast cancer: therapeutic options. Lancet Oncol 2007;8(3):235-44.

7. Liedtke C, Mazouni C, Hess KR, Andre F, Tordai A, Mejia JA, et al. Response to neoadjuvant therapy and long-term survival in patients with triple-negative breast cancer. J Clin Oncol 2008;26(8):1275-81.

Figures

The accepted domains for coarse feature selection. A, For the 4-immunohistochemical (IHC) classification, the thresholds for the analysis of variance (ANOVA) p-value and the cross-validation error were set to 0.6 and 0.85, respectively. B, For triple negative vs. non-triple negative cancers, the thresholds for the t-test p-value and the cross-validation error were set to 0.54 and 0.76, respectively.

The flowchart for feature engineering of multiparametric MR radiomics. A, Whole-tumor segmentation was performed on images from a total of ten sequences. B, A total of 2788 features were extracted, and a two-stage feature selection method was then applied. C, Machine learning-based classifiers were used for the 4-immunohistochemical (IHC) classification and for triple negative vs. non-triple negative cancers.

Figure 3. DWI images for the four subtypes of breast cancer. A 46-year-old female with Luminal A breast cancer (A, B, C); a 60-year-old female with Luminal B breast cancer (D, E, F); a 47-year-old female with human epidermal growth factor receptor 2 (HER2)-positive breast cancer (G, H, I); and a 57-year-old female with triple negative (TN) breast cancer (J, K, L). The small dependence emphasis of Kendall-tau-b values for DWIsequential for Luminal A, Luminal B, HER2-positive, and TN breast cancer were -0.915, 0.358, 0.915, and 0.299, respectively.
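Small dependence emphasis (SDE) is a gray-level dependence matrix (GLDM) feature, defined as SDE = (1/N_z) Σ_i Σ_j P(i, j) / j², where P(i, j) counts voxels with gray level i and dependence count j, and N_z is the number of voxels. The abstract does not explain how the Kendall-tau-b statistic is derived from SDE across the DWI sequence, so the sketch below only computes SDE on a single 2D slice. It assumes an 8-connected neighborhood, a dependence threshold `alpha` on gray-level differences, and a dependence count that includes the center voxel, so every count is at least 1. Toolkits such as PyRadiomics may differ in these conventions. The function name and the optional `mask` argument are hypothetical.

```python
import numpy as np


def small_dependence_emphasis(img, mask=None, alpha=0):
    """Small dependence emphasis (SDE) of a 2D gray-level image.

    Assumed conventions: 8-connected neighborhood at distance 1; a neighbor
    is dependent when |g_neighbor - g_center| <= alpha; the dependence count
    includes the center voxel, so it is always >= 1. Only voxels inside
    `mask` (default: the whole image) are counted or used as neighbors.
    """
    img = np.asarray(img, dtype=np.int64)  # signed type avoids uint8 wraparound
    if mask is None:
        mask = np.ones(img.shape, dtype=bool)
    mask = np.asarray(mask, dtype=bool)
    rows, cols = img.shape
    inv_sq = []  # 1 / j**2 for every voxel in the mask
    for r in range(rows):
        for c in range(cols):
            if not mask[r, c]:
                continue
            j = 1  # count the center voxel itself
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (dr, dc) == (0, 0) or not (0 <= rr < rows and 0 <= cc < cols):
                        continue
                    if mask[rr, cc] and abs(img[rr, cc] - img[r, c]) <= alpha:
                        j += 1
            inv_sq.append(1.0 / j**2)
    # P(i, j) only counts voxels, so (1/N_z) * sum_{i,j} P(i, j) / j**2
    # equals the mean of 1/j**2 over all voxels in the mask.
    return float(np.mean(inv_sq))


uniform = np.full((3, 3), 7)
checker = np.array([[0, 1], [1, 0]])
distinct = np.array([[0, 5], [10, 15]])
for name, image in [("uniform", uniform), ("checker", checker), ("distinct", distinct)]:
    print(name, small_dependence_emphasis(image))
```

Under these conventions, an image whose voxels have no dependent neighbors (such as `distinct`) reaches the maximum SDE of 1. Large homogeneous regions (such as `uniform`) push SDE toward 0, which is why SDE is read as a measure of fine, heterogeneous texture.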

Proc. Intl. Soc. Mag. Reson. Med. 27 (2019)