2390

Augmented ensemble learning is an effective strategy for imbalanced small datasets: improving differentiation of low from high grade prostate cancer

Yuta Akamine¹, Yoshiko Ueno², Keitaro Sofue², Takamichi Murakami², Yu Ueda¹, Ahsan Budrul¹, Masami Yoneyama¹, Makoto Obara¹, and Marc Van Cauteren³
¹Philips Japan, Tokyo, Japan, ²Department of Radiology, Kobe University Graduate School of Medicine, Hyogo, Japan, ³Asia Pacific, Philips Healthcare, Tokyo, Japan

Synopsis

Machine learning (ML) techniques have attracted growing attention for distinguishing low grade from high grade prostate cancer. However, large training datasets are difficult to obtain. Moreover, ML models trained on an imbalanced dataset achieve high accuracy for the majority class but low accuracy for the minority class. Data augmentation has been widely studied as a remedy. Recently, ensemble learning, which merges different classifiers, has shown great potential. We investigated combinations of data augmentation and ensemble learning using multi-parametric MR. The synthetic minority over-sampling technique (SMOTE) combined with ensemble learning increased F1 (0.831) and AUC (0.762), and is an effective strategy for improving diagnostic performance on an imbalanced small dataset.

INTRODUCTION

　It is clinically important to distinguish low grade (Gleason score (GS) ≤ 3+4) from high grade (GS ≥ 4+3) prostate cancer (PCa) because prognosis differs greatly between the two.¹ Multi-parametric magnetic resonance imaging (mp-MRI), which combines diffusion-weighted imaging (DWI) and dynamic contrast-enhanced MRI (DCE-MRI), has been studied for this differentiation.² However, because studies differ in scan protocols and analysis methods, no consensus on diagnostic criteria has been reached.
　Recently, machine learning (ML) techniques have attracted growing attention.³ However, large training datasets are difficult to obtain in the medical field.⁴ Moreover, imbalanced data are common because one class (for example, diseased cases or low grade tumors) occurs much less often than the other (healthy cases or high grade tumors).⁵ ML models trained on an imbalanced dataset yield high prediction accuracy for the majority class but very low accuracy for the minority class. To address this class imbalance problem, the synthetic minority over-sampling technique (SMOTE)⁶ and random over-sampling examples (ROSE)⁷ are widely used data augmentation methods for numerical data.⁸
　As a newer approach, stacking ensemble learning, an ML process that merges different classifiers to improve generalization, has shown great potential.⁹⁻¹⁰ In this study, we show how combining data augmentation with ensemble learning can improve prediction accuracy for differentiating low grade from high grade PCa when training on an imbalanced small dataset.

METHODS

Subjects and equipment: This retrospective study was approved by the hospital review board, and informed consent was waived. Thirty-nine patients underwent preoperative MRI on a 3.0-T MR scanner (Achieva, Philips) between September 2012 and December 2013. Regions of interest (ROIs) were placed on 15 low grade and 25 high grade PCa lesions in the peripheral zone (PZ), based on histopathology specimens. The ratio of high grade (majority) to low grade (minority) PCa was 1.67. We therefore considered the dataset relatively small and imbalanced.

DWI and DCE-MRI: T2W, DWI, and DCE-MRI were obtained.¹¹⁻¹² Sequence parameters are summarized in Table 1A. Intravoxel incoherent motion (IVIM), diffusion kurtosis imaging (DKI), and permeability analyses were conducted. The IVIM parameters (D, D*, and F), the DKI parameter (K), SNR, the permeability parameters (K^trans, K_ep, and V_e), and the respective models are summarized in Table 1B. The mean within each ROI was calculated for each parameter.

Data augmentation: The SMOTE algorithm generates synthetic examples by linear interpolation between two existing minority examples (Fig. 1A). A synthetic example $x_{\mathrm{new}}$ is generated by $x_{\mathrm{new}} = x + \mathrm{rand}(0,1) \cdot (\tilde{x} - x)$ (1), where rand(0,1) is a random value in (0,1), $x$ is an existing minority example, and $\tilde{x}$ is an example chosen at random from the K nearest minority-class neighbors of $x$ (Fig. 1B). The number of nearest neighbors K was set to five.⁸
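The interpolation in Eq. (1) can be illustrated with a minimal Python sketch. This is not the R implementation used in the study: the function name `smote`, its arguments, and the choice to pick the base example at random (rather than cycling through every minority example) are assumptions made for illustration, and Euclidean distance is assumed for the neighbor search.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic examples by interpolating between minority neighbors.

    minority: list of feature vectors (lists of floats), at least two examples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        x = rng.choice(minority)  # base minority example
        # k nearest minority neighbors of x (Euclidean distance), excluding x itself
        others = [p for p in minority if p is not x]
        neighbors = sorted(others, key=lambda p: math.dist(x, p))[:k]
        x_tilde = rng.choice(neighbors)  # random one of the k nearest neighbors
        gap = rng.random()  # plays the role of rand(0, 1) in Eq. (1)
        # x_new = x + gap * (x_tilde - x), applied feature by feature
        synthetic.append([xi + gap * (ti - xi) for xi, ti in zip(x, x_tilde)])
    return synthetic
```

Because every synthetic point lies on the segment between two real minority examples, the augmented data never leave the region spanned by the observed minority class.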
ROSE is a smoothed-bootstrap technique that generates new examples in the neighborhood of minority-class examples; the shape of the neighborhood is determined by the contours of the kernel, and its width is governed by the covariance matrix.⁷ Both SMOTE and ROSE can also randomly undersample the majority class to match the size of the minority class.
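The smoothed-bootstrap idea behind ROSE can be sketched as follows. This is a simplified illustration, not the ROSE R package: it assumes a Gaussian kernel with a diagonal bandwidth set by a Silverman-type rule from the per-feature standard deviation, and it only generates minority examples. The function name `rose_like` and its arguments are hypothetical.

```python
import random
import statistics


def rose_like(minority, n_synthetic, seed=0):
    """Draw synthetic examples from a Gaussian kernel centered on minority examples.

    minority: list of feature vectors (lists of floats), at least two examples.
    """
    rng = random.Random(seed)
    n, d = len(minority), len(minority[0])
    # Assumed bandwidth rule: Silverman-type factor times per-feature standard deviation
    factor = (4.0 / ((d + 2) * n)) ** (1.0 / (d + 4))
    widths = [factor * statistics.stdev([row[j] for row in minority]) for j in range(d)]
    synthetic = []
    for _ in range(n_synthetic):
        center = rng.choice(minority)  # bootstrap step: resample an observed example
        # smoothing step: perturb each feature with kernel noise around the center
        synthetic.append([rng.gauss(c, w) for c, w in zip(center, widths)])
    return synthetic
```

Unlike SMOTE, the generated points are scattered around single observed examples rather than placed between pairs of them, so they can fall slightly outside the range of the original minority data.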

Ensemble learning: A stacking ensemble model is constructed by merging different models. eXtreme Gradient Boosting (XGBoost)¹³, support vector machine (SVM)¹⁴, and random forest (RF)¹⁵ were chosen as base models. The ensemble prediction was determined by simple majority voting among the three models.⁹ Data augmentation and ML model development were performed in R (v3.5.1); the parameters and libraries are summarized in Table 2A and 2B. The SVM regularization parameters were optimized by grid search.
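Simple majority voting over the three base models can be sketched in a few lines of Python. The function name `majority_vote` and the example label strings are illustrative assumptions; the base models themselves are not reproduced here, only the way their per-lesion predictions would be combined.

```python
from collections import Counter


def majority_vote(*model_predictions):
    """Combine per-sample labels from several models by simple majority.

    Each argument is one model's list of predicted labels, in the same sample order.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*model_predictions)]


# Hypothetical predictions from three base models for three lesions
xgb_pred = ["high", "low", "high"]
svm_pred = ["high", "high", "low"]
rf_pred = ["low", "low", "low"]
ensemble_pred = majority_vote(xgb_pred, svm_pred, rf_pred)
```

With three voters and two classes a tie cannot occur, so no tie-breaking rule is needed in this setting.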

Evaluation metrics: Data augmentation and ensemble learning were evaluated by 5-fold cross-validation. Data augmentation was applied only to the training dataset, and the ML models were trained on the augmented data. The test dataset was left intact and used to evaluate model performance.
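The key point of this protocol, that augmentation touches only the training folds, can be made concrete with a small Python sketch. The helper names (`fold_indices`, `cross_validate`) and the callback signatures for `augment`, `fit`, and `predict` are assumptions for illustration. The folds here are a plain shuffled split; whether the study stratified folds by class is not stated.

```python
import random


def fold_indices(n_samples, n_folds=5, seed=0):
    """Shuffle sample indices and deal them into n_folds disjoint test folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[f::n_folds] for f in range(n_folds)]


def cross_validate(X, y, augment, fit, predict, n_folds=5, seed=0):
    """Return per-fold accuracy; `augment` is applied to the training split only."""
    scores = []
    for test_idx in fold_indices(len(X), n_folds, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        # Augment the training data only; the test fold stays intact
        X_tr, y_tr = augment([X[i] for i in train_idx], [y[i] for i in train_idx])
        model = fit(X_tr, y_tr)
        y_hat = predict(model, [X[i] for i in test_idx])
        correct = sum(p == y[i] for p, i in zip(y_hat, test_idx))
        scores.append(correct / len(test_idx))
    return scores
```

Augmenting before splitting would let synthetic points derived from test examples leak into training, which inflates the measured accuracy; keeping `augment` inside the loop avoids that.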
To compare SMOTE and ROSE, the accuracy of the ensemble model was assessed while varying the augmentation (increment) ratio from 0% to 500%. The ensemble model was then compared with the base models using the better augmentation method identified in that assessment; accuracy, sensitivity, specificity, F1 score, and area under the receiver operating characteristic (ROC) curve (AUC) were assessed. The definition of each metric is summarized in Table 2C. A paired t-test was used, and a P value less than 0.05 was considered significant.
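For reference, the five metrics can be computed as in the following Python sketch. Table 2C is not reproduced in this text, so the standard textbook definitions are assumed, and which class is treated as positive is left as an explicit argument because the abstract does not state it. The function names are illustrative. The sketch also assumes both classes are present and at least one positive prediction is made, otherwise a division by zero occurs.

```python
def classification_metrics(y_true, y_pred, positive):
    """Accuracy, sensitivity, specificity, and F1 from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    sensitivity = tp / (tp + fn)  # recall on the positive class
    specificity = tn / (tn + fp)  # recall on the negative class
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


def auc(y_true, scores, positive):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

F1 and AUC are less dominated by the majority class than raw accuracy, which is why they are informative for an imbalanced dataset such as this one.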

RESULTS

The comparison of SMOTE and ROSE is shown in Table 3. Overall, SMOTE yielded higher accuracy than ROSE for differentiating low grade from high grade PCa in the PZ. With SMOTE, the accuracy of the ensemble model did not improve at augmentation ratios above 100%; ROSE did not improve accuracy. The best accuracy, 78.8%, was obtained with SMOTE at an augmentation ratio of 50%.
The ensemble model is compared with the base models in Table 4. Without SMOTE, the ensemble model showed a higher AUC than the base models and comparable performance on the other metrics. With SMOTE at 50% augmentation, the ensemble model outperformed the base models overall, except for the sensitivity of SVM and the specificity of XGBoost. For the ensemble model, accuracy, specificity, F1 score, and AUC were significantly higher with SMOTE than without it (p = 0.00332, 0.00147, 0.0425, and 0.00544, respectively); the difference in sensitivity was not significant (p = 0.44).

CONCLUSION

We demonstrated the feasibility of combining SMOTE data augmentation with ensemble learning on mp-MRI for differentiating low grade from high grade PCa. The results indicate that the proposed method is an effective strategy for improving prediction accuracy on an imbalanced small dataset.

Acknowledgements

No acknowledgement found.

References

1. Epstein JI et al. The 2014 International Society of Urological Pathology (ISUP) Consensus Conference on Gleason Grading of Prostatic Carcinoma: Definition of Grading Patterns and Proposal for a New Grading System. Am J Surg Pathol. 2016;40:244–252.

2. de Rooij M et al. Accuracy of multiparametric MRI for prostate cancer detection: A meta-analysis. Am J Roentgenol. 2014;202:343–351.

3. Shah V et al. Decision support system for localizing prostate cancer based on multiparametric magnetic resonance imaging. Med Phys. 2012;39:4093–4103.

4. Razzak MI et al. Deep Learning for Medical Image Processing: Overview, Challenges and Future. In: Classification in BioApps. Springer. 2018:323–350.

5. He H and Garcia E. Learning from imbalanced data. IEEE Trans Knowledge Data Eng. 2009;21:1263–1284.

6. Chawla N et al. SMOTE: Synthetic Minority Over-Sampling Technique. J Artificial Intelligence Research. 2002;16:321–357.

7. Menardi G and Torelli N. Training and assessing classification rules with imbalanced data. Data Min Knowl Disc. 2014;28:92–122.

8. Chaudhury B et al. Identifying metastatic breast tumors using textural kinetic features of a contrast based habitat in DCE-MRI. Proc SPIE. 2015;9414:941415.

9. Rokach L. Ensemble-based classifiers. Artificial Intelligence Review. 2010;33(1-2):1–39.

10. Iftikhar MA and Idris A. An Ensemble Classification Approach for Automated Diagnosis of Alzheimer’s Disease and Mild Cognitive Impairment. In: 2016 International Conference on Open Source Systems and Technologies (ICOSST). 2016:78–83.

11. Ueda Y et al. Triexponential function analysis of diffusion-weighted MRI for diagnosing prostate cancer. J Magn Reson Imaging. 2016;43:138–146.

12. Akamine Y et al. Application of hierarchical clustering to multi-parametric MR in prostate: Differentiation of tumor and normal tissue with high accuracy. In: Proc 27th Annual Meeting of ISMRM, Montreal. 2019:1615.

13. Chen T and Guestrin C. XGBoost: A scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016:785–794.

14. Vapnik VN. The Nature of Statistical Learning Theory. New York: Springer-Verlag. 1995.

15. Liaw A and Wiener M. Classification and Regression by randomForest. R News. 2002;2(3):18–22.

Figures

Fig. 1. Schema of the synthetic minority over-sampling technique (SMOTE) algorithm. (A) The SMOTE algorithm generates synthetic examples by linear interpolation between two existing minority examples. (B) For each minority sample, neighbors are randomly chosen from its k nearest neighbors, depending on the number of synthetic examples required. In this study, we used five nearest neighbors.

Table 1. (A) Sequence parameters of axial T2W, DWI, and DCE-MRI. (B) Parameters and analysis methods for IVIM, DKI, and permeability.

Table 2. (A) Machine learning parameters of the base models built for the ensemble learning model. Parameters and values for eXtreme Gradient Boosting (XGBoost), support vector machine (SVM), and random forest (RF) are shown. (B) Libraries in R software used for the data augmentation and the ML models. (C) Definitions of the evaluation metrics for the data augmentation methods and the ensemble model.

Table 3. Comparison of the SMOTE and random over-sampling examples (ROSE) data augmentation algorithms. The accuracy of the ensemble model for differentiating low grade from high grade PCa in the PZ using SMOTE or ROSE is shown. The augmentation (increment) ratio was varied from 0% to 500%.

Table 4. Performance of the proposed method combining SMOTE and the ensemble model, compared with the base models. Accuracy, sensitivity, specificity, F1 score, and AUC are evaluated. The P value between the ensemble model without SMOTE and with SMOTE is shown. A paired t-test was used, and a P value less than 0.05 was considered significant.

Proc. Intl. Soc. Mag. Reson. Med. 28 (2020)
