This study investigated classification and prediction methods based on a machine learning (ML)-derived optimized combination-feature (OCF) set of gray matter volume (GMV) and quantitative susceptibility mapping (QSM) values in cognitively normal (CN) elderly subjects, subjects with amnestic mild cognitive impairment (aMCI), and patients with mild and moderate Alzheimer’s disease (AD). GMV and QSM values in the brain were calculated for each subject. To differentiate the three subject groups, support vector machine (SVM) classification with three different kernels was performed on the GMV and QSM values of the OCF set. To predict the aMCI stage, regression-based ML models were developed with the OCF set.
Introduction
Machine learning methods based on principal component analysis (PCA) and linear discriminant analysis (LDA) have been used to classify subjects as cognitively normal (CN) or as having mild cognitive impairment (MCI) (1,2). The main disadvantage of the PCA- or LDA-based machine learning approach is the difficulty in determining which factors affect the results. Because both methods reduce the dimensionality of the features to mitigate the curse of dimensionality, the reduced features are combinations of the original measurements, which limits their interpretability when evaluating AD stages. The objective of this study was to investigate classification and prediction methods using a machine learning (ML)-based optimized combination-feature (OCF) set of gray matter volume (GMV) and quantitative susceptibility mapping (QSM) values in CN elderly subjects, subjects with amnestic MCI (aMCI), and patients with mild and moderate Alzheimer’s disease (AD). Furthermore, to evaluate the differences among the CN, aMCI, and AD groups, support vector machine (SVM) classification was performed on the GMV and QSM values.
Methods
The institutional review board at our institution approved this prospective study protocol, and the requirement for informed consent was waived. All participants provided a detailed medical history and underwent the full Seoul Neuropsychological Screening Battery (SNSB). MRI scans were acquired for each subject using a 3-T MR system (Achieva, Philips Medical Systems, Best, The Netherlands). GMV and QSM values in the brain were calculated from isotropic 3D T1-weighted images (MPRAGE sequence) and 3D multi-echo gradient-echo (fast field-echo, FFE, sequence) images, respectively, in 19 CN subjects, 19 aMCI subjects, and 19 AD patients (3). GMV maps were generated using the Statistical Parametric Mapping Version 8 (SPM8) program (Wellcome Department of Imaging Neuroscience, University College, London, UK). To generate the QSM maps, the magnitude and phase images acquired with the 3D FFE sequence were processed using the Morphology Enabled Dipole Inversion (MEDI) method (4). Regions of interest (ROIs) were defined in areas known for high iron content and amyloid accumulation in the AD brain. A total of 24 features were extracted from the ROIs of the GMV and QSM data, and 12 ROI values were selected. The OCF set was used to explain and interpret the features that may have affected the results. To differentiate the three subject groups with the OCF set, SVM classifiers with three different kernels, namely linear (first-order polynomial), quadratic (second-order polynomial), and cubic (third-order polynomial), were applied to the GMV and QSM values. The predictive analysis was performed on those OCF sets that were meaningful in the classification between the aMCI and CN groups. In the present study, three Gaussian process regression (GPR) models were used for prediction, with rational quadratic (RQ), squared exponential (SE), and exponential (EXP) kernels.
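The kernel families named above can be written out explicitly. The following Python sketch gives textbook forms of the polynomial kernels (degree 1, 2, and 3) and of the three GPR covariance functions. It is an illustration only: the function names, the `offset`, `length_scale`, and `alpha` parameters, and their default values are assumptions, since the exact parameterization and software used in the study are not stated.

```python
import math


def polynomial_kernel(x, y, degree, offset=1.0):
    """Polynomial kernel (x . y + offset) ** degree.

    degree = 1, 2, 3 corresponds to the linear, quadratic, and cubic
    SVM kernels. The offset value is an assumption for illustration.
    """
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + offset) ** degree


def _distance(x, y):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def squared_exponential(x, y, length_scale=1.0):
    """SE covariance: exp(-r^2 / (2 * l^2))."""
    r = _distance(x, y)
    return math.exp(-(r ** 2) / (2.0 * length_scale ** 2))


def exponential(x, y, length_scale=1.0):
    """EXP covariance: exp(-r / l)."""
    return math.exp(-_distance(x, y) / length_scale)


def rational_quadratic(x, y, length_scale=1.0, alpha=1.0):
    """RQ covariance: (1 + r^2 / (2 * alpha * l^2)) ** (-alpha)."""
    r = _distance(x, y)
    return (1.0 + (r ** 2) / (2.0 * alpha * length_scale ** 2)) ** (-alpha)
```

Each function takes two feature vectors of equal length, for example the GMV and QSM values of one OCF set for two subjects, and returns a single similarity value. All three GPR covariances equal 1 for identical inputs and decay with distance; they differ in how quickly they decay, which is what distinguishes the three regression models.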
To predict the aMCI stage, regression-based ML models were developed with the OCF set. The regression performance was assessed using the root mean square error (RMSE). The prediction results were compared with the accuracy obtained using clinical data.
Results
In the group classification between CN and aMCI subjects, the highest accuracy was achieved with the combination of GMV (hippocampus and entorhinal cortex) and QSM (hippocampus and pulvinar) data using the second-order polynomial (quadratic) SVM classifier (AUC = 0.94). In the group classification between aMCI and AD patients, the highest accuracy was achieved with the combination of GMV (amygdala, entorhinal cortex, and posterior cingulate cortex) and QSM (hippocampus and pulvinar) data using the quadratic SVM classifier (AUC = 0.93). Finally, in the group classification between CN and AD subjects, the highest accuracy was achieved with the same combination of GMV (amygdala, entorhinal cortex, and posterior cingulate cortex) and QSM (hippocampus and pulvinar) data using the quadratic SVM classifier (AUC = 0.99). To distinguish aMCI from CN, the exponential Gaussian process regression model with the OCF set of GMV and QSM data yielded the result (RMSE = 0.371) closest to that obtained using clinical data (RMSE = 0.319).
Conclusion
The ML-based OCF setting technique with GMV (hippocampus and entorhinal cortex) and QSM (hippocampus and pulvinar) values effectively classified the subject groups and predicted the aMCI stage. This indicates that the OCF set combining brain tissue volume and susceptibility can serve as a method for classifying and predicting the early AD stage, and that it can be used for personalized analysis or as a diagnostic aid program.
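The two performance measures reported in this abstract, AUC for classification and RMSE for regression, can be computed as in the following minimal Python sketch. It is not the study's implementation: the function names are hypothetical, and the coding of group labels as 0 and 1 is an assumption made for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case
    (label 1) receives a higher score than a randomly chosen negative
    case (label 0); tied scores count as half.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

An AUC of 1.0 means every positive case is scored above every negative case, and 0.5 corresponds to chance-level ranking. A lower RMSE means the regression output lies closer to the reference values.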