A combinatorial model approach for feature selection from multimodal MRI data
Xiaowei Zhuang1, Virendra Mishra1, Karthik Sreenivasan1, Charles Bernick1, Sarah Banks1, and Dietmar Cordes1,2

1Cleveland Clinic Lou Ruvo Center for Brain Health, Las Vegas, NV, United States, 2Department of Psychology and Neuroscience, University of Colorado Boulder, Boulder, CO, United States

Synopsis

Clinical applications of brain abnormality detection with supervised machine learning techniques are limited because patient samples are typically small and unbalanced relative to the rich feature sets available. We propose a new combinatorial model approach, fs-RBFN, which combines sampling from a multivariate joint distribution, LASSO feature selection, RBFN cross-validation, and inverse probability weighting to address this problem. The proposed approach was validated against a ground-truth phantom and further tested on a multimodal MRI dataset of cognitively impaired and non-impaired professional fighters. Our results suggest that this technique outperforms several other out-of-the-box feature selection algorithms.

Purpose

Supervised classification algorithms have been widely applied to extract clinically useful features from magnetic resonance imaging (MRI) data [1, 2]. The clinical application of such automatic abnormality detection algorithms is, however, limited because sample sizes are small and unbalanced relative to the rich feature sets. We propose a combinatorial model approach, fs-RBFN (feature selection radial basis function network [3]), to extract clinically relevant features from a multimodal MRI dataset with a limited and unbalanced sample size. We further demonstrate that the proposed approach outperforms several existing techniques in our cohort of professional fighters from the Professional Fighters’ Brain Health Study (PFBHS) [4], whose data were collected at our center.

Methods

Model for feature selection: The multivariate joint distribution of each population subgroup was first estimated separately, and an equal number of subjects per group was then resampled from these distributions [5]. The least absolute shrinkage and selection operator [6] (LASSO) in logistic regression was applied to the resampled dataset for feature selection. Varying the regularization strength ($$$\lambda$$$) of the L1 norm yielded different subsets of features. Each original subgroup was then divided into training (70%) and testing (30%) datasets, and the non-linear RBFN model was used on the training sample to select the best subset of features (see Fig. 1). Inverse probability weighting [7] (IPW) was used in the RBFN model to balance the contributions of groups with different sample sizes. Finally, the RBFN classifier with the best subset of features was applied to the independent testing dataset to calculate prediction accuracy (PA), sensitivity (Se), specificity (Sp) and area under the receiver operating characteristic curve (AUC).

Model Validation with simulation: Two categories in the Iris dataset [8] with 80 subjects (50/30) and 4 true features were used in the simulation. An additional 196 random features were generated for each subject, and Gaussian random noise ($$$\mu = 1, \sigma = 1$$$) was added to the entire feature set. We then tested the fs-RBFN approach on this simulated dataset with known ground truth.

Clinical Validation: In the PFBHS project, 305 professional fighters were divided into impaired (99) and non-impaired (306) groups based on clinical cognitive impairment scores. A total of 315 features for each fighter were input into the fs-RBFN model for supervised feature classification: 113 structural measurements (cortical thickness and volumes) extracted from T1-MPRAGE scans, 196 diffusion measurements (FA, AD, RD and MD) of major white matter tracts [9] extracted from diffusion-weighted images (71 directions, b-value 1000 s/mm²), gender, age, years of education, years of professional fighting and type of fighter. All scans were collected on a 3T Siemens Verio. The performance of the fs-RBFN approach was also compared with widely accepted machine learning techniques, including LASSO, SVM with a radial basis function kernel [10], random forest [11] and the gradient boosting method [12].
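The first two steps of the model (balanced resampling followed by a LASSO path) can be sketched on toy data. This is a minimal illustration under stated assumptions, not the authors' implementation: each group's joint distribution is assumed to be multivariate normal, the L1-penalised logistic regression is fitted with a plain proximal gradient loop, and all function names, the toy data, the step size and the three values of λ are hypothetical. The toy problem has fewer features than subjects per group so that the sample covariance is positive definite; data with more features than subjects would need a regularised covariance estimate.

```python
import math
import random

def mean_cov(rows):
    """Sample mean vector and covariance matrix of one group's feature rows."""
    n, p = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[sum((r[a] - mu[a]) * (r[b] - mu[b]) for r in rows) / (n - 1)
            for b in range(p)] for a in range(p)]
    return mu, cov

def cholesky(a, jitter=1e-8):
    """Lower-triangular L with L L^T equal to a (plus a tiny diagonal jitter)."""
    p = len(a)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] + jitter - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def resample_group(rows, n_new, rng):
    """Draw n_new synthetic rows from a Gaussian fitted to one group (assumed form)."""
    mu, cov = mean_cov(rows)
    L = cholesky(cov)
    p = len(mu)
    draws = []
    for _ in range(n_new):
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        draws.append([mu[i] + sum(L[i][k] * z[k] for k in range(i + 1))
                      for i in range(p)])
    return draws

def soft_threshold(v, t):
    """Proximal operator of the L1 norm: shrink v toward zero by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def lasso_logistic(X, y, lam, step=0.1, iters=300):
    """L1-penalised logistic regression via proximal gradient descent."""
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(iters):
        g0, grad = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            z = b0 + sum(b * x for b, x in zip(beta, xi))
            r = (1.0 / (1.0 + math.exp(-z)) - yi) / n
            g0 += r
            for j in range(p):
                grad[j] += r * xi[j]
        b0 -= step * g0  # the intercept is not penalised
        beta = [soft_threshold(b - step * g, step * lam)
                for b, g in zip(beta, grad)]
    return beta

# Hypothetical unbalanced toy data (50 vs 30 subjects):
# features 0 and 1 carry signal, features 2-5 are pure noise.
rng = random.Random(0)

def make_group(center, n):
    return [[rng.gauss(center, 1.0), rng.gauss(center, 1.0)]
            + [rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(n)]

group0, group1 = make_group(-1.0, 50), make_group(1.0, 30)

# Step 1: balance the groups by drawing 50 synthetic subjects from each.
balanced0 = resample_group(group0, 50, rng)
balanced1 = resample_group(group1, 50, rng)
X = balanced0 + balanced1
y = [0] * 50 + [1] * 50

# Step 2: each lambda yields one candidate feature subset.
selected = {}
for lam in (0.01, 0.2, 1.0):
    beta = lasso_logistic(X, y, lam)
    selected[lam] = [j for j, b in enumerate(beta) if b != 0.0]
    print(f"lambda={lam}: features {selected[lam]}")
```

In this toy run the largest λ removes every feature, while the smallest retains at least the two informative ones. In the approach described above, each candidate subset would then be scored by the RBFN on the training split.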

Results

Fig. 1 illustrates the proposed fs-RBFN approach. Simulation studies with the fs-RBFN model yielded an AUC of 0.95 (PA = 93.75%, Se = 90%, Sp = 100%) on the testing dataset when 5/200 features (including 3 of the 4 true features) were selected. In the clinical validation, the proposed model yielded an AUC of 0.724 (PA = 70%, Se = 65.85%, Sp = 78.95%) on the independent testing dataset when only 6/315 features were extracted: FMajor FA, ILF left FA, left thalamus volume, left medial orbitofrontal thickness, left cerebellum white matter volume and left lateral orbitofrontal thickness (Fig. 2A). fs-RBFN achieved a higher AUC while selecting far fewer features than several other existing supervised feature selection techniques, indicating superior performance (Fig. 2B).
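For readers who want to see how the four reported metrics relate to one another, the following sketch computes PA, Se, Sp and AUC from binary labels and classifier scores. The test set is hypothetical: the counts (10 positives, 6 negatives), the scores and the 0.5 decision threshold are assumptions chosen only so that the arithmetic reproduces the percentages reported for the simulation. The abstract does not state the underlying counts or scores.

```python
def confusion_metrics(y_true, y_pred):
    """Prediction accuracy, sensitivity and specificity from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {"PA": (tp + tn) / len(y_true),
            "Se": tp / (tp + fn),
            "Sp": tn / (tn + fp)}

def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied pairs count as half a correct ranking.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical test set (counts and scores are NOT taken from the abstract).
y_true = [1] * 10 + [0] * 6
scores = [0.9, 0.8, 0.85, 0.7, 0.6, 0.95, 0.75, 0.65, 0.55, 0.4,  # positives
          0.1, 0.2, 0.3, 0.42, 0.45, 0.48]                        # negatives
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = confusion_metrics(y_true, y_pred)
metrics["AUC"] = auc(y_true, scores)
print(metrics)  # PA = 15/16, Se = 9/10, Sp = 6/6, AUC = 57/60
```

One missed positive out of ten gives Se = 90%, no false positives give Sp = 100%, and 15 of 16 correct predictions give PA = 93.75%. The AUC depends on the ranking of scores rather than on the threshold, which is why it is computed separately from the confusion counts.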

Conclusion

A novel supervised feature selection technique, fs-RBFN, is proposed by combining LASSO feature selection with cross-validation of a non-linear RBFN fit. We further propose sampling from a multivariate joint distribution and using inverse probability weighting to address the concerns of small and unbalanced sample sizes. The proposed approach is validated against a ground-truth phantom and further tested on a multimodal MRI dataset of impaired and non-impaired professional fighters. Our results suggest superior performance over several other out-of-the-box feature selection algorithms.

Acknowledgements

This research was supported by NIH (grant number: 7R01EB014284).

References

1. Mohsen H et al., 2012, Informatics and Systems.
2. Singh L et al., 2012, Springer Berlin Heidelberg.
3. Haykin S., 2009, Pearson Education, Inc.
4. Bernick C et al., 2013, Am J Epidemiol.
5. Hernádvölgyi IT., 1998, University of Ottawa, Canada, Ottawa, ON.
6. Tibshirani R., 1996, Journal of the Royal Statistical Society.
7. Robins JM et al., 1994, Journal of the American Statistical Association.
8. Fisher RA., 1936, Annals of Eugenics.
9. Wakana S et al., 2004, Radiology.
10. Cortes C et al., 1995, Mach Learn.
11. Breiman L., 2001, Mach Learn.
12. Friedman JH., 2001, Annals of Statistics.

Figures

Figure 1. Flow chart of proposed fs-RBFN approach.

Figure 2. A. Six features found to best distinguish cognitively impaired and non-impaired fighters with the fs-RBFN model, overlaid on the MNI template. B. Comparison of fs-RBFN performance with various out-of-the-box machine learning techniques.



Proc. Intl. Soc. Mag. Reson. Med. 24 (2016)
4316