Xiaowei Zhuang1, Virendra Mishra1, Karthik Sreenivasan1, Charles Bernick1, Sarah Banks1, and Dietmar Cordes1,2
1Cleveland Clinic Lou Ruvo Center for Brain Health, Las Vegas, NV, United States, 2Department of Psychology and Neuroscience, University of Colorado Boulder, Boulder, CO, United States
Synopsis
Clinical applications of brain abnormality detection with supervised machine learning techniques are limited by small and unbalanced sample sizes relative to the rich feature sets available in patient populations. We propose a new combinatorial model approach, fs-RBFN, which combines sampling from a multivariate joint distribution, LASSO feature selection, RBFN cross validation, and inverse probability weighting to address this problem. The proposed approach was validated against a ground truth phantom and further tested on a multimodal MRI dataset of cognitively impaired and non-impaired professional fighters. Our results suggest superior performance of this technique over several other out-of-the-box feature selection algorithms.
Purpose
Supervised classification algorithms have been widely applied to extract clinically useful features from magnetic resonance imaging (MRI) data [1, 2]. The clinical application of such automatic abnormality detection algorithms is, however, limited by small and unbalanced sample sizes relative to the rich feature sets. We propose a combinatorial model approach, fs-RBFN (feature selection radial basis function network [3]), to extract clinically relevant features from a multimodal MRI dataset with a limited and unbalanced sample size. We further demonstrate the improved performance of the proposed approach in a cohort of professional fighters from the Professional Fighters' Brain Health Study (PFBHS) [4], whose data were collected at our center.
Methods
Model for feature selection: The multivariate joint distribution of each population subgroup was first estimated separately, and an equal number of subjects per group was then resampled from these distributions [5]. The least absolute shrinkage and selection operator (LASSO) [6] in logistic regression was applied to the resampled dataset for feature selection. Varying the regularization strength ($$$\lambda$$$) of the L1 norm yielded different subsets of features. The original dataset was then divided into training (70%) and testing (30%) datasets, and the non-linear RBFN model was fitted on the training dataset to identify the best subset of features (see Fig. 1). Inverse probability weighting (IPW) [7] was used in the RBFN model to balance the contributions of groups with different sample sizes. Finally, the RBFN classifier with the best subset of features was applied to the independent testing dataset to calculate prediction accuracy (PA), sensitivity (Se), specificity (Sp) and area under the receiver operating characteristic curve (AUC).
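The feature selection steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the joint distribution is assumed Gaussian, the RBFN is approximated by Gaussian units on k-means centers with a logistic output layer, IPW is taken to be the inverse class frequency, and the names `resample_balanced`, `lasso_subsets`, `SimpleRBFN` and `best_subset` are hypothetical. Class labels are assumed to be coded as integers 0 and 1.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold


def resample_balanced(X, y, n_per_group, rng):
    """Draw n_per_group synthetic subjects per class from a per-class Gaussian."""
    Xs, ys = [], []
    for label in np.unique(y):
        Xg = X[y == label]
        # Assumption: Gaussian joint distribution; the small ridge keeps the
        # covariance positive definite when subjects < features.
        cov = np.cov(Xg, rowvar=False) + 1e-3 * np.eye(X.shape[1])
        Xs.append(rng.multivariate_normal(Xg.mean(axis=0), cov, size=n_per_group))
        ys.append(np.full(n_per_group, label))
    return np.vstack(Xs), np.concatenate(ys)


def lasso_subsets(X, y, inverse_strengths):
    """Collect the distinct non-empty feature subsets along an L1 path."""
    subsets = []
    for C in inverse_strengths:  # scikit-learn's C is the inverse of lambda
        clf = LogisticRegression(penalty="l1", C=C, solver="liblinear").fit(X, y)
        idx = tuple(int(i) for i in np.flatnonzero(clf.coef_[0]))
        if idx and idx not in subsets:
            subsets.append(idx)
    return subsets


class SimpleRBFN:
    """Gaussian hidden units on k-means centers with a logistic output layer."""

    def __init__(self, n_centers=5, seed=0):
        self.n_centers, self.seed = n_centers, seed

    def _hidden(self, X):
        d2 = ((X[:, None, :] - self.centers_[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-d2 / (2.0 * self.width_ ** 2))

    def fit(self, X, y):
        km = KMeans(n_clusters=self.n_centers, n_init=10, random_state=self.seed).fit(X)
        self.centers_ = km.cluster_centers_
        gaps = np.linalg.norm(self.centers_[:, None] - self.centers_[None], axis=2)
        self.width_ = gaps[gaps > 0].mean()  # heuristic kernel width (assumption)
        # Inverse probability weighting: each subject weighs 1 / class frequency.
        freq = np.bincount(y) / len(y)
        self.out_ = LogisticRegression(max_iter=1000).fit(
            self._hidden(X), y, sample_weight=1.0 / freq[y])
        return self

    def predict_proba(self, X):
        return self.out_.predict_proba(self._hidden(X))[:, 1]


def best_subset(X_train, y_train, subsets, n_splits=5, seed=0):
    """Pick the subset with the highest cross-validated AUC on the training data."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for idx in subsets:
        cols = list(idx)
        aucs = []
        for tr, va in cv.split(X_train, y_train):
            model = SimpleRBFN(seed=seed).fit(X_train[tr][:, cols], y_train[tr])
            aucs.append(roc_auc_score(y_train[va], model.predict_proba(X_train[va][:, cols])))
        scores.append(np.mean(aucs))
    return subsets[int(np.argmax(scores))]
```

In use, the resampled data would feed `lasso_subsets`, the 70% training portion of the original data would feed `best_subset`, and a final `SimpleRBFN` fitted on the chosen columns would be scored once on the held-out 30%. Standardizing the features beforehand is advisable, because both the L1 penalty and the Gaussian kernel width are scale dependent.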
Model Validation with simulation: Two categories of the Iris dataset [8], comprising 80 subjects (50/30) and 4 true features, were used in the simulation. An additional 196 random features were generated for each subject, and Gaussian random noise ($$$\mu = 1, \sigma = 1$$$) was added to the entire feature set. We then tested the fs-RBFN approach on this simulated dataset with known ground truth.
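A simulated dataset of this shape could be built roughly as shown below. The text does not say which two Iris categories were used, which 30 subjects formed the smaller group, or how the random features were distributed, so the choices here (versicolor versus virginica, the first 30 virginica samples, standard normal random features) are assumptions, and `make_simulated_dataset` is a hypothetical helper name.

```python
import numpy as np
from sklearn.datasets import load_iris


def make_simulated_dataset(seed=0, n_random=196):
    """Return (X, y): 80 subjects, 4 true Iris features plus n_random noise features."""
    rng = np.random.default_rng(seed)
    iris = load_iris()
    # Assumption: classes 1 and 2 are the two categories, and the first 30
    # samples of class 2 form the smaller group.
    X0 = iris.data[iris.target == 1]        # 50 subjects
    X1 = iris.data[iris.target == 2][:30]   # 30 subjects
    X_true = np.vstack([X0, X1])
    y = np.r_[np.zeros(len(X0), dtype=int), np.ones(len(X1), dtype=int)]
    # Uninformative features (assumed standard normal here).
    X_rand = rng.normal(size=(len(y), n_random))
    X = np.hstack([X_true, X_rand])
    # Gaussian noise with mu = 1 and sigma = 1 added to every feature.
    X = X + rng.normal(loc=1.0, scale=1.0, size=X.shape)
    return X, y
```

The first four columns of `X` are the true features, so a selected subset can be checked against the ground truth by counting how many of its indices fall below 4.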
Clinical Validation: In the PFBHS project, 305 professional fighters were divided into impaired (99) and non-impaired (306) groups based on clinical cognitive impairment scores. A total of 315 features per fighter were input into the fs-RBFN model for supervised feature classification: 113 structural measurements (cortical thickness and volumes) extracted from T1-MPRAGE scans; 196 diffusion measurements (FA, AD, RD and MD) of major white matter tracts [9] extracted from diffusion weighted images (71 directions, b-value 1000 s/mm²); and gender, age, years of education, years of professional fighting and type of fighter. All scans were collected on a 3T Siemens Verio. The performance of the fs-RBFN approach was also compared with several widely accepted machine learning techniques, including LASSO, SVM with a radial basis function kernel [10], random forest [11] and gradient boosting [12].
Results
Fig. 1 illustrates the proposed fs-RBFN approach. In the simulation, the fs-RBFN model yielded an AUC of 0.95 (PA = 93.75%, Se = 90%, Sp = 100%) on the testing dataset when 5 of 200 features were selected, including 3 of the 4 true features. In the clinical validation, the proposed model yielded an AUC of 0.724 (PA = 70%, Se = 65.85%, Sp = 78.95%) on the independent testing dataset with only 6 of 315 features selected: FMajor FA, left ILF FA, left thalamus volume, left medial orbitofrontal thickness, left cerebellum white matter volume and left lateral orbitofrontal thickness (Fig. 2A). Compared with several existing supervised feature selection techniques, fs-RBFN achieved a higher AUC with far fewer selected features, indicating superior performance (Fig. 2B).
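For reference, the four reported measures can be computed from test-set predictions as in the sketch below. It assumes that the positive class is coded as 1, that scores are thresholded at 0.5 to obtain class labels, and that `classification_summary` is a hypothetical helper; none of these details is stated in the text.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score


def classification_summary(y_true, y_score, threshold=0.5):
    """Return PA, Se, Sp and AUC for binary labels and predicted scores."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    y_pred = (y_score >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "PA": (tp + tn) / (tp + tn + fp + fn),  # prediction accuracy
        "Se": tp / (tp + fn),                   # sensitivity
        "Sp": tn / (tn + fp),                   # specificity
        "AUC": roc_auc_score(y_true, y_score),  # threshold independent
    }
```

PA, Se and Sp depend on the chosen threshold, whereas AUC summarizes performance across all thresholds, which makes it the more stable measure for comparing methods on unbalanced groups.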
Conclusion
A novel supervised feature selection technique, fs-RBFN, is proposed by combining LASSO feature selection with cross validation of a non-linear RBFN model. We further propose sampling from a multivariate joint distribution and using inverse probability weighting to address the problem of small and unbalanced sample sizes. The proposed approach was validated against a ground truth phantom and further tested on a multimodal MRI dataset of impaired and non-impaired professional fighters. Our results suggest superior performance over several other out-of-the-box feature selection algorithms.
Acknowledgements
This research was supported by NIH (grant number: 7R01EB014284).
References
1. Mohsen H et al., 2012, Informatics and Systems.
2. Singh L et al., 2012, Springer Berlin Heidelberg.
3. Haykin S., 2009, Pearson Education, Inc.
4. Bernick C et al., 2013, Am J Epidemiol.
5. Hernádvölgyi IT., 1998, University of Ottawa, Canada, Ottawa, ON.
6. Tibshirani R., 1996, Journal of the Royal Statistical Society.
7. Robins JM et al., 1994, Journal of the American Statistical Association.
8. Fisher RA., 1936, Annals of Eugenics.
9. Wakana S et al., 2004, Radiology.
10. Cortes C et al., 1995, Mach Learn.
11. Breiman L., 2001, Mach Learn.
12. Friedman JH., 2001, Annals of Statistics.