Zhengshi Yang1, Xiaowei Zhuang1, Tim Curran2, and Dietmar Cordes1,3
1Cleveland Clinic Lou Ruvo Center for Brain Health, Las Vegas, NV, United States, 2Department of Psychology and Neuroscience, University of Colorado Boulder, Boulder, CO, United States, 3Department of Radiology, University of Colorado-Denver, Denver, CO, United States
Synopsis
Kernel representation is an efficient method to extract nonlinear features without significantly increasing computational complexity. Linear kernel CCA has been applied to fMRI data, but the performance of nonlinear kernel CCA is still unclear. Here we investigate the accuracy of five types of kernels on simulated fMRI data and then apply Gaussian kernel CCA to real fMRI data. Gaussian kernel CCA provides a more sensitive and specific way to detect activation patterns.
Introduction
Lower-dimensional representations of fMRI data cannot capture nonlinear features, which may introduce bias [1] into our understanding of the activation. In contrast, high-dimensional representations using nonlinear mappings of the fMRI data are advantageous for extracting nonlinear features of the data [2]. Such a high-dimensional representation can be obtained by mapping the lower-dimensional data space into a high-dimensional feature space using kernels [3,4]. Working in a high-dimensional space with explicitly specified features increases the computational complexity significantly; kernel representation provides an alternative way to learn nonlinear functions [3] without the features being specified. Instead of using nonlinear optimization routines, the nonlinear CCA problem can be solved efficiently by combining kernel methods and linear algebra [1,4,5]. This study investigates five types of kernels in kernel CCA (kCCA), namely Gaussian, linear, hyperbolic tangent (tanh), parabolic, and inverse kernels, on simulated fMRI data, and compares the activation patterns obtained in an fMRI episodic memory task using Gaussian kCCA and general linear model (GLM) analysis.
Methods
Two fMRI data sets were acquired on a 3.0T GE HDx MRI scanner equipped with an 8-channel head coil, using parallel imaging with the following parameters: ASSET=2, ramp sampling, TR/TE=2s/30ms, FA=70deg, FOV=22cmx22cm, thickness/gap=4mm/1mm, 25 slices, in-plane resolution 96x96 interpolated to 128x128. One data set was collected during resting state and the other during an episodic memory task. Simulated fMRI data were generated on a 32x32 grid, with the activation pattern for each 3x3 local neighborhood with an active center determined by the empirical distribution of the real data obtained from mass-univariate analysis. The simulation was formed by adding wavelet-resampled resting-state data to the true activated time course ($$$p<10^{-6}$$$, uncorrected), with the noise ratio varying from 0 to 1 in steps of 0.01.
In the simulated data, let $$$X$$$ be the matrix of voxel time courses with dimensions $$$q\times t$$$ ($$$q=1024$$$ for the 32x32 grid), and let $$$Y$$$ ($$$q\times 1$$$) be the classifier indicating whether each voxel is active ($$$y=1$$$) or inactive ($$$y=0$$$). The first 512 voxels form the training set $$$X_{train}$$$, $$$Y_{train}$$$, and the remaining voxels form the testing set $$$X_{test}$$$, $$$Y_{test}$$$.
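As a concrete illustration, a minimal sketch of this simulation setup is given below. It assumes a boxcar task time course, white noise in place of the wavelet-resampled resting-state data, and an assumed number of time points and active-voxel fraction; none of these specifics come from the study.

    import numpy as np

    rng = np.random.default_rng(0)
    q, t = 1024, 288                 # 32x32 grid; the number of time points t is assumed

    # The study uses wavelet-resampled resting-state data as noise; white noise
    # stands in here only to keep the sketch self-contained.
    noise = rng.standard_normal((q, t))

    # Hypothetical boxcar standing in for the true activated time course.
    signal = np.tile(np.repeat([0.0, 1.0], 12), t // 24)

    Y = (rng.random(q) < 0.2).astype(float)    # voxel labels; the active fraction is assumed
    for ratio in np.arange(0.0, 1.005, 0.01):  # noise ratio from 0 to 1 in steps of 0.01
        X = np.outer(Y, (1.0 - ratio) * signal) + ratio * noise
        X_train, Y_train = X[:512], Y[:512]    # first 512 voxels: training set
        X_test, Y_test = X[512:], Y[512:]      # remaining voxels: testing set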
Five types of kernels are used: the Gaussian kernel $$$k(u,v)=e^{-\frac{||u-v||^2}{2\sigma^2}}$$$, the linear kernel $$$k(u,v)=<u,v>$$$, the hyperbolic tangent (tanh) kernel $$$k(u,v)=tanh(\sigma<u,v>+1)$$$, the parabolic kernel $$$k(u,v)=(<u,v>+\sigma)^2$$$, and the inverse kernel $$$k(u,v)=\frac{1}{\sqrt{||u-v||^2+\sigma^2}}$$$, with $$$\sigma=10$$$ in our study. Because $$$Y$$$ is a binary classifier, the choice among these kernels makes no intrinsic difference for constructing $$$K_{YY}$$$, so $$$K_{YY}$$$ is computed with the linear kernel in all cases.
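For reference, a compact sketch of the corresponding Gram-matrix computations is given below, assuming samples stored row-wise in NumPy arrays; the function name and string-based kernel selection are illustrative, not from the study.

    import numpy as np

    def gram(name, U, V=None, sigma=10.0):
        """Gram matrix K[i, j] = k(U[i], V[j]) for the five kernels (sigma = 10)."""
        V = U if V is None else V
        dots = U @ V.T                                        # <u, v> for every pair
        sqdist = (U**2).sum(1)[:, None] + (V**2).sum(1)[None, :] - 2.0 * dots
        if name == "gaussian":
            return np.exp(-sqdist / (2.0 * sigma**2))
        if name == "linear":
            return dots
        if name == "tanh":
            return np.tanh(sigma * dots + 1.0)
        if name == "parabolic":
            return (dots + sigma) ** 2
        if name == "inverse":
            return 1.0 / np.sqrt(sqdist + sigma**2)
        raise ValueError(f"unknown kernel: {name}")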
The conventional CCA objective $$$\rho=\frac{\alpha'X'Y\beta}{\sqrt{\alpha'X'X\alpha\beta'Y'Y\beta}}$$$ can be rewritten as $$$\rho=\frac{\omega_x'K_{XX}K_{YY}\omega_y}{\sqrt{\omega_x'K_{XX}K_{XX}\omega_x\omega_y'K_{YY}K_{YY}\omega_y}}$$$ by setting $$$\alpha=X'\omega_x$$$, $$$\beta=Y'\omega_y$$$, $$$K_{XX}=XX'$$$ and $$$K_{YY}=YY'$$$. For each kernel CCA, $$$K_{XX}$$$ is constructed with the respective kernel, and the directions $$$\omega_x$$$ and $$$\omega_y$$$ are found by maximizing ρ with regularization [4] ($$$k=0.25$$$) added on the training set. This optimization problem can be converted to maximizing the numerator subject to $$$\omega_x'K_{XX}^2\omega_x+k\omega_x'K_{XX}\omega_x=1$$$ and $$$\omega_y'K_{YY}^2\omega_y+k\omega_y'K_{YY}\omega_y=1$$$, and finally reformulated as the standard eigenproblem [4] $$$(K_{XX}+kI)^{-1}K_{YY}(K_{YY}+kI)^{-1}K_{XX}\omega_x=\rho^2\omega_x$$$. Since the feature $$$X\alpha$$$ from one view of the data should be identical to the feature $$$Y\beta$$$ from the other, namely $$$X_{test}\alpha\approx Y_{test}^{est}\beta$$$, this can be rewritten as $$$K_{X_{test}X_{train}}\omega_x\approx Y_{test}^{est}Y_{train}'\omega_y$$$, which gives $$$Y_{test}^{est}=K_{X_{test}X_{train}}\omega_x(Y_{train}'\omega_y)^{-1}$$$ [5].
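A minimal NumPy sketch of this training and prediction step, following the equations above, is given below; the recovery of $$$\omega_y$$$ from $$$\omega_x$$$ (up to scale) uses the coupling between the two views in the constrained formulation [4], and all function names are illustrative.

    import numpy as np

    def kcca_train(K_xx, K_yy, k=0.25):
        # Solve (K_XX + kI)^-1 K_YY (K_YY + kI)^-1 K_XX w_x = rho^2 w_x.
        n = K_xx.shape[0]
        I = np.eye(n)
        A = np.linalg.solve(K_xx + k * I, K_yy) @ np.linalg.solve(K_yy + k * I, K_xx)
        rho2, W = np.linalg.eig(A)
        w_x = np.real(W[:, np.argmax(np.real(rho2))])    # direction with largest rho^2
        w_y = np.linalg.solve(K_yy + k * I, K_xx @ w_x)  # w_y recovered up to scale
        return w_x, w_y

    def kcca_predict(K_test_train, w_x, Y_train, w_y):
        # Y_test_est = K_{Xtest,Xtrain} w_x / (Y_train' w_y); the divisor is a scalar.
        return (K_test_train @ w_x) / (Y_train @ w_y)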
The whole algorithm is repeated for every noise ratio. Receiver operating characteristic (ROC) analysis is used to evaluate $$$Y_{test}^{est}$$$ for each kCCA, and the area under the ROC curve (AUC) is measured within the false positive rate (FPR) range from 0 to 0.1. Gaussian kCCA and GLM analysis were applied to the real data. The statistical threshold was obtained by applying the same method to wavelet-resampled resting-state data one hundred times to estimate the null distribution and determine significance.
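One way to compute this restricted AUC is to integrate the empirical ROC curve only up to FPR = 0.1, as in the sketch below (function name illustrative; scikit-learn's roc_auc_score with its max_fpr argument computes a standardized variant of the same quantity).

    import numpy as np

    def restricted_auc(y_true, y_score, max_fpr=0.1):
        # Empirical ROC curve: sort by decreasing score, accumulate TPR and FPR.
        order = np.argsort(-y_score)
        y = y_true[order]
        tpr = np.concatenate([[0.0], np.cumsum(y) / y.sum()])
        fpr = np.concatenate([[0.0], np.cumsum(1 - y) / (len(y) - y.sum())])
        # Clip the curve at max_fpr and integrate by the trapezoidal rule.
        keep = fpr <= max_fpr
        x = np.concatenate([fpr[keep], [max_fpr]])
        z = np.concatenate([tpr[keep], [np.interp(max_fpr, fpr, tpr)]])
        return np.sum(np.diff(x) * (z[1:] + z[:-1]) / 2.0)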
Results
Fig.1 illustrates the AUC at different noise ratios for kCCA with all five kernels. The tanh kernel (grey) has the lowest performance at every noise ratio. The Gaussian (red) and inverse (purple) kernels significantly improve activation detection until the noise ratio reaches 0.84, and the parabolic (yellow) kernel performs worse than the linear (blue) kernel after the noise ratio reaches 0.65. Overall, the Gaussian kernel shows the most accurate activation detection. Fig.2 shows the activation pattern in the memory task for the contrast encoding-distraction at a p-value of $$$10^{-3}$$$ with cluster size greater than 20. Gaussian kCCA (Fig.2A) shows a much stronger activation pattern than GLM analysis (Fig.2B).
Acknowledgements
This work is partially supported by the NIH (7R01EB014284).
References
[1] Shotaro A, 2007, arXiv.
[2] Bernhard S, 2002, MIT Press.
[3] Nello C et al., 2000, Cambridge University Press.
[4] David H et al., 2004, Neural Computation.
[5] David H et al., 2007, NeuroImage.