Zhengshi Yang1, Xiaowei Zhuang1, Tim Curran2, and Dietmar Cordes1,3
1Cleveland Clinic Lou Ruvo Center for Brain Health, Las Vegas, NV, United States, 2Department of Psychology and Neuroscience, University of Colorado Boulder, Boulder, CO, United States, 3Department of Radiology, University of Colorado-Denver, Denver, CO, United States
Synopsis
Kernel representation is an efficient method to extract nonlinear features without significantly increasing computational complexity. Linear kernel CCA has been applied to fMRI data, but the performance of nonlinear kernel CCA is still unclear. Here we investigate the accuracy of five types of kernels on simulated fMRI data and then apply Gaussian kernel CCA to real fMRI data. Gaussian kernel CCA provides a more sensitive and specific way to detect activation patterns.
Introduction
Lower-dimensional representations of fMRI data cannot capture nonlinear features, which may introduce bias [1] into our understanding of the activation. In contrast, high-dimensional representations using nonlinear mappings of the fMRI data are advantageous for extracting nonlinear features of the data [2]. Such a high-dimensional representation can be obtained by mapping the lower-dimensional data space into a high-dimensional feature space using kernels [3,4]. Working in a high-dimensional space with explicitly specified features increases the computational complexity significantly; kernel representation provides an alternative way to learn nonlinear functions [3] without the features being specified. Instead of using nonlinear optimization routines, the nonlinear CCA problem can be solved efficiently by combining kernel methods and linear algebra [1,4,5]. This study investigates five types of kernels in kernel CCA (kCCA), namely Gaussian, linear, hyperbolic tangent (tanh), parabolic, and inverse kernels, on simulated fMRI data, and compares the activation patterns obtained in an fMRI episodic memory task using Gaussian kCCA and general linear model (GLM) analysis.
Methods
Two fMRI data sets were acquired on a 3.0T GE HDx MRI scanner equipped with an 8-channel head coil, using parallel imaging with the following parameters: ASSET=2, ramp sampling, TR/TE=2s/30ms, FA=70deg, FOV=22cmx22cm, thickness/gap=4mm/1mm, 25 slices, in-plane resolution 96x96 interpolated to 128x128. One data set was collected during resting state and the other during an episodic memory task. Simulated fMRI data were generated on a 32x32 grid, with the activation pattern for each 3x3 local neighborhood with an active center determined by the empirical distribution of the real data obtained from mass-univariate analysis. The simulation was formed by adding wavelet-resampled resting-state data to the true activated time course ($$$p<10^{-6}$$$, uncorrected), with the noise ratio varying from 0 to 1 in steps of 0.01.
In the simulated data, let $$$X$$$ be the matrix of voxel time courses with dimensions $$$q\times t$$$ ($$$q=1024$$$ for the 32x32 grid), and let $$$Y$$$ ($$$q\times 1$$$) be the classifier indicating whether each voxel is active ($$$y=1$$$) or inactive ($$$y=0$$$). The first 512 voxels form the training set $$$X_{train}$$$, $$$Y_{train}$$$, and the remaining voxels form the testing set $$$X_{test}$$$, $$$Y_{test}$$$.
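As a concrete illustration, a minimal sketch of this simulation setup is given below. It assumes a boxcar task time course, white noise in place of the wavelet-resampled resting-state data, and an assumed number of time points and active-voxel fraction; none of these specifics come from the study.

    import numpy as np

    rng = np.random.default_rng(0)
    q, t = 1024, 288                 # 32x32 grid; the number of time points t is assumed

    # The study uses wavelet-resampled resting-state data as noise; white noise
    # stands in here only to keep the sketch self-contained.
    noise = rng.standard_normal((q, t))

    # Hypothetical boxcar standing in for the true activated time course.
    signal = np.tile(np.repeat([0.0, 1.0], 12), t // 24)

    Y = (rng.random(q) < 0.2).astype(float)    # voxel labels; the active fraction is assumed
    for ratio in np.arange(0.0, 1.005, 0.01):  # noise ratio from 0 to 1 in steps of 0.01
        X = np.outer(Y, (1.0 - ratio) * signal) + ratio * noise
        X_train, Y_train = X[:512], Y[:512]    # first 512 voxels: training set
        X_test, Y_test = X[512:], Y[512:]      # remaining voxels: testing set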
Five types of kernels are used: the Gaussian kernel $$$k(u,v)=e^{-\frac{||u-v||^2}{2\sigma^2}}$$$, the linear kernel $$$k(u,v)=<u,v>$$$, the hyperbolic tangent (tanh) kernel $$$k(u,v)=tanh(\sigma<u,v>+1)$$$, the parabolic kernel $$$k(u,v)=(<u,v>+\sigma)^2$$$, and the inverse kernel $$$k(u,v)=\frac{1}{\sqrt{||u-v||^2+\sigma^2}}$$$, with $$$\sigma=10$$$ in our study. Because $$$Y$$$ is a binary classifier, the choice among these kernels makes no intrinsic difference for constructing $$$K_{YY}$$$, so $$$K_{YY}$$$ is computed with the linear kernel in all cases.
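For reference, a compact sketch of the corresponding Gram-matrix computations is given below, assuming samples stored row-wise in NumPy arrays; the function name and string-based kernel selection are illustrative, not from the study.

    import numpy as np

    def gram(name, U, V=None, sigma=10.0):
        """Gram matrix K[i, j] = k(U[i], V[j]) for the five kernels (sigma = 10)."""
        V = U if V is None else V
        dots = U @ V.T                                        # <u, v> for every pair
        sqdist = (U**2).sum(1)[:, None] + (V**2).sum(1)[None, :] - 2.0 * dots
        if name == "gaussian":
            return np.exp(-sqdist / (2.0 * sigma**2))
        if name == "linear":
            return dots
        if name == "tanh":
            return np.tanh(sigma * dots + 1.0)
        if name == "parabolic":
            return (dots + sigma) ** 2
        if name == "inverse":
            return 1.0 / np.sqrt(sqdist + sigma**2)
        raise ValueError(f"unknown kernel: {name}")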
The conventional CCA objective $$$\rho=\frac{\alpha'X'Y\beta}{\sqrt{\alpha'X'X\alpha\beta'Y'Y\beta}}$$$ can be rewritten as $$$\rho=\frac{\omega_x'K_{XX}K_{YY}\omega_y}{\sqrt{\omega_x'K_{XX}K_{XX}\omega_x\omega_y'K_{YY}K_{YY}\omega_y}}$$$ by setting $$$\alpha=X'\omega_x$$$, $$$\beta=Y'\omega_y$$$, $$$K_{XX}=XX'$$$ and $$$K_{YY}=YY'$$$. For each kernel CCA, $$$K_{XX}$$$ is constructed with the respective kernel, and the directions $$$\omega_x$$$ and $$$\omega_y$$$ are found by maximizing ρ with regularization [4] ($$$k=0.25$$$) added on the training set. This optimization problem can be converted to maximizing the numerator subject to $$$\omega_x'K_{XX}^2\omega_x+k\omega_x'K_{XX}\omega_x=1$$$ and $$$\omega_y'K_{YY}^2\omega_y+k\omega_y'K_{YY}\omega_y=1$$$, and finally reformulated as the standard eigenproblem [4] $$$(K_{XX}+kI)^{-1}K_{YY}(K_{YY}+kI)^{-1}K_{XX}\omega_x=\rho^2\omega_x$$$. Since the feature $$$X\alpha$$$ from one view of the data should be identical to the feature $$$Y\beta$$$ from the other, namely $$$X_{test}\alpha\approx Y_{test}^{est}\beta$$$, this can be rewritten as $$$K_{X_{test}X_{train}}\omega_x\approx Y_{test}^{est}Y_{train}'\omega_y$$$, which gives $$$Y_{test}^{est}=K_{X_{test}X_{train}}\omega_x(Y_{train}'\omega_y)^{-1}$$$ [5].
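A minimal NumPy sketch of this training and prediction step, following the equations above, is given below; the recovery of $$$\omega_y$$$ from $$$\omega_x$$$ (up to scale) uses the coupling between the two views in the constrained formulation [4], and all function names are illustrative.

    import numpy as np

    def kcca_train(K_xx, K_yy, k=0.25):
        # Solve (K_XX + kI)^-1 K_YY (K_YY + kI)^-1 K_XX w_x = rho^2 w_x.
        n = K_xx.shape[0]
        I = np.eye(n)
        A = np.linalg.solve(K_xx + k * I, K_yy) @ np.linalg.solve(K_yy + k * I, K_xx)
        rho2, W = np.linalg.eig(A)
        w_x = np.real(W[:, np.argmax(np.real(rho2))])    # direction with largest rho^2
        w_y = np.linalg.solve(K_yy + k * I, K_xx @ w_x)  # w_y recovered up to scale
        return w_x, w_y

    def kcca_predict(K_test_train, w_x, Y_train, w_y):
        # Y_test_est = K_{Xtest,Xtrain} w_x / (Y_train' w_y); the divisor is a scalar.
        return (K_test_train @ w_x) / (Y_train @ w_y)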
The whole algorithm is repeated for every noise ratio. Receiver operating characteristic (ROC) analysis is used to evaluate $$$Y_{test}^{est}$$$ for each kCCA, and the area under the ROC curve (AUC) is measured within the false positive rate (FPR) range from 0 to 0.1. Gaussian kCCA and GLM analysis were applied to the real data. The statistical threshold was obtained by applying the same method to wavelet-resampled resting-state data one hundred times to estimate the null distribution and determine significance.
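One way to compute this restricted AUC is to integrate the empirical ROC curve only up to FPR = 0.1, as in the sketch below (function name illustrative; scikit-learn's roc_auc_score with its max_fpr argument computes a standardized variant of the same quantity).

    import numpy as np

    def restricted_auc(y_true, y_score, max_fpr=0.1):
        # Empirical ROC curve: sort by decreasing score, accumulate TPR and FPR.
        order = np.argsort(-y_score)
        y = y_true[order]
        tpr = np.concatenate([[0.0], np.cumsum(y) / y.sum()])
        fpr = np.concatenate([[0.0], np.cumsum(1 - y) / (len(y) - y.sum())])
        # Clip the curve at max_fpr and integrate by the trapezoidal rule.
        keep = fpr <= max_fpr
        x = np.concatenate([fpr[keep], [max_fpr]])
        z = np.concatenate([tpr[keep], [np.interp(max_fpr, fpr, tpr)]])
        return np.sum(np.diff(x) * (z[1:] + z[:-1]) / 2.0)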
Results
Fig.1 illustrates the AUC at different noise ratios for kCCA with all five kernels. The tanh kernel (grey) has the lowest performance at every noise ratio. The Gaussian (red) and inverse (purple) kernels significantly improve activation detection until the noise ratio reaches 0.84, and the parabolic (yellow) kernel performs worse than the linear (blue) kernel after the noise ratio reaches 0.65. Overall, the Gaussian kernel shows the most accurate activation detection. Fig.2 shows the activation pattern in the memory task for the contrast encoding-distraction at a p-value of $$$10^{-3}$$$ with cluster size greater than 20. Gaussian kCCA (Fig.2A) shows a much stronger activation pattern than GLM analysis (Fig.2B).
Acknowledgements
This work is partially supported by the NIH (7R01EB014284).
References
[1] Shotaro A, 2007, arXiv.
[2] Bernhard S, 2002, MIT Press.
[3] Nello C et al., 2000, Cambridge University Press.
[4] David H et al., 2004, Neural Computation.
[5] David H et al., 2007, NeuroImage.