1755

Beyond Differences: Cross-Subject and Cross-Dataset fMRI Brain Decoding of Visual Stimuli

Matteo Ferrante¹, Tommaso Boccato², Furkan Ozcelik³, Rufin VanRullen⁴, and Nicola Toschi²
¹Biomedicine and prevention, University of Rome Tor Vergata, Rome, Italy, ²University of Rome Tor Vergata, Rome, Italy, ³CerCo, University of Toulouse III Paul Sabatier, Toulouse, France, ⁴CNRS, CerCo, ANITI, TMBI, Univ. Toulouse, Toulouse, France

Synopsis

Keywords: AI Diffusion Models, fMRI (task based), brain decoding

Motivation: Brain decoding has been limited by the need for large amounts of data and subject-specific methodologies. Current techniques require extensive scanning, which is costly and time-consuming and restricts their applicability.

Goal(s): The study aims to establish a novel, more efficient approach for cross-subject brain decoding of visual stimuli.

Approach: Using the Natural Scenes Dataset (NSD), we applied ridge regression to align brain activity across different subjects on representations of common stimuli, and employed the state-of-the-art Brain-Diffuser pipeline for decoding and image reconstruction.

Results: The ridge regression alignment method outperformed the other alignment methods, enabling consistent cross-subject decoding with significantly less data and demonstrating both feasibility and a potential 90% reduction in scan time.

Impact: A reliable technique for cross-subject, cross-scanner and cross-field-strength alignment can pave the way for efficient brain decoding without the need for extensive data collection and/or ultra-high field strengths.

Introduction

Brain decoding, a cornerstone of modern neuroscience, seeks to decipher the intricate neural patterns underpinning cognitive functions. Within this domain, functional magnetic resonance imaging (fMRI) has proven invaluable, especially for decoding visual stimuli [1,2,3,4]. By associating neural patterns with the latent space of deep learning models, several works have shown the value of using fMRI to predict or reconstruct visual experiences from neural responses alone. However, current methodologies are often tailored to individual subjects and require very large amounts of data, resulting in prolonged (>24 h) and costly scanning. This subject-centric and data-intensive approach has severely restricted the broader applicability of brain decoding. In this context, so-called functional data alignment techniques, which aim to use models trained on one subject to decode other subjects' data, are being developed. Here, we propose a novel alignment approach which sets new benchmarks in cross-subject brain decoding for visual stimuli. Our approach is scalable and can reduce the amount of data needed by as much as 90%, paving the way for broader applicability across varied fields and subjects. We also demonstrate cross-dataset decoding, showing how visual stimuli can be reconstructed across different subjects, magnetic field strengths and scanners.

Methods

We leverage the Natural Scenes Dataset (NSD) [4], comprising fMRI data from four subjects exposed to 10,000 natural images. Of these images, 1,000 were shared across subjects and were used to devise our functional alignment procedure. The subjects participated in several 7T fMRI scanning sessions (TR = 1.6 s, 1.8 mm isotropic voxels), in which distinct natural images from the COCO dataset were presented for 2 seconds each (1-second interval). GLMsingle [5] was used to extract task-related voxel-wise activations, also yielding visual cortex masks (~14,000 voxels/subject).

In our approach, the activations of different subjects in response to the same stimulus are aligned using a linear regression model with L2 regularization (ridge regression). As baselines, we employed 1) anatomical alignment through T1-based coregistration, and 2) functional hyperalignment, which optimally aligns local activity patterns across subjects, designating one subject as a "template." We employed the Brain-Diffuser decoding pipeline as decoder, trained it to decode visual stimuli on NSD Subj01 exclusively, and then used it to decode the aligned activity of the other subjects. This pipeline linearly projects brain activity into the latent space of pretrained models such as CLIP and VD-VAE, and subsequently uses VersatileDiffusion to reconstruct images from neural activity. We benchmarked our method against the other alignment methods and compared different proportions of shared data, using both qualitative assessment and quantitative metrics such as PixCorr, SSIM and CLIP 2-way accuracy.

We also conducted a cross-dataset decoding experiment using the BOLD5000 [6] dataset, which differs from NSD in acquisition protocol but contains some of the same stimulus images. The experiment centered on BOLD5000's CSI1 subject, who shares 1,000 images with the NSD subjects. We report both qualitative and quantitative results to evaluate decoding quality across datasets.
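The ridge-based functional alignment described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the penalty value `alpha`, and the synthetic data are all assumptions made for illustration. The sketch maps a new subject's responses to the shared images onto the template subject's voxel space, using the dual form of ridge regression, which is convenient when voxels outnumber shared images.

```python
import numpy as np


def fit_ridge_alignment(X_src, Y_tgt, alpha=1.0):
    """Fit a ridge map from source-subject voxels to template-subject voxels.

    X_src: (n_shared_images, n_src_voxels) responses of the new subject.
    Y_tgt: (n_shared_images, n_tgt_voxels) responses of the template subject
           to the same images.
    alpha: hypothetical L2 penalty; the abstract does not report a value.
    """
    x_mean, y_mean = X_src.mean(axis=0), Y_tgt.mean(axis=0)
    Xc, Yc = X_src - x_mean, Y_tgt - y_mean
    # Dual form: solve an (n_images x n_images) system instead of an
    # (n_voxels x n_voxels) one. Both give the same ridge predictions.
    kernel = Xc @ Xc.T + alpha * np.eye(Xc.shape[0])
    dual_coef = np.linalg.solve(kernel, Yc)
    return {"x_mean": x_mean, "y_mean": y_mean, "Xc": Xc, "dual_coef": dual_coef}


def apply_alignment(X_new, model):
    """Project new source-subject responses into the template voxel space."""
    similarities = (X_new - model["x_mean"]) @ model["Xc"].T
    return similarities @ model["dual_coef"] + model["y_mean"]


def voxelwise_correlation(A, B):
    """Pearson correlation between matching columns (voxels) of A and B."""
    A_c, B_c = A - A.mean(axis=0), B - B.mean(axis=0)
    num = (A_c * B_c).sum(axis=0)
    den = np.sqrt((A_c ** 2).sum(axis=0) * (B_c ** 2).sum(axis=0))
    return num / den


# Synthetic stand-in for two subjects viewing the same images: both respond
# to a shared latent signal through different (random) voxel mixings.
rng = np.random.default_rng(0)
n_images, n_latent, n_src, n_tgt = 250, 10, 60, 50
latent = rng.standard_normal((n_images, n_latent))
X = latent @ rng.standard_normal((n_latent, n_src))
X += 0.1 * rng.standard_normal((n_images, n_src))
Y = latent @ rng.standard_normal((n_latent, n_tgt))
Y += 0.1 * rng.standard_normal((n_images, n_tgt))

# Fit on 200 "shared" images, evaluate on 50 held-out images.
model = fit_ridge_alignment(X[:200], Y[:200], alpha=1.0)
aligned = apply_alignment(X[200:], model)
mean_r = voxelwise_correlation(aligned, Y[200:]).mean()
print("mean held-out voxel correlation:", mean_r)
```

In the setting of the abstract, the aligned activity would then be passed to the decoder trained on the template subject; that step is omitted here.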

Results

Our results indicate that the ridge regression-based alignment method outperformed the other methods, especially when using only a fraction of the data shared between subjects. We demonstrated that aligning brain activity across subjects is feasible and that it is possible to achieve the same qualitative and quantitative performance as within-subject decoding using 1,000 images, hence reducing scan time by 90%. This was evident in the decoded images, whose qualitative appearance remained consistent irrespective of the subjects chosen for training and alignment. The images correctly and consistently reproduced high-level content and foundational shapes across varying subject combinations. Moreover, we show experimentally that cross-field, cross-machine and cross-paradigm decoding is feasible.
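As a back-of-the-envelope check of the 90% figure, the following worked equation uses the image counts given in the Methods. It assumes that scan time scales linearly with the number of presented images, which the abstract does not state explicitly, and it ignores repeated presentations and session overhead. The symbols N_shared and N_total are introduced here for illustration only.

```latex
\[
  \text{scan-time reduction}
  = 1 - \frac{N_{\text{shared}}}{N_{\text{total}}}
  = 1 - \frac{1000}{10\,000}
  = 0.90
\]
```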

Discussion

This study highlights the potential of a simple alignment method, ridge regression, which performs better than all other available linear and nonlinear methods, for streamlining the brain decoding process. This removes the need to repeat most of the experiment when a new subject is introduced, rendering brain decoding much more affordable in terms of time and resources. We also showed that anatomical alignment, which relies on matching brain structure for alignment and decoding, underperforms due to the inherent anatomical variability of the brain across individuals. This variability may not correspond to functional inter-subject variability.

Conclusions

Our research demonstrates that our approach facilitates cross-subject brain decoding, suggesting a potential reduction of up to 90% in scan time for subjects other than the “template”. The approach is also able to generalize across datasets and/or field strengths, significantly lowering the data quality and quantity requirements for successful brain decoding algorithms.

Acknowledgements

This work was supported by NEXTGENERATIONEU (NGEU) and funded by the Italian Ministry of University and Research (MUR), National Recovery and Resilience Plan (NRRP), project MNESYS (PE0000006) (to NT) – A Multiscale integrated approach to the study of the nervous system in health and disease (DN. 1553 11.10.2022); by the MUR-PNRR M4C2I1.3 PE6 project PE00000019 Heal Italia (to NT); by the NATIONAL CENTRE FOR HPC, BIG DATA AND QUANTUM COMPUTING, within the spoke "Multiscale Modeling and Engineering Applications" (to NT); by the EXPERIENCE project (European Union’s Horizon 2020 Research and Innovation Programme under grant agreement No. 101017727); and by the CROSSBRAIN project (European Union’s European Innovation Council under grant agreement No. 101070908).

References

[1] Chen, Z., Qing, J., Xiang, T., Yue, W.L., Zhou, J.H.: Seeing beyond the brain: Conditional diffusion model with sparse masked modeling for vision decoding (2022)

[2] Ozcelik, F., VanRullen, R.: Brain-Diffuser: Natural scene reconstruction from fMRI signals using generative latent diffusion (2023)

[3] Ferrante, M., Ozcelik, F., Boccato, T., VanRullen, R., Toschi, N.: Brain captioning: Decoding human brain activity into images and text (2023)

[4] Ferrante, M., Boccato, T., Toschi, N.: Semantic brain decoding: from fMRI to conceptually similar image reconstruction of visual stimuli (2023)

[4] Allen, E.J., St-Yves, G., Wu, Y., Breedlove, J.L., Prince, J.S., Dowdle, L.T., Nau, M., Caron, B., Pestilli, F., Charest, I., Hutchinson, J.B., Naselaris, T., Kay, K.: A massive 7T fMRI dataset to bridge cognitive neuroscience and artificial intelligence. Nature Neuroscience 25(1), 116–126 (Jan 2022). https://doi.org/10.1038/s41593-021-00962-x

[5] Prince, J.S., Charest, I., Kurzawski, J.W., Pyles, J.A., Tarr, M., Kay, K.N.: Improving the accuracy of single-trial fMRI response estimates using GLMsingle. eLife (2022)

[6] Chang, N., Pyles, J.A., Marcus, A., Gupta, A., Tarr, M.J., Aminoff, E.M.: BOLD5000, a public fMRI dataset while viewing 5000 visual images. Scientific Data 6(1), 49 (May 2019). https://doi.org/10.1038/s41597-019-0052-3

Figures

Scheme of our pipeline. A standard decoding pipeline follows the gray line: images are shown in the scanner and a decoder is trained to reconstruct them from fMRI activity. Our approach consists in keeping a pretrained decoder for other subjects as well, and learning how to align their fMRI activity to the functional activity of the training subject, in order to reconstruct images across subjects without the need for large-scale acquisitions.

Qualitative reconstructions from fMRI activity. The 'Stimulus' column displays the presented stimuli. The 'Subj01' column shows the images reconstructed from the activity of Subj01 using its own decoder. The remaining columns show images reconstructed from the activity of other subjects, which has been aligned to the activity of Subj01 and decoded using the Subj01 decoder.

Quantitative results

A: Qualitative comparison of the impact of the number of images used (as a fraction of the number of common images). B: Qualitative comparison of decoding performance across different methods, where “Ridge” is our proposal and the other columns are the baselines.

Qualitative comparison of decoding performance for test images in the cross-dataset experiment. The "Within" column shows results obtained with a decoder trained on BOLD5000 data, while "Cross NSD" shows results obtained with our alignment pipeline across datasets. Semantic categories such as animals, vehicles, and outdoor or indoor scenes are matched even if fine details are missing. This shows that our functional alignment approach can also decode activity across datasets, scanners and field strengths if a sufficient number of alignment samples is provided.

Proc. Intl. Soc. Mag. Reson. Med. 32 (2024)

DOI: https://doi.org/10.58530/2024/1755