Decoding attention from the visual cortex: fMRI-based prediction of human saliency maps

Calcagno S.; Finocchiaro M.; Bellitto G.; Spampinato C.; Proietto Salanitri F.
2026-01-01

Abstract

Modeling visual attention from brain activity offers a powerful route to understanding how spatial salience is encoded in the human visual system. While deep learning models can accurately predict fixations from image content, it remains unclear whether similar saliency maps can be reconstructed directly from neural signals. In this study, we investigate the feasibility of decoding high-resolution spatial attention maps from 3T fMRI data and provide the first demonstration that behaviorally validated saliency maps can be decoded directly from these signals. We propose a two-stage decoder that transforms multivariate voxel responses from region-specific visual areas into spatial saliency distributions, using DeepGaze II maps as proxy supervision. We evaluate the decoded maps against newly collected eye-tracking data on a held-out set of natural images. Decoded maps correlate significantly with human fixations, particularly when using activity from early visual areas (V1–V4), which contribute most strongly to reconstruction accuracy. Higher-level areas yield above-chance but weaker predictions. These findings suggest that spatial attention is robustly represented in early visual cortex and support the use of fMRI-based decoding as a tool for probing the neural basis of salience in naturalistic viewing. Our code and eye-tracking annotations are available on GitHub.
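The abstract reports that decoded maps correlate with human fixations but does not name the metric used. As an illustration only, the sketch below computes a Pearson correlation coefficient between a decoded saliency map and a fixation density map, which is one common way to compare two saliency maps. The function name `saliency_cc`, the toy 2x2 maps, and the choice of metric are assumptions made here, not details taken from the paper or its code.

```python
import math


def flatten(grid):
    """Flatten a 2-D list of numbers into a flat list of floats."""
    return [float(value) for row in grid for value in row]


def saliency_cc(decoded_map, fixation_map):
    """Pearson correlation between two maps with the same number of pixels.

    Hypothetical helper: the paper may use a different metric.
    """
    a = flatten(decoded_map)
    b = flatten(fixation_map)
    if not a or len(a) != len(b):
        raise ValueError("maps must be non-empty and have the same size")

    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    dev_a = [x - mean_a for x in a]
    dev_b = [y - mean_b for y in b]

    denom = math.sqrt(sum(x * x for x in dev_a) * sum(y * y for y in dev_b))
    if denom == 0.0:
        # Convention chosen for this sketch: a constant map carries no
        # spatial information, so report zero correlation.
        return 0.0
    return sum(x * y for x, y in zip(dev_a, dev_b)) / denom


# Toy values standing in for a decoded map and a fixation density map.
decoded = [[0.1, 0.2], [0.3, 0.9]]
fixations = [[0.0, 0.1], [0.2, 1.0]]
print(saliency_cc(decoded, fixations))
```

In practice the fixation map would be built from recorded gaze positions, typically smoothed into a density, and the score would be averaged over held-out images; those steps are left out of this sketch.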
BCI
Neural coding
Saliency prediction
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11769/715575