Visual Saliency Detection guided by Neural Signals

Palazzo, Simone; Rundo, Francesco; Battiato, Sebastiano; Giordano, Daniela; Spampinato, Concetto

doi:10.1109/fg47880.2020.00068

Saliency detection is a fundamental process of human visual perception: it allows us to identify the most important parts of a scene, focusing our analysis and interpretation capabilities on a reduced set of information and reducing reaction times. However, current approaches to automatic saliency detection either attempt to mimic human capabilities by building attention maps from hand-crafted feature analysis, or employ convolutional neural networks trained as black boxes, without any architectural or informational prior from human biology. In this paper, we present an approach to saliency detection that combines the success of deep learning in identifying representations for visual data with a training paradigm aimed at matching neural activity, provided directly by brain signals recorded while subjects look at images. We show that our approach is able to capture correspondences between visual elements and neural activity, successfully generalizing to unseen images to identify their most salient regions.

2020-01-01
