Top-down saliency detection driven by visual classification

Murabito Francesca; Spampinato Concetto; Palazzo Simone; Giordano Daniela; Pogorelov Konstantin; Riegler Michael

2018-01-01

Abstract

This paper presents a saliency detection approach that emulates the integration of the top-down (task-controlled) and bottom-up (sensory information) processes involved in human visual attention. In particular, we first learn how to generate saliency when a specific visual task has to be accomplished. We then investigate whether, and to what extent, the learned saliency maps can support visual classification in nontrivial cases. To this end, we propose SalClassNet, a CNN framework consisting of two jointly trained networks: a) the first computes top-down saliency maps from input images, and b) the second exploits the computed saliency maps for visual classification. To test our approach, we collected a dataset of eye-gaze maps with a Tobii T60 eye tracker by asking several subjects to look at images from the Stanford Dogs dataset with the objective of distinguishing dog breeds. Performance analysis on our dataset and on other saliency benchmarking datasets, such as POET, showed that SalClassNet outperforms state-of-the-art saliency detectors such as SalNet and SALICON. Finally, we analyzed the performance of SalClassNet in a fine-grained recognition task and found that it yields higher classification accuracy than Inception and VGG-19 classifiers. The results thus demonstrate that 1) conditioning saliency detectors with object classes reaches state-of-the-art performance, and 2) explicitly providing top-down saliency maps to visual classifiers enhances accuracy.
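The abstract states that the two networks are trained jointly but does not give the objective. As a minimal sketch, one way such joint training might be expressed is a weighted sum of a saliency term and a classification term. Everything below is an assumption made for illustration and is not taken from the abstract: the function names, the choice of mean squared error and softmax cross-entropy, the weights `alpha` and `beta`, and the idea of passing the saliency map to the classifier as an extra input channel.

```python
import math


def mse(pred, target):
    """Mean squared error between two equally sized 2-D maps (lists of rows)."""
    n = sum(len(row) for row in pred)
    total = sum(
        (p - t) ** 2
        for pred_row, target_row in zip(pred, target)
        for p, t in zip(pred_row, target_row)
    )
    return total / n


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, using the log-sum-exp trick."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def stack_saliency(rgb_channels, saliency_map):
    """Hypothetical classifier input: the image channels plus the saliency map."""
    return list(rgb_channels) + [saliency_map]


def joint_loss(pred_saliency, gaze_map, logits, label, alpha=1.0, beta=1.0):
    """Assumed joint objective: alpha * classification + beta * saliency."""
    return alpha * cross_entropy(logits, label) + beta * mse(pred_saliency, gaze_map)
```

In this sketch, setting `beta` to zero would train on the classification signal alone, while a larger `beta` pulls the predicted saliency map closer to the recorded eye-gaze map. In a real CNN framework the same weighted sum would be backpropagated through both networks.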

Year: 2018

Publication type: 1.1 Journal article


Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11769/373629
