Localizing visitors in natural sites exploiting modality attention on egocentric images and GPS data

Pasqualino G.; Scafiti S.; Furnari A.; Farinella G. M.

2020-01-01

Abstract

Localizing the visitors of an outdoor natural site is useful both for studying their behavior and for informing them of where they are and what to visit in the site. Although GPS can generally be used for outdoor localization, we show that this kind of signal is not always accurate enough in real-world scenarios. In contrast, localization based on egocentric images can be more accurate, but it generally requires more expensive computation. In this paper, we investigate how fusing image-based and GPS-based predictions enables efficient and accurate localization of the visitors of a natural site. Specifically, we compare different fusion techniques, including a modality attention approach, which is shown to provide the best performance. The results indicate that the proposed technique is promising: it matches the performance of very deep models (e.g., DenseNet) with a less expensive architecture (e.g., SqueezeNet) that has a memory footprint of about 3 MB and an inference time of about 25 ms.
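
The abstract does not describe the fusion architecture, so the following is only a generic sketch of attention-weighted late fusion, not the authors' implementation. It assumes each modality already produces a probability vector over the same set of site areas, and that two unnormalized attention scores (one per modality) are available; in a learned model these scores would come from a small network, whereas here they are passed in by hand. The function names, the three-area example, and all numeric values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fusion(image_probs, gps_probs, attention_scores):
    """Fuse two per-class probability vectors with modality attention.

    attention_scores holds two unnormalized scores (image, GPS).
    They are normalized with a softmax so the fused vector is a
    convex combination of the two inputs.
    """
    if len(image_probs) != len(gps_probs):
        raise ValueError("both modalities must score the same classes")
    w_image, w_gps = softmax(attention_scores)
    return [w_image * p_i + w_gps * p_g
            for p_i, p_g in zip(image_probs, gps_probs)]


# Hypothetical predictions over three areas of a site.
image_probs = [0.7, 0.2, 0.1]
gps_probs = [0.3, 0.6, 0.1]

# A higher first score means the image branch is trusted more.
fused = attention_fusion(image_probs, gps_probs, [2.0, 0.0])
predicted_area = max(range(len(fused)), key=fused.__getitem__)
```

Because the weights sum to one, the fused vector remains a valid probability distribution, and equal attention scores reduce the scheme to a plain average of the two modalities.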

Year: 2020

Keywords: Egocentric (First Person) Vision; GPS; Localization; Multi-modal Data Fusion

Publication type: 4.1 Contribution in conference proceedings

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11769/482767
