Egocentric Action Anticipation by Disentangling Encoding and Inference

Furnari A.; Farinella G. M.

2019-01-01

Abstract

Egocentric action anticipation consists of predicting future actions from videos collected with a wearable camera. Action anticipation methods should be able to continuously 1) summarize the past and 2) predict possible future actions. We observe that action anticipation benefits from explicitly disentangling the two tasks. To this end, we introduce a learning architecture that uses a 'rolling' LSTM to continuously summarize the past and an 'unrolling' LSTM to anticipate future actions at multiple temporal scales. The model includes a spatial branch and a temporal branch, which process RGB images and optical flow fields independently. The predictions of the two branches are fused using a novel modality attention mechanism that leverages the complementary nature of the modalities. Experiments on the EPIC-KITCHENS dataset show that the proposed method surpasses the state of the art by +4.02% and +6.39% in Top-1 and Top-5 accuracy, respectively. Please see the project webpage at http://iplab.dmi.unict.it/rulstm/.
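
The control flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_cell` is a scalar tanh recurrence standing in for an LSTM cell, re-feeding the last observation during unrolling is an assumption, and the attention logits passed to `fuse` are supplied by hand rather than produced by a learned network. All function and variable names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def toy_cell(state, x, w_state=0.5, w_in=0.5):
    # Assumption: a scalar tanh recurrence replaces a real LSTM cell.
    return math.tanh(w_state * state + w_in * x)


def anticipate(observations, horizons):
    """Summarize the past, then look ahead once per anticipation horizon."""
    state = 0.0
    for x in observations:  # 'rolling' pass: encode the observed past
        state = toy_cell(state, x)
    last = observations[-1]
    summaries = {}
    for steps in horizons:  # 'unrolling' pass: one inference per time scale
        future = state      # start from the rolling summary
        for _ in range(steps):
            # Assumption: the last observation is re-fed at each step.
            future = toy_cell(future, last)
        summaries[steps] = future
    return summaries


def fuse(branch_scores, attention_logits):
    """Weighted sum of per-branch class scores with softmax attention weights."""
    weights = softmax(attention_logits)
    n_classes = len(branch_scores[0])
    return [
        sum(w * scores[c] for w, scores in zip(weights, branch_scores))
        for c in range(n_classes)
    ]


# Hypothetical inputs: three class scores from each of the two branches.
rgb_scores = [2.0, 0.5, 0.1]
flow_scores = [0.2, 1.5, 0.3]
fused = fuse([rgb_scores, flow_scores], attention_logits=[0.0, 0.0])
summaries = anticipate([0.1, 0.2, 0.3], horizons=[1, 2, 4])
```

Keeping `anticipate` split into two loops mirrors the disentangling idea: the first loop only encodes, while the second only performs inference from a copy of the encoded state, so looking ahead never alters the running summary.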

Year: 2019

ISBN: 978-1-5386-6249-6

Keywords: Action Anticipation; Egocentric Vision; EPIC-KITCHENS; First Person Vision; LSTM

Appears in types: 4.1 Contribution in conference proceedings

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11769/375445
