
Egocentric Action Anticipation by Disentangling Encoding and Inference

Furnari A.; Farinella G. M.
2019-01-01

Abstract

Egocentric action anticipation consists of predicting future actions from videos collected by means of a wearable camera. Action anticipation methods should be able to continuously 1) summarize the past and 2) predict possible future actions. We observe that action anticipation benefits from explicitly disentangling these two tasks. To this end, we introduce a learning architecture which uses a 'rolling' LSTM to continuously summarize the past and an 'unrolling' LSTM to anticipate future actions at multiple temporal scales. The model includes a spatial branch and a temporal branch, which process RGB images and optical flow fields independently. The predictions of the two branches are fused using a novel modality attention mechanism which leverages the complementary nature of the two modalities. Experiments on the EPIC-KITCHENS dataset show that the proposed method surpasses the state of the art by +4.02% and +6.39% in Top-1 and Top-5 accuracy, respectively. Please see the project webpage at http://iplab.dmi.unict.it/rulstm/.
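The rolling/unrolling scheme and the modality attention fusion described in the abstract can be sketched in pure Python. This is a minimal illustration, not the authors' implementation: `toy_cell` is a hypothetical stand-in for the learned LSTM cells, and the attention logits would in practice come from a small learned network rather than being passed in directly.

```python
import math

def toy_cell(state, x):
    # Hypothetical stand-in for a learned LSTM cell: mixes the previous
    # hidden state with the new input. In the paper this role is played
    # by the "rolling" LSTM (encoding) and the "unrolling" LSTM (inference).
    return [math.tanh(0.5 * s + 0.5 * v) for s, v in zip(state, x)]

def softmax(scores):
    # Numerically stable softmax over a list of logits.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def anticipate(observations, n_unroll_steps):
    """Rolling-unrolling sketch: the rolling cell continuously summarizes
    the observed inputs; at every step the unrolling cell is initialized
    with the rolling state and iterated without new input to anticipate
    the future at multiple temporal scales (one state per unrolled step)."""
    rolling_state = [0.0, 0.0]
    predictions_per_step = []
    for x in observations:
        rolling_state = toy_cell(rolling_state, x)    # 1) summarize the past
        unrolling_state = list(rolling_state)         # hand over the summary
        future = []
        for _ in range(n_unroll_steps):               # 2) predict the future
            unrolling_state = toy_cell(unrolling_state, [0.0, 0.0])
            future.append(unrolling_state)
        predictions_per_step.append(future)
    return predictions_per_step

def modality_attention(rgb_scores, flow_scores, attention_logits):
    """Late fusion of per-modality prediction scores with attention
    weights computed per sample over the two modalities."""
    w_rgb, w_flow = softmax(attention_logits)
    return [w_rgb * r + w_flow * f for r, f in zip(rgb_scores, flow_scores)]

preds = anticipate([[0.1, 0.2], [0.3, 0.4]], n_unroll_steps=3)
fused = modality_attention([1.0, 0.0], [0.0, 1.0], [0.0, 0.0])
```

With equal attention logits the two modalities are weighted 0.5 each; in the actual model the weights are predicted from the data, so the fusion can lean on RGB appearance or optical-flow motion depending on the sample.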
2019
978-1-5386-6249-6
Action Anticipation; Egocentric Vision; EPIC-KITCHENS; First Person Vision; LSTM
Files in this record:
There are no files associated with this record.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11769/375445
Citations
  • PMC: ND
  • Scopus: 4
  • Web of Science: ND