Market basket analysis from egocentric videos

Santarcangelo, Vito; Farinella, Giovanni Maria; Furnari, Antonino; Battiato, Sebastiano
2018

Abstract

This paper presents Visual Market Basket Analysis (VMBA), a novel application domain for egocentric vision systems. The goal of VMBA is to infer the behavior of a store's customers during their shopping. The analysis relies on image sequences acquired by cameras mounted on shopping carts. The inferred behaviors can be coupled with classic Market Basket Analysis information (i.e., receipts) to help retailers improve the management of spaces and marketing strategies. To set up the challenge, we collected a new dataset of egocentric videos during real shopping sessions in a retail store. Video frames have been labeled according to a proposed hierarchy of 14 different customer behaviors, from the beginning (cart picking) to the end (cart releasing) of the shopping session. We benchmark different representation and classification techniques and propose a multimodal method that exploits visual, motion, and audio descriptors to perform classification with the Directed Acyclic Graph SVM learning architecture. Experiments highlight that employing multimodal representations and explicitly addressing the task in a hierarchical way is beneficial. The devised approach based on Deep Features achieves an accuracy of more than 87% over the 14 classes of the considered dataset.
Egocentric vision; Market basket analysis; Multimodal analysis; Software; Signal Processing; Artificial Intelligence
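As a rough illustration of the Directed Acyclic Graph SVM (DAG-SVM) architecture mentioned in the abstract, the sketch below trains one-vs-one SVMs over concatenated multimodal descriptors and evaluates them along a decision DAG, eliminating one candidate class per node. This is a minimal sketch with synthetic data: the `DAGSVM` class and the toy features are assumptions for illustration, not the paper's implementation or dataset.

```python
from itertools import combinations

import numpy as np
from sklearn.svm import SVC


class DAGSVM:
    """One-vs-one SVMs evaluated along a decision DAG: at each node a
    pairwise classifier eliminates one of the remaining candidate classes."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.clf_ = {}
        # Train one binary SVM per pair of classes (one-vs-one).
        for a, b in combinations(self.classes_, 2):
            mask = (y == a) | (y == b)
            self.clf_[(a, b)] = SVC(kernel="linear").fit(X[mask], y[mask])
        return self

    def predict(self, X):
        preds = []
        for x in X:
            remaining = list(self.classes_)
            # Repeatedly test the first vs. the last remaining class and
            # drop the loser, until a single class survives.
            while len(remaining) > 1:
                a, b = remaining[0], remaining[-1]
                winner = self.clf_[(a, b)].predict(x[None, :])[0]
                remaining.remove(b if winner == a else a)
            preds.append(remaining[0])
        return np.array(preds)


# Toy example standing in for concatenated visual/motion/audio descriptors:
# three well-separated synthetic classes in a 6-dimensional feature space.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, size=(20, 6)) for c in range(3)])
y = np.repeat([0, 1, 2], 20)
model = DAGSVM().fit(X, y)
print((model.predict(X) == y).mean())  # training accuracy on the toy data
```

With K classes the DAG evaluates only K-1 pairwise classifiers per sample, which is why the architecture scales well to the 14 behavior classes considered in the paper.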
Files in this record:
There are no files associated with this record.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: http://hdl.handle.net/20.500.11769/335579
Citations
  • PMC: ND
  • Scopus: 10
  • Web of Science: 7