This paper presents Visual Market Basket Analysis (VMBA), a novel application domain for egocentric vision systems. The final goal of VMBA is to infer the behavior of a store's customers during their shopping. The analysis relies on image sequences acquired by cameras mounted on shopping carts. The inferred behaviors can be coupled with classic Market Basket Analysis information (i.e., receipts) to help retailers improve the management of spaces and marketing strategies. To set up the challenge, we collected a new dataset of egocentric videos during real shopping sessions in a retail store. Video frames have been labeled according to a proposed hierarchy of 14 different customer behaviors, from the beginning (cart picking) to the end (cart releasing) of the shopping session. We benchmark different representation and classification techniques and propose a multimodal method which exploits visual, motion, and audio descriptors to perform classification with the Directed Acyclic Graph SVM learning architecture. Experiments highlight that employing multimodal representations and explicitly addressing the task in a hierarchical way are both beneficial. The devised approach based on Deep Features achieves an accuracy of more than 87% over the 14 classes of the considered dataset.
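As a rough illustration of the Directed Acyclic Graph SVM (DDAG) decision scheme named above, the sketch below trains one binary SVM per class pair and, at prediction time, walks the DAG so that each pairwise test eliminates one candidate class. This is a minimal sketch on synthetic data; the paper's actual visual, motion, and audio descriptors are not reproduced, and the `DAGSVM` class name and RBF-kernel choice are illustrative assumptions.

```python
# Minimal DAG-SVM (DDAG) sketch: one-vs-one SVMs combined via a
# decision DAG that discards one class per pairwise test.
from itertools import combinations
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_blobs

class DAGSVM:
    def fit(self, X, y):
        self.classes_ = np.unique(y)          # sorted class labels
        self.clf_ = {}
        # Train one binary SVM per unordered class pair (one-vs-one).
        for a, b in combinations(self.classes_, 2):
            mask = (y == a) | (y == b)
            self.clf_[(a, b)] = SVC(kernel="rbf").fit(X[mask], y[mask])
        return self

    def predict(self, X):
        preds = []
        for x in X:
            cand = list(self.classes_)        # stays sorted, so (cand[0], cand[-1]) is a valid key
            # Walk the DAG: each test between the two extreme candidates
            # removes the losing class; k-1 tests leave a single class.
            while len(cand) > 1:
                a, b = cand[0], cand[-1]
                winner = self.clf_[(a, b)].predict(x.reshape(1, -1))[0]
                cand.remove(a if winner == b else b)
            preds.append(cand[0])
        return np.array(preds)

# Synthetic 4-class example (stand-in for the 14 behavior classes).
X, y = make_blobs(n_samples=300, centers=4, random_state=0)
model = DAGSVM().fit(X, y)
acc = (model.predict(X) == y).mean()
```

With k classes the DAG needs only k-1 binary evaluations per sample, which is why DDAG is attractive for a 14-class problem compared with evaluating all k(k-1)/2 pairwise classifiers.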
Title: Market basket analysis from egocentric videos
Publication date: 2018
Appears in type: 1.1 Journal article