An AR-Based Tool for Acquisition and Automatic Labeling of Human-Object Interactions From First Person Vision
Seminara L.; Ragusa F.; Leonardi R.; Farinella G. M.; Furnari A.
2023-01-01
Abstract
We consider the problem of detecting Human-Object Interactions (HOIs) from images acquired through wearable devices. Understanding human-object interactions makes it possible to support users in different scenarios, ranging from everyday activities to industrial contexts. In particular, by detecting interactions, it is possible to provide assistance in using specific objects or to enhance worker safety by alerting workers when they are interacting with dangerous tools. Current approaches for detecting egocentric human-object interactions (EHOIs) require collecting and labeling domain-specific data in order to fine-tune models to a specific target environment. To reduce the labeling costs generally associated with this process, we developed a new tool that uses the spatial mapping, hand pose estimation, and camera tracking capabilities available in the Augmented Reality stack of Microsoft HoloLens 2 to collect and automatically label images of human-object interactions performed in the real world. To assess the effectiveness of the proposed tool, we collected and automatically labeled a dataset of human-object interactions performed on an industrial panel. Experiments with two EHOI recognition models suggest that the data collected by the proposed tool can improve model performance in the considered target domain.
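To make the automatic-labeling idea concrete, the following minimal Python sketch shows how 3D object anchors (from spatial mapping) and hand joints (from hand tracking) could be projected into the image plane using the tracked camera pose, yielding a 2D bounding box and an interaction flag without manual annotation. This is an illustrative sketch under assumed conventions, not the authors' implementation; the names project_points, auto_label, and contact_thresh_m, as well as the distance-based contact heuristic, are hypothetical.

import numpy as np

def project_points(points_world, world_to_cam, K):
    """Project Nx3 world points to Nx2 pixel coordinates (pinhole model)."""
    pts_h = np.hstack([points_world, np.ones((len(points_world), 1))])
    pts_cam = (world_to_cam @ pts_h.T).T[:, :3]   # world frame -> camera frame
    pts_img = (K @ pts_cam.T).T                   # apply camera intrinsics
    return pts_img[:, :2] / pts_img[:, 2:3]       # perspective divide

def auto_label(obj_corners_world, hand_joints_world, world_to_cam, K,
               contact_thresh_m=0.10):
    """Return a 2D bounding box for one object and an interaction flag."""
    # Bounding box: project the object's 3D corners and take their 2D extent.
    corners_2d = project_points(obj_corners_world, world_to_cam, K)
    x_min, y_min = corners_2d.min(axis=0)
    x_max, y_max = corners_2d.max(axis=0)
    # Interaction heuristic (assumption): any tracked hand joint within
    # contact_thresh_m of the object's 3D center counts as an interaction.
    center = obj_corners_world.mean(axis=0)
    dists = np.linalg.norm(hand_joints_world - center, axis=1)
    interacting = bool((dists < contact_thresh_m).any())
    return {"bbox": (x_min, y_min, x_max, y_max), "interacting": interacting}

In such a pipeline, world_to_cam comes from the headset's camera tracking, obj_corners_world from object anchors placed via spatial mapping, and hand_joints_world from the hand-tracking API, so every captured frame can be labeled automatically.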