The ability to localize the visitors of a cultural site from egocentric images can allow applications to understand where people go and what they pay attention to in the site. Current pipelines to tackle the problem require the collection and labeling of large amounts of images, which is challenging, especially in large-scale indoor environments. In contrast, virtual images of a cultural site can be generated and automatically labeled using dedicated tools with minimal effort. In this paper, we investigate whether unsupervised domain adaptation techniques can be used to train localization models on labeled virtual data and unlabeled real data, and then deploy them on real images. To perform this study, we propose a new dataset of both real and virtual images acquired in a cultural site, labeled for room-based localization as well as for 3-DoF camera pose estimation. We then compare two approaches to unsupervised domain adaptation: mid-level representations and image-to-image translation. Our analysis shows that both approaches can be used to reduce the domain gap arising from the different data sources and that the proposed dataset is a challenging benchmark for unsupervised domain adaptation for image-based localization.
Virtual to Real Unsupervised Domain Adaptation for Image-Based Localization in Cultural Sites
Orlando S. A.;Furnari A.;Farinella G. M.
2020-01-01