Survey of smartphone-based datasets for indoor localization: A machine learning perspective
Gaetano Carmelo La Delfa (first author): Investigation; Hamaad Rafique: Investigation; Maurizio Palesi: Supervision; Davide Patti: Supervision
2025-01-01
Abstract
Indoor localization has gained significant attention in recent years because of its applications in sectors such as healthcare, logistics, manufacturing, and retail. GPS has largely solved outdoor localization, but indoor localization remains challenging despite considerable research progress. Many studies have used modern smartphones, which carry a variety of sensors, to develop machine-learning methods for indoor localization, ranging from classical fingerprinting to deep sequence models and transformers. Nevertheless, most of these studies rely on small, proprietary datasets that are not publicly available. Large, high-quality public datasets are essential for researchers to test, refine, and validate algorithms efficiently, to compare different approaches, and to develop robust and accurate localization solutions. To reduce data collection time and cost, and to help researchers find the datasets best suited to their needs, this paper surveys 20 publicly available, high-quality indoor localization datasets suitable for machine learning, released between 2014 and 2024 and covering various sensing technologies. The survey reveals a shift toward multi-sensor data collection that extends beyond Wi-Fi and Bluetooth signals to include inertial sensors, such as accelerometers and gyroscopes, as well as magnetic field measurements. It also shows that, although over 75% of the datasets cover multi-floor structures or multiple buildings, datasets covering diverse types of indoor environments are scarce, with most focused on office or academic settings. Moreover, the temporal dimension, which is crucial in dynamic indoor scenarios, remains largely underrepresented, limiting the development of ML models that track dynamic trajectories or adapt to evolving signal patterns.