Indoor localization by projecting magnetic field signals onto images with vision transformer

Rafique, Hamaad; Patti, Davide; Palesi, Maurizio; La Delfa, Gaetano Carmelo
2025-01-01

Abstract

Indoor localization, powered by Internet of Things (IoT)-based sensor fusion, is an evolving application that uses mobile sensors and embedded hardware within buildings to pinpoint smartphone users' locations. However, sensor heterogeneity across smartphones significantly compromises the reliability and accuracy of localization algorithms. To address this challenge, this paper introduces MH-ViL, an infrastructure-free and calibration-free framework built on a multi-head self-attention and vision transformer neural network. MH-ViL integrates magnetic field signals (MFS) and visual images for localization. A novel magnetic feature projection (MFP) model is proposed to map MFS onto visual image features, enhancing positional accuracy within the self-attention mechanism. Extensive real-time experiments demonstrate that MH-ViL surpasses alternative models, achieving 92% accuracy; a 95% confidence interval places prediction accuracy between 88% and 92%, with a localization error of less than 0.5 meters. Furthermore, the algorithm is evaluated in both homogeneous and heterogeneous environments to assess its generalizability.
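The magnetic feature projection idea described in the abstract — mapping a magnetometer reading into the same feature space as the image tokens so self-attention can fuse both modalities — can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' implementation: the function name, embedding dimension, patch count, and the prepend-as-extra-token scheme are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: a ViT-style sequence of 16 image-patch
# tokens embedded in a 64-dimensional space, plus one 3-axis
# magnetometer sample (x, y, z components in microtesla).
EMBED_DIM = 64
NUM_PATCHES = 16

def project_magnetic(mfs, weight, bias):
    """Linearly project a 3-axis magnetic field sample into the
    patch-embedding space so it can attend alongside image tokens."""
    return mfs @ weight + bias  # shape: (EMBED_DIM,)

# Illustrative projection parameters (random here; learned in practice).
W = rng.normal(scale=0.1, size=(3, EMBED_DIM))
b = np.zeros(EMBED_DIM)

image_tokens = rng.normal(size=(NUM_PATCHES, EMBED_DIM))
mfs_sample = np.array([22.4, -5.1, 41.8])  # hypothetical reading, uT

mfs_token = project_magnetic(mfs_sample, W, b)
# Prepend the magnetic token so multi-head self-attention can mix
# both modalities within a single token sequence.
fused_sequence = np.vstack([mfs_token, image_tokens])
print(fused_sequence.shape)  # (17, 64)
```

Under this sketch, downstream transformer layers treat the projected magnetic reading as just another token, which is one plausible way a projection model could feed MFS into a vision transformer's attention mechanism.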
Fingerprinting
Indoor localization
Neural networks
Transformers
Vision transformers
Files in this record:
No files are associated with this record.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11769/670591