
ViLaBot: Connecting Vision and Language for Robots That Assist Humans at Home

Asfand Yaar; Marco Rosano; Antonino Furnari; Giovanni Maria Farinella
2024-01-01

Abstract

Despite significant advancements in the fields of vision, language, and robotics, integrating these capabilities to create an autonomous robot assistant remains a challenge. This paper presents ViLaBot (Vision and Language roBot), a system designed to aid humans in daily activities at home. ViLaBot combines a language model with a library of basic visuomotor skills to understand human needs, create action plans, and execute them. The system relies solely on onboard visual and proprioceptive sensing, eliminating the need for pre-built maps or precise object locations and facilitating real-world deployment in a variety of environments. Experimental validation conducted in 11 realistic home environments with simulated human agents in the Habitat simulator indicates that ViLaBot achieves promising results when using ground-truth image segmentation, yet exhibits modest performance in scenarios involving imperfect visual perception. The results support the validity of the proposed pipeline and highlight the critical components of the system that should be improved to increase its overall success rate and reliability.
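The abstract describes a pipeline in which a language model translates a human request into an action plan over a small library of visuomotor skills, which are then executed using only onboard sensing. No code accompanies this record, so the following is only a minimal illustrative sketch of such a plan-then-execute loop; all names (the skill functions, SKILLS, plan_with_llm) are hypothetical and not taken from ViLaBot, and the planner is stubbed so the script runs standalone.

```python
# Hypothetical sketch of an LLM-driven plan-then-execute loop in the spirit
# of the pipeline described in the abstract. None of these names come from
# ViLaBot; the "LLM" is a hard-coded stub so the example is self-contained.

from typing import Callable, Dict, List, Tuple

# --- Hypothetical library of basic visuomotor skills -----------------------
# Each skill would internally rely on onboard RGB(-D) and proprioceptive
# sensing (no pre-built map, no ground-truth object poses).

def navigate_to(target: str) -> bool:
    print(f"[skill] navigating to '{target}' using onboard perception")
    return True

def pick(obj: str) -> bool:
    print(f"[skill] picking up '{obj}'")
    return True

def place(receptacle: str) -> bool:
    print(f"[skill] placing held object on '{receptacle}'")
    return True

SKILLS: Dict[str, Callable[[str], bool]] = {
    "navigate_to": navigate_to,
    "pick": pick,
    "place": place,
}

# --- Hypothetical planner ---------------------------------------------------
# A real system would prompt a language model with the user request and the
# list of available skills; here the plan is hard-coded for illustration.

def plan_with_llm(request: str) -> List[Tuple[str, str]]:
    # Expected output format: an ordered sequence of (skill_name, argument).
    return [
        ("navigate_to", "kitchen counter"),
        ("pick", "mug"),
        ("navigate_to", "living room table"),
        ("place", "living room table"),
    ]

def execute(plan: List[Tuple[str, str]]) -> bool:
    """Run each planned skill in order; stop and report on the first failure."""
    for skill_name, argument in plan:
        skill = SKILLS.get(skill_name)
        if skill is None or not skill(argument):
            print(f"[executor] failed at step ({skill_name}, {argument})")
            return False
    return True

if __name__ == "__main__":
    request = "Please bring my mug to the living room table."
    success = execute(plan_with_llm(request))
    print("task completed" if success else "task failed")
```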
Keywords: assistive tasks; human-robot interaction; navigation and manipulation; task planning

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11769/713490