“Navigating in unseen environments is crucial for mobile robots. Enhancing them with the ability to follow instructions in natural language will further improve navigation efficiency in unseen cases. However, state-of-the-art vision-and-language navigation (VLN) methods are mainly evaluated in simulation, neglecting the complex and noisy real world. … In this work, we propose a novel navigation framework to address the VLN task in the real world. … The proposed framework includes four key components: (1) an LLMs-based instruction parser … (2) an online visual-language mapper … (3) a language indexing-based localizer … and (4) a DD-PPO-based local controller.”
Find the paper and authors list at ArXiv.