Navigation in the open world is an elusive grail of robotics, owing to the sheer diversity of environments, agents and scenarios a robot can encounter. What does a robot need to navigate zero-shot to a goal, when deployed anywhere in the world?
We aim to push robot navigation into the open world by advancing three key directions. When designing the robot “in the factory”, we focus on improving its ability to generalise to novel scenarios and tasks in the real world, particularly by leveraging progress in foundation models. Yet training data is bounded, and robots will inevitably encounter challenging out-of-distribution scenarios “in the wild”. Handling such scenarios is critical to the robustness of open-world navigation. To do so, we can exploit priors available before deployment – e.g., scene-specific priors such as floor-plans or language directions – to guide navigation. If the robot nonetheless ends up in a failure state, it needs the ability to identify, analyse and take reasonable actions to recover from the failure.
Open Scene Graphs
J. Loo, Z. Wu, D. Hsu, Open Scene Graphs for Open World Object-goal Navigation
PDF | Video | Website
How can we build robots for open-world semantic navigation tasks, like searching for target objects in novel scenes? While foundation models have the rich knowledge and generalisation needed for these tasks, a suitable scene representation is needed to connect them into a complete robot system. We address this with Open Scene Graphs (OSGs), a topo-semantic representation that stores and organises open-set scene information for these models. OSGs generalise existing scene graphs to handle diverse indoor environments, using customisable OSG schemas to enable flexible structure and semantics across environments ranging from homes to supermarkets to offices. We integrate foundation models and OSGs into the OSG Navigator system for Open World Object-Goal Navigation, which is capable of searching for open-set objects specified in natural language, while generalising zero-shot across diverse environments and embodiments. Our OSGs enhance reasoning with Large Language Models (LLMs), enabling robust object-goal navigation that outperforms existing LLM approaches. Through simulation and real-world experiments, we validate OSG Navigator's generalisation across varied environments, robots and novel instructions.
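To make the idea of a schema-constrained topo-semantic graph concrete, here is a minimal sketch in Python. The node kinds, relations and the `home_schema` below are illustrative assumptions for a home-like environment, not the actual OSG implementation or its schema format.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a topo-semantic scene graph in the spirit of an OSG.
# Nodes carry open-set labels; a customisable schema restricts which kinds of
# nodes exist and which relations may connect them.

@dataclass
class Node:
    kind: str                                   # e.g. "place", "object", "region"
    label: str                                  # open-set label, e.g. "red mug"
    attributes: dict = field(default_factory=dict)

@dataclass
class SceneGraph:
    schema: dict                                # allowed kinds and edge rules
    nodes: dict = field(default_factory=dict)   # node id -> Node
    edges: list = field(default_factory=list)   # (src_id, relation, dst_id)

    def add_node(self, node_id, node):
        assert node.kind in self.schema["kinds"], f"unknown kind {node.kind}"
        self.nodes[node_id] = node

    def add_edge(self, src, relation, dst):
        kinds = (self.nodes[src].kind, self.nodes[dst].kind)
        assert kinds in self.schema["edges"].get(relation, []), "violates schema"
        self.edges.append((src, relation, dst))

# An invented home-like schema: regions contain places, places contain
# objects, and places are topologically connected to one another.
home_schema = {
    "kinds": {"region", "place", "object"},
    "edges": {
        "contains": [("region", "place"), ("place", "object")],
        "connects": [("place", "place")],
    },
}

g = SceneGraph(schema=home_schema)
g.add_node("r1", Node("region", "kitchen area"))
g.add_node("p1", Node("place", "kitchen"))
g.add_node("o1", Node("object", "red mug"))
g.add_edge("r1", "contains", "p1")
g.add_edge("p1", "contains", "o1")
```

Swapping in a different schema (say, aisles and shelves for a supermarket) changes the graph's structure without changing the code, which is the flexibility the OSG schema is meant to provide.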
Scene Action Maps
J. Loo and D. Hsu, Scene action maps: Behavioural maps for navigation without metric information. In Proc. IEEE Int. Conf. on Robotics & Automation, 2024.
PDF | Video | Website
Humans are remarkable in their ability to navigate without metric information. We can read abstract 2D maps, such as floor-plans or hand-drawn sketches, and use them to navigate in unseen, rich 3D environments, without requiring prior traversals to map out these scenes in detail. We posit that this is enabled by the ability to represent the environment abstractly as interconnected navigational behaviours, e.g., “follow the corridor” or “turn right”, while avoiding detailed, accurate spatial information at the metric level. We introduce the Scene Action Map (SAM), a behavioural topological graph, and propose a learnable map-reading method that parses a variety of 2D maps into SAMs. Map-reading extracts salient information about navigational behaviours from the overlooked wealth of pre-existing, abstract and inaccurate maps, ranging from floor-plans to sketches. We evaluate the performance of SAMs for navigation by building and deploying a behavioural navigation stack on a quadrupedal robot.
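A behavioural topological graph can be sketched very compactly: nodes are decision points, edges are navigational behaviours, and planning reduces to graph search over behaviours rather than metric poses. The tiny graph and the BFS planner below are invented for illustration and are not the SAM navigation stack itself.

```python
from collections import deque

# Hypothetical behavioural topological graph in the spirit of a Scene Action
# Map: each edge is a navigational behaviour leading to the next decision
# point, with no metric information anywhere.
sam = {
    "entrance":    [("follow_corridor", "junction")],
    "junction":    [("turn_right", "lab_door"), ("turn_left", "office_door")],
    "lab_door":    [],
    "office_door": [],
}

def plan_behaviours(graph, start, goal):
    """Breadth-first search returning the behaviour sequence from start to goal."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, behaviours = queue.popleft()
        if node == goal:
            return behaviours
        for behaviour, nxt in graph[node]:
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, behaviours + [behaviour]))
    return None  # goal unreachable

plan_behaviours(sam, "entrance", "lab_door")
# -> ["follow_corridor", "turn_right"]
```

The resulting plan is a sequence of behaviours for the robot's controllers to execute, which is why inaccurate floor-plans or sketches suffice: only the connectivity and the behaviour labels need to be right, not the geometry.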