Large Language Model Guided State Estimation for Partially Observable Task and Motion Planning

Overview

Results

Plans
Pairwise comparison of the total number of replans required across experiment settings, relative to CoCo-TAMP.

Time
Pairwise comparison of total planning and execution time across experiment settings, relative to CoCo-TAMP.

LLM Comparison
Comparison of planning and execution time across different large language models.

Abstract

Robot planning in partially observable environments, where not all objects are known or visible, is challenging because it requires reasoning under uncertainty, typically formalized as a partially observable Markov decision process. During the execution of a computed plan, a robot may unexpectedly observe task-irrelevant objects, which naive planners typically ignore. In this work, we propose incorporating two types of common-sense knowledge: (1) certain objects are more likely to be found in specific locations; and (2) similar objects are likely to be co-located, while dissimilar objects are less likely to be stored together. Manually engineering such knowledge is complex, so we instead leverage the powerful common-sense reasoning capabilities of large language models (LLMs). Our planning and execution framework, CoCo-TAMP, introduces a hierarchical state estimator that uses LLM-guided information to shape the belief over task-relevant objects, enabling efficient solutions to long-horizon task and motion planning problems. In experiments, CoCo-TAMP reduces planning and execution time by an average of 67% in simulation and 72% in real-world demonstrations, compared to a baseline that incorporates neither type of common-sense knowledge.
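
To make the belief-shaping idea concrete, the sketch below illustrates how both kinds of common-sense knowledge could enter a belief over an object's location. This is a minimal illustration under our own assumptions, not CoCo-TAMP's actual implementation: the hard-coded LLM plausibility scores (standing in for real LLM queries), the helper names belief_from_llm, boost_colocated, and update_negative, and the false-negative rate are all hypothetical. An LLM-derived prior (knowledge type 1) initializes the belief, an LLM similarity judgment about an incidentally observed object (knowledge type 2) reshapes it, and a standard Bayes update handles a failed search.

import numpy as np

LOCATIONS = ["fridge", "pantry", "sink", "toolbox"]

# Stand-in for an LLM query of the form "How likely is a milk carton
# to be found in <location>?" (assumed scores, not real model output).
llm_scores = {"fridge": 0.90, "pantry": 0.40, "sink": 0.05, "toolbox": 0.01}

def belief_from_llm(scores):
    # Normalize raw plausibility scores into a prior belief (type-1 knowledge).
    b = np.array([scores[loc] for loc in LOCATIONS], dtype=float)
    return b / b.sum()

def boost_colocated(belief, idx, similarity, strength=1.0):
    # Shift belief toward location idx after incidentally observing a
    # similar object there; similarity in [0, 1] would come from an LLM
    # judgment (type-2 knowledge).
    weight = np.ones_like(belief)
    weight[idx] = 1.0 + strength * similarity
    posterior = weight * belief
    return posterior / posterior.sum()

def update_negative(belief, idx, false_neg=0.1):
    # Bayes update after searching location idx without finding the object;
    # false_neg is the (assumed) chance of missing it when it is there.
    likelihood = np.ones_like(belief)
    likelihood[idx] = false_neg
    posterior = likelihood * belief
    return posterior / posterior.sum()

belief = belief_from_llm(llm_scores)                                          # LLM prior
belief = boost_colocated(belief, LOCATIONS.index("pantry"), similarity=0.8)   # saw cereal in pantry
belief = update_negative(belief, LOCATIONS.index("fridge"))                   # searched fridge, no milk
print({loc: round(p, 3) for loc, p in zip(LOCATIONS, belief)})

In a scheme like this, the planner would simply search the current belief maximum; the LLM only shapes the prior and similarity terms, so the downstream POMDP-style update machinery is unchanged.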

Video