Large Language Model Guided State Estimation for Partially Observable Task and Motion Planning

Overview

Results

Plans 95
Pairwise comparison of total number of replans needed in different experiment settings compared to CoCo-TAMP.
Time 95
Pairwise comparison of total number of replans needed in different experiment settings compared to CoCo-TAMP.
LLM Comparison
Comparison of different large language models for planning and execution time.

Abstract

Robot planning in partially observable environments, where not all objects are known or visible, is a challenging problem, as it requires reasoning under uncertainty through partially observable Markov decision processes. During the execution of a computed plan, a robot may unexpectedly observe task-irrelevant objects, which are typically ignored by naive planners. In this work, we propose incorporating two types of common-sense knowledge: (1) certain objects are more likely to be found in specific locations; and (2) similar objects are likely to be co-located, while dissimilar objects are less likely to be stored together. Manually engineering such knowledge is complex, so we explore leveraging the powerful common-sense reasoning capabilities of large language models (LLMs). Our planning and execution framework, CoCo-TAMP, introduces a hierarchical state estimation scheme that uses LLM-guided information to shape the belief over task-relevant objects, enabling efficient solutions to long-horizon task and motion planning problems. In experiments, CoCo-TAMP achieves an average reduction of 67% in planning and execution time in simulation, and 72% in real-world demonstrations, compared to a baseline that does not incorporate either type of common-sense knowledge.
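
To make the two kinds of common-sense knowledge concrete, here is a minimal sketch of how they could shape a belief over where a task-relevant object is located. It is not the CoCo-TAMP implementation: the function names, the hard-coded scores standing in for LLM output, and the simple multiplicative reweighting are all assumptions chosen for illustration, and the hierarchical structure of the actual state estimator is not modeled.

```python
from typing import Dict

# A belief maps each candidate location to the probability that the
# target object is there. All names here are hypothetical.
Belief = Dict[str, float]


def normalize(weights: Belief) -> Belief:
    """Rescale non-negative weights so they sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("belief has no probability mass left")
    return {loc: w / total for loc, w in weights.items()}


def location_prior(llm_scores: Belief) -> Belief:
    """Knowledge type (1): objects are more likely in certain locations.

    `llm_scores` stands in for per-location plausibility scores that an
    LLM might return; here they are supplied by the caller.
    """
    return normalize(llm_scores)


def update_with_cooccurrence(belief: Belief, observed_at: str,
                             similarity: float) -> Belief:
    """Knowledge type (2): similar objects tend to be co-located.

    After a task-irrelevant object is seen at `observed_at`, scale the
    belief there by `similarity` (> 1 for similar objects, < 1 for
    dissimilar ones) and renormalize. The factor is an assumed stand-in
    for an LLM-derived similarity judgment.
    """
    updated = dict(belief)
    updated[observed_at] *= similarity
    return normalize(updated)


def update_not_found(belief: Belief, searched: str) -> Belief:
    """Remove probability mass from a location that was searched
    without finding the target (assumes a perfect detector)."""
    updated = dict(belief)
    updated[searched] = 0.0
    return normalize(updated)


if __name__ == "__main__":
    # Hypothetical scores for finding a mug in each location.
    belief = location_prior({"cabinet": 6.0, "fridge": 1.0, "drawer": 3.0})
    # The robot happens to see a plate (similar to a mug) in the drawer.
    belief = update_with_cooccurrence(belief, "drawer", similarity=3.0)
    print(max(belief, key=belief.get))
```

In this toy example the incidental plate observation shifts the most likely location from the cabinet to the drawer, which is the kind of belief shaping that would let a planner search the drawer first rather than ignore the task-irrelevant object.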

Video


BibTeX

  @inproceedings{kim2026cocotamp,
    title={Large Language Model Guided State Estimation for Partially Observable Task and Motion Planning},
    author={Kim, Yoonwoo and Arora, Raghav and Mart{\'i}n-Mart{\'i}n, Roberto and Stone, Peter and Abbatematteo, Ben and Sung, Yoonchang},
    booktitle={Proceedings of the IEEE International Conference on Robotics and Automation (ICRA)},
    year={2026},
    note={Accepted},
    url={https://arxiv.org/abs/2603.03704},
    archivePrefix={arXiv},
    eprint={2603.03704}
  }
            
Painting made with acrylic and gouache paints, which mimics the poster for the Pixar movie Coco. It features two people standing on a bridge made of yellow-orange petals, holding guitars. One is a young kid wearing a red hoodie and blue pants, with a brown guitar. The second is a skeleton wearing a red shirt, brown pants, and a light brown hat. He is also holding a brown guitar. Neither the skeleton nor the boy has fingers, because the author (and painter) does not know how to draw and paint fingers properly. In the background, fireworks are going off to celebrate the Day of the Dead (Día de (los) Muertos), and at the top is written COCO TAMP. COCO TAMP stands for Commonsense Correlational Task and Motion Planner, and this poster plays on the similarity between the research name and the Pixar movie Coco.

Painting by Raghav, inspired by the Disney Pixar Coco poster