Classical planning formulations like the Planning Domain Definition Language (PDDL) yield action sequences guaranteed to achieve a goal state from a given initial state, whenever such a sequence exists. However, PDDL problems do not capture temporal aspects of acting, such as two agents performing actions concurrently when no conditions conflict, without significant modification to existing PDDL domains. A human expert aware of such constraints can decompose a goal into subgoals, each reachable through single-agent planning, to take advantage of simultaneous actions. In contrast to classical planning, large language models (LLMs) used directly to infer plan steps rarely guarantee execution success, but are capable of leveraging commonsense reasoning to assemble action sequences. We combine the strengths of both classical planning and LLMs by approximating human intuitions for multi-agent planning goal decomposition. We demonstrate that LLM-based goal decomposition leads to faster planning times than solving multi-agent PDDL problems directly, while achieving fewer plan execution steps than a single-agent plan alone, as well as most multi-agent plans, and guaranteeing execution success. Additionally, we find that LLM-based approximations of subgoals result in multi-agent execution lengths similar to those specified by human experts.
We propose decomposing an $N$-agent planning problem into $N$ single-agent planning problems by leveraging the human-like commonsense reasoning capabilities of LLMs. Specifically, we consider a multi-agent scenario with $N-1$ helper agents and one main agent. For a given problem $\mathrm{P}$, containing object definitions, an initial state $i_0 = i$, and goal conditions $g$, each helper agent $h \in \{1, \ldots, N-1\}$ generates a plan $\pi_h = \Pi(i_{h-1}, g_h)$ to reach a subgoal state $g_h$ from its starting state $i_{h-1}$ using a classical planner $\Pi$. The resulting state $i_h = E(i_{h-1}, \pi_h)$, where $E$ is plan execution returning the state reached after starting from $i_{h-1}$ and executing the steps of $\pi_h$, serves as the starting point for the next helper agent. This iteration continues until the main agent executes $\pi_m = \Pi(i_{N-1}, g)$ to achieve the final goal $g$. Each helper agent's subgoal $g_h$ is generated through two modules: a subgoal generator that produces a subgoal in English, and a subgoal translator that converts it into a PDDL subgoal. We hypothesize that LLMs can infer helper subgoals that enable parallel execution alongside the main agent while assuming all agents will eventually achieve their respective goals.
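To make the pipeline concrete, the following Python sketch implements this loop under assumed interfaces: llm, planner, and executor are hypothetical stand-ins for an LLM call, the classical planner $\Pi$, and plan execution $E$, and the prompt wording is illustrative rather than the paper's actual prompts.

    # A minimal sketch of the decomposition loop above, assuming hypothetical
    # interfaces: llm(prompt) -> str, planner(state, pddl_goal) -> plan
    # (the classical planner Pi), and executor(state, plan) -> state
    # (plan execution E). Prompt wording is illustrative only.
    def twostep_plan(init_state, goal, num_agents, llm, planner, executor):
        state = init_state  # i_0 = i
        helper_plans = []
        for h in range(1, num_agents):  # helper agents 1 .. N-1
            # Subgoal generator: propose an English subgoal for helper h.
            english_subgoal = llm(
                f"Task: {goal}. Current state: {state}. "
                "Propose a helper subgoal that can be pursued in parallel "
                "with the main agent."
            )
            # Subgoal translator: convert the English subgoal into PDDL
            # goal conditions.
            pddl_subgoal = llm(
                f"Translate into PDDL goal conditions: {english_subgoal}"
            )
            plan_h = planner(state, pddl_subgoal)  # pi_h = Pi(i_{h-1}, g_h)
            state = executor(state, plan_h)        # i_h = E(i_{h-1}, pi_h)
            helper_plans.append(plan_h)
        # The main agent plans from the last helper's resulting state to the
        # full goal: pi_m = Pi(i_{N-1}, g).
        main_plan = planner(state, goal)
        return helper_plans, main_plan

Planning proceeds sequentially in this sketch, but the resulting plans can be executed in parallel when the LLM-proposed subgoals avoid conflicts with the main agent, which is the hypothesis stated above.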
@misc{bai2025twostepmultiagenttaskplanning,
  title={TwoStep: Multi-agent Task Planning using Classical Planners and Large Language Models},
  author={David Bai and Ishika Singh and David Traum and Jesse Thomason},
  year={2025},
  eprint={2403.17246},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2403.17246}
}