Planning Algorithms and Strategies for AI Agents
Planning algorithms determine how your agent explores solution paths when the answer isn't obvious. Chain-of-Thought, Tree of Thought, MCTS, ReAct, Plan-and-Execute — the key engineering challenge is balancing thoroughness against cost. More exploration means higher quality but exponential latency and token spend.
From Single-Step to Multi-Path Reasoning
Planning algorithms give agents the ability to reason about multiple possible paths before committing to action. Without them, agents make shallow decisions that fail on problems requiring deliberation.
Chain-of-Thought: Structured Step-by-Step Reasoning
Chain-of-Thought (CoT) prompting forces the model to externalize intermediate reasoning steps before arriving at an answer. Instead of jumping directly from question to answer, the model shows its work.
| Approach | Prompt Pattern | When to Use |
|---|---|---|
| Zero-shot CoT | "Let's think step by step" | Quick reasoning boost, no examples needed |
| Few-shot CoT | Provide examples with explicit reasoning chains | Domain-specific reasoning patterns |
| Self-consistency CoT | Generate multiple reasoning paths, vote on final answer | High-stakes decisions where you can afford extra compute |
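Self-consistency is straightforward to implement. A minimal sketch, assuming a `generate` callable that returns one sampled final answer per call (temperature > 0 so the chains actually differ):

```python
from collections import Counter

def self_consistency(generate, prompt: str, n: int = 5) -> str:
    """Sample n independent reasoning chains and majority-vote the final answer.

    `generate` stands in for your LLM call; it should return the final
    answer string extracted from one sampled reasoning chain.
    """
    answers = [generate(prompt) for _ in range(n)]
    # Majority vote: the most common final answer wins
    return Counter(answers).most_common(1)[0][0]
```

The vote is over final answers, not reasoning chains, so superficially different chains that reach the same conclusion reinforce each other.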
Tree of Thought: Multi-Path Exploration
Tree of Thought (ToT) extends CoT by exploring multiple reasoning branches in parallel, evaluating intermediate states, and backtracking when paths fail.
Python · Tree of Thought

```python
from dataclasses import dataclass, field

@dataclass
class ThoughtNode:
    content: str
    parent: 'ThoughtNode | None'
    score: float = 0.0
    children: list['ThoughtNode'] = field(default_factory=list)

class TreeOfThought:
    def __init__(self, generator, evaluator, max_depth=5, beam_width=3):
        self.generator = generator  # Generates k candidate next thoughts
        self.evaluator = evaluator  # Scores a thought path in [0, 1]
        self.max_depth = max_depth
        self.beam_width = beam_width

    def search(self, problem: str) -> str:
        root = ThoughtNode(content=problem, parent=None)
        frontier = [root]
        for depth in range(self.max_depth):
            candidates = []
            for node in frontier:
                next_thoughts = self.generator(self._path_to_prompt(node))
                for thought in next_thoughts:
                    child = ThoughtNode(content=thought, parent=node)
                    child.score = self.evaluator(self._path_to_prompt(child))
                    node.children.append(child)
                    candidates.append(child)
            # Keep top-k by score (beam search)
            candidates.sort(key=lambda n: n.score, reverse=True)
            frontier = candidates[:self.beam_width]
            # Early termination if best path is confident
            if frontier and frontier[0].score > 0.95:
                break
        return self._path_to_prompt(frontier[0]) if frontier else ""

    def _path_to_prompt(self, node: ThoughtNode) -> str:
        # Reconstruct the reasoning chain from the root down to this node
        parts = []
        while node is not None:
            parts.append(node.content)
            node = node.parent
        return "\n".join(reversed(parts))
```
Monte Carlo Tree Search for LLM Reasoning
MCTS brings game-playing algorithms to LLM reasoning. Instead of evaluating every possible path, it uses random rollouts to estimate which branches are most promising. The cycle runs four phases repeatedly: selection (descend the tree by an exploration score such as UCB1), expansion (add child nodes for unexplored continuations), simulation (a cheap rollout estimates the new node's value), and backpropagation (the rollout reward updates visit statistics back up to the root).
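A compact, generic MCTS sketch of those four phases, with `expand` and `rollout` supplied by the caller. This is an illustrative implementation, not code from a specific library:

```python
import math
import random

class MCTSNode:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0  # Sum of rollout rewards

    def ucb1(self, c=1.4):
        # Unvisited nodes are explored first
        if self.visits == 0:
            return float("inf")
        return self.value / self.visits + c * math.sqrt(
            math.log(self.parent.visits) / self.visits
        )

def mcts(root_state, expand, rollout, n_rollouts=100):
    """expand(state) -> list of next states; rollout(state) -> reward in [0, 1]."""
    root = MCTSNode(root_state)
    for _ in range(n_rollouts):
        # 1. Selection: descend by UCB1 until reaching a leaf
        node = root
        while node.children:
            node = max(node.children, key=MCTSNode.ucb1)
        # 2. Expansion: add children for the leaf's successor states
        if node.visits > 0:
            for s in expand(node.state):
                node.children.append(MCTSNode(s, parent=node))
            if node.children:
                node = random.choice(node.children)
        # 3. Simulation: a cheap rollout estimates the node's value
        reward = rollout(node.state)
        # 4. Backpropagation: update statistics up to the root
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Recommend the most-visited child (the robust choice)
    best = max(root.children, key=lambda n: n.visits)
    return best.state
```

For LLM reasoning, `expand` would sample candidate next thoughts from the model and `rollout` would complete a path cheaply (e.g. with a smaller model) and score the result.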
ReAct vs. Plan-and-Execute
Two dominant paradigms structure how agents plan and act:
⚡ ReAct
- Interleaves reasoning and action in a tight loop: Think → Act → Observe → Think
- Agile, adapts quickly to new observations
- Lower latency per decision
- Can get stuck in loops, loses sight of long-horizon goals
🗺 Plan-and-Execute
- Generates a complete plan upfront, then executes step by step
- Better for complex multi-step tasks with clear dependencies
- Easier to debug, clearer accountability
- Higher latency to first action; plan may go stale
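The ReAct loop itself is short. The `llm` and `tools` interfaces below are assumptions for illustration, not any specific framework's API:

```python
def react_loop(llm, tools: dict, task: str, max_steps: int = 8) -> str:
    """Minimal ReAct: Think -> Act -> Observe until the model emits an answer.

    `llm(transcript)` is assumed to return either
    {"thought": ..., "action": ..., "action_input": ...} or {"answer": ...}.
    """
    transcript = f"Task: {task}"
    for _ in range(max_steps):
        step = llm(transcript)
        if "answer" in step:                      # Model decided it is done
            return step["answer"]
        tool = tools[step["action"]]              # Act
        observation = tool(step["action_input"])  # Observe
        transcript += (
            f"\nThought: {step['thought']}"
            f"\nAction: {step['action']}({step['action_input']})"
            f"\nObservation: {observation}"
        )
    return "Step limit reached without an answer"  # Guard against loops
```

The `max_steps` cap is the crude but essential defense against the looping failure mode noted above; production agents usually add loop detection on repeated identical actions as well.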
Test-Time Compute Scaling
More thinking time generally means better answers. The question is: how much is worth it?
| Strategy | Compute Multiplier | When to Use |
|---|---|---|
| Single-pass | 1× | Simple tasks, latency-critical |
| Self-consistency (5 samples) | 5× | Moderate complexity, need confidence |
| Tree search (beam=3, depth=5) | ~15× | Complex reasoning, quality critical |
| MCTS (100+ rollouts) | 50–100× | Optimization problems, decision under uncertainty |
- Chain-of-Thought forces step-by-step reasoning but gains are sensitive to prompt example quality
- Tree of Thought explores multiple paths with evaluation and backtracking — 10–50× more expensive than single-pass
- MCTS improves success rates from 72% to 98% on optimization problems by focusing compute on promising paths
- ReAct is agile but loses long-horizon goals; Plan-and-Execute maintains strategy at the cost of latency to first action
- o1 is 6× more expensive than GPT-4o but scores 74% vs. 9% on math olympiad problems — the tradeoff is explicit
Implementation Decisions That Shape Your Agent
As a software engineer building agentic systems, you're writing the code that decides how much your agent thinks before acting. Get this wrong and you either ship an agent that makes shallow mistakes, or one that burns through your API budget on problems that didn't need deep reasoning.
Evaluator Design Patterns
Your evaluator function determines which reasoning paths survive in tree search and MCTS. A bad evaluator means the agent spends compute exploring dead ends.
Python · Evaluator Factory

```python
from dataclasses import dataclass, field
from enum import Enum

class EvaluatorType(Enum):
    LLM_JUDGE = "llm_judge"  # Use LLM to score reasoning quality
    EXECUTION = "execution"  # Run code, score by test results
    HEURISTIC = "heuristic"  # Domain-specific rules
    COMPOSITE = "composite"  # Combine multiple evaluators

@dataclass
class EvaluatorConfig:  # Assumed shape, inferred from the factory below
    type: EvaluatorType
    llm_model: str = ""
    test_suite: str = ""
    heuristic_fn: object = None
    sub_evaluators: list = field(default_factory=list)
    weights: dict = field(default_factory=dict)

def create_evaluator(config: EvaluatorConfig):
    # llm_score, run_tests, extract_code are your own scoring helpers
    if config.type == EvaluatorType.LLM_JUDGE:
        return lambda path: llm_score(path, config.llm_model)
    elif config.type == EvaluatorType.EXECUTION:
        return lambda path: run_tests(extract_code(path), config.test_suite)
    elif config.type == EvaluatorType.HEURISTIC:
        return config.heuristic_fn
    elif config.type == EvaluatorType.COMPOSITE:
        evaluators = [create_evaluator(sub) for sub in config.sub_evaluators]
        return lambda path: sum(
            w * e(path) for e, w in zip(evaluators, config.weights.values())
        )
    raise ValueError(f"Unknown evaluator type: {config.type}")
```
Cost Monitoring Is Non-Negotiable
A depth-5 tree with branching factor 3 contains up to 3⁵ = 243 root-to-leaf reasoning paths. Beam search prunes most of them, but LLM calls still multiply at every level. At $0.01 per path, that's $2.43 per query before you've executed a single action. Build cost tracking in from the start:
Python · Search Cost Tracking

```python
from dataclasses import dataclass

@dataclass
class SearchMetrics:
    nodes_expanded: int = 0
    llm_calls: int = 0
    total_tokens: int = 0
    estimated_cost: float = 0.0
    wall_clock_seconds: float = 0.0

    def add_llm_call(self, input_tokens: int, output_tokens: int, model: str):
        self.llm_calls += 1
        self.total_tokens += input_tokens + output_tokens
        self.estimated_cost += estimate_cost(input_tokens, output_tokens, model)
```
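`estimate_cost` is left undefined above. One possible shape, with placeholder per-million-token prices (substitute your provider's actual price sheet; the model name and figures here are illustrative assumptions):

```python
# Placeholder prices per 1M tokens; keep this table in sync with your provider
PRICE_PER_MTOK = {
    "gpt-4o": {"input": 2.50, "output": 10.00},  # illustrative figures
}

def estimate_cost(input_tokens: int, output_tokens: int, model: str) -> float:
    prices = PRICE_PER_MTOK[model]
    return (input_tokens * prices["input"]
            + output_tokens * prices["output"]) / 1_000_000
```

Keeping the price table in one place makes it trivial to re-estimate historical search costs when a provider changes pricing.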
- Algorithm selection should vary by task — style checks get single-pass, architectural decisions get tree search
- Evaluator quality determines search efficiency — bad evaluators waste compute on dead-end paths
- Build cost tracking with hard limits into every search implementation to prevent budget blowouts
Why Engineers Get This Wrong
Mistake 1: Using Tree Search for Everything
Mistake 2: Evaluators That Mirror the Generator
Mistake 3: Ignoring Latency-to-First-Token
- Tree search isn't universally better — profile single-pass and CoT before escalating to expensive algorithms
- Evaluators using the same model as the generator create feedback loops that amplify confident wrong answers
- Plan-and-Execute must stream intermediate state or users abandon agents during long planning phases
Building an Adaptive Planning Algorithm Selector
The right planning algorithm depends on the task in front of the agent, so the choice shouldn't be a single upfront architectural decision. Here's a complete implementation that selects a strategy based on measured task characteristics.
Step 1: Define Task Characteristics
Python · Task Profile

```python
from dataclasses import dataclass
from enum import Enum

class TaskComplexity(Enum):
    SIMPLE = "simple"        # Direct answer, no reasoning needed
    MODERATE = "moderate"    # Some reasoning, clear path
    COMPLEX = "complex"      # Multiple valid approaches
    AMBIGUOUS = "ambiguous"  # Solution path unclear

class LatencyRequirement(Enum):
    REALTIME = "realtime"        # < 2 seconds
    INTERACTIVE = "interactive"  # < 10 seconds
    BACKGROUND = "background"    # < 60 seconds
    BATCH = "batch"              # No limit

@dataclass
class TaskProfile:
    complexity: TaskComplexity
    latency_req: LatencyRequirement
    has_verification: bool   # Can we verify correctness programmatically?
    budget_per_query: float  # Max $ to spend on this query
```
Step 2: Implement the Algorithm Selector
Python · Algorithm Selector

```python
def select_planning_algorithm(profile: TaskProfile) -> PlanningConfig:
    # Realtime always gets single-pass or CoT
    if profile.latency_req == LatencyRequirement.REALTIME:
        if profile.complexity in (TaskComplexity.SIMPLE, TaskComplexity.MODERATE):
            return PlanningConfig(algorithm="single_pass")
        return PlanningConfig(algorithm="cot")
    # Complex tasks with verification → tree search or MCTS
    if profile.complexity in (TaskComplexity.COMPLEX, TaskComplexity.AMBIGUOUS):
        if profile.has_verification and profile.latency_req == LatencyRequirement.BATCH:
            return PlanningConfig(
                algorithm="mcts",
                rollouts=min(100, int(profile.budget_per_query / 0.02)),
                evaluator_type="execution",
            )
        if profile.latency_req == LatencyRequirement.INTERACTIVE:
            return PlanningConfig(
                algorithm="tree_search",
                beam_width=2, max_depth=3,
                evaluator_type="execution" if profile.has_verification else "llm_judge",
            )
    return PlanningConfig(algorithm="cot")
```
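The selector returns a `PlanningConfig`, which isn't defined in this article. A minimal assumed shape, consistent with the constructor arguments used above:

```python
from dataclasses import dataclass

@dataclass
class PlanningConfig:  # Assumed shape; fields mirror the selector's arguments
    algorithm: str              # "single_pass" | "cot" | "tree_search" | "mcts"
    beam_width: int = 3
    max_depth: int = 5
    rollouts: int = 0
    evaluator_type: str = "llm_judge"

# e.g. a complex, verifiable, interactive task would come back as:
config = PlanningConfig(algorithm="tree_search", beam_width=2, max_depth=3,
                        evaluator_type="execution")
```

Keeping the config a plain dataclass makes it easy to log alongside the search metrics, so every query records which algorithm it ran with and why.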
Step 3: Add Cost Guards with Graceful Degradation
Python · Budget Guard

```python
class CostGuardedPlanner(AdaptivePlanner):
    def _tree_search(self, task: str, config: PlanningConfig) -> PlanResult:
        def guarded_generator(prompt):
            if self.metrics.estimated_cost > self.max_cost:
                raise BudgetExceeded(
                    f"Exceeded budget: ${self.metrics.estimated_cost:.2f}"
                )
            return self.llm.generate_k(prompt, k=config.beam_width)

        tree = TreeOfThought(generator=guarded_generator, ...)
        try:
            result = tree.search(task)
            return PlanResult(answer=result, confidence=0.9)
        except BudgetExceeded:
            # Return the best partial result — never fail silently
            return PlanResult(answer=tree.best_so_far(), confidence=0.6, truncated=True)
```
Before vs. After: Adaptive vs. One-Size-Fits-All
| Aspect | ❌ One-Size-Fits-All | ✅ Adaptive Selection |
|---|---|---|
| Simple queries | 10s latency, $0.50 (tree search overkill) | 1s latency, $0.02 (single-pass) |
| Complex reasoning | Shallow mistakes (single-pass insufficient) | Thorough exploration (tree search applied) |
| Budget management | Unpredictable costs, occasional blowouts | Hard limits, graceful degradation |
| Observability | "It took 30 seconds" | Per-algorithm metrics, cost breakdown |
- Task profiling — complexity, latency, verification capability, budget — determines optimal algorithm selection
- Cost guards with graceful degradation prevent budget blowouts while still returning best-effort results
- Adaptive selection can reduce costs by 25× on simple queries while improving quality on complex ones
