

Planning Algorithms and Strategies for AI Agents

Planning algorithms determine how your agent explores solution paths when the answer isn't obvious. Chain-of-Thought, Tree of Thought, MCTS, ReAct, Plan-and-Execute — the key engineering challenge is balancing thoroughness against cost. More exploration means higher quality but exponential latency and token spend.

From Single-Step to Multi-Path Reasoning

Planning algorithms give agents the ability to reason about multiple possible paths before committing to action. Without them, agents make shallow decisions that fail on problems requiring deliberation.

Chain-of-Thought: Structured Step-by-Step Reasoning

Chain-of-Thought (CoT) prompting forces the model to externalize intermediate reasoning steps before arriving at an answer. Instead of jumping directly from question to answer, the model shows its work.

Approach | Prompt Pattern | When to Use
Zero-shot CoT | "Let's think step by step" | Quick reasoning boost, no examples needed
Few-shot CoT | Provide examples with explicit reasoning chains | Domain-specific reasoning patterns
Self-consistency CoT | Generate multiple reasoning paths, vote on final answer | High-stakes decisions where you can afford extra compute
💡
CoT gains are highly sensitive to how well prompt examples match the problem structure. Generalization beyond narrow contexts remains elusive — don't assume a CoT prompt that works on one task transfers to another.
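The zero-shot variant is a one-line prompt change. A minimal sketch, assuming a hypothetical `call_llm` client function standing in for whatever SDK your stack uses:

```python
# Minimal zero-shot CoT wrapper. `call_llm` is a hypothetical stand-in
# for an actual LLM client call (OpenAI, Anthropic, local model, etc.).

def zero_shot_cot(question: str, call_llm=None) -> str:
    """Append the CoT trigger phrase so the model externalizes its reasoning."""
    prompt = f"{question}\n\nLet's think step by step."
    if call_llm is None:  # offline fallback so the sketch runs standalone
        return prompt
    return call_llm(prompt)

print(zero_shot_cot("A train leaves at 3pm traveling 60 mph. When does it cover 150 miles?"))
```

Few-shot CoT replaces the trigger phrase with worked examples that show full reasoning chains for problems structurally similar to yours.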

Tree of Thought: Multi-Path Exploration

Tree of Thought (ToT) extends CoT by exploring multiple reasoning branches in parallel, evaluating intermediate states, and backtracking when paths fail.

Python · Tree of Thought

from dataclasses import dataclass, field

@dataclass
class ThoughtNode:
    content: str
    parent: 'ThoughtNode | None'
    score: float = 0.0
    # Mutable default must use default_factory, not None
    children: list['ThoughtNode'] = field(default_factory=list)

class TreeOfThought:
    def __init__(self, generator, evaluator, max_depth=5, beam_width=3):
        self.generator = generator    # Generate k next thoughts
        self.evaluator = evaluator    # Score a thought path
        self.max_depth = max_depth
        self.beam_width = beam_width

    def _path_to_prompt(self, node: ThoughtNode) -> str:
        # Concatenate thoughts from the root down to this node
        parts = []
        while node is not None:
            parts.append(node.content)
            node = node.parent
        return "\n".join(reversed(parts))

    def search(self, problem: str) -> str:
        root = ThoughtNode(content=problem, parent=None)
        frontier = [root]

        for depth in range(self.max_depth):
            candidates = []
            for node in frontier:
                next_thoughts = self.generator(self._path_to_prompt(node))
                for thought in next_thoughts:
                    child = ThoughtNode(content=thought, parent=node)
                    child.score = self.evaluator(self._path_to_prompt(child))
                    node.children.append(child)
                    candidates.append(child)

            # Keep top-k by score (beam search)
            candidates.sort(key=lambda n: n.score, reverse=True)
            frontier = candidates[:self.beam_width]

            # Early termination if best path is confident
            if frontier and frontier[0].score > 0.95:
                break

        return self._path_to_prompt(frontier[0]) if frontier else ""
[Figure: Tree of Thought search tree. Low-scoring branches (red) are pruned. The highest-scoring path (green) is expanded toward the solution.]

Monte Carlo Tree Search for LLM Reasoning

MCTS brings game-playing algorithms to LLM reasoning. Instead of evaluating every possible path, it uses random rollouts to estimate which branches are most promising. The cycle runs four phases repeatedly:

  1. Selection: navigate from root to a leaf using UCB to balance exploration vs. exploitation
  2. Expansion: add new child nodes representing possible next steps from the selected leaf
  3. Simulation: roll out random completions to estimate the value of this path
  4. Backpropagation: update scores along the entire path based on simulation results
Reported results illustrate the payoff: on one set of optimization problems, single-pass prompting succeeded 72% of the time versus 98% for MCTS. On IMO qualifying exams, o1 scored 74.4% versus 9.3% for GPT-4o.
[Figure: MCTS four-phase cycle. MCTS focuses compute on promising branches through repeated cycles of selection, expansion, simulation, and backpropagation.]
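The four phases compress into a short loop. Below is a minimal, self-contained sketch on a toy domain (building a bit string), where the rollout function stands in for an LLM-based value estimate; a real agent would plug in model calls for `expand_fn` and `rollout_fn`:

```python
import math
import random

class MCTSNode:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0  # accumulated simulation reward

def ucb1(node, c=1.4):
    """Upper Confidence Bound: mean value (exploitation) plus an exploration bonus."""
    if node.visits == 0:
        return float("inf")
    return (node.value / node.visits
            + c * math.sqrt(math.log(node.parent.visits) / node.visits))

def mcts(root, expand_fn, rollout_fn, iterations=100):
    for _ in range(iterations):
        # 1. Selection: descend via UCB until we hit a leaf
        node = root
        while node.children:
            node = max(node.children, key=ucb1)
        # 2. Expansion: add children for possible next steps
        for next_state in expand_fn(node.state):
            node.children.append(MCTSNode(next_state, parent=node))
        if node.children:
            node = random.choice(node.children)
        # 3. Simulation: estimate this path's value with a rollout
        reward = rollout_fn(node.state)
        # 4. Backpropagation: update statistics along the whole path
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Recommend the most-visited first move
    return max(root.children, key=lambda n: n.visits).state

# Toy domain: grow a 4-bit string; reward = fraction of 1s after a random completion
random.seed(0)
expand = lambda s: [s + "0", s + "1"] if len(s) < 4 else []
def rollout(s):
    while len(s) < 4:
        s += random.choice("01")
    return s.count("1") / 4
best = mcts(MCTSNode(""), expand, rollout, iterations=200)
print(best)  # the first move MCTS judges most promising
```

The most-visited child, not the highest-scoring one, is the standard recommendation: visit counts are a more robust signal than raw value estimates from few rollouts.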

ReAct vs. Plan-and-Execute

Two dominant paradigms structure how agents plan and act:

⚡ ReAct

  • Interleaves reasoning and action in a tight loop: Think → Act → Observe → Think
  • Agile, adapts quickly to new observations
  • Lower latency per decision
  • Can get stuck in loops, loses sight of long-horizon goals

🗺 Plan-and-Execute

  • Generates a complete plan upfront, then executes step by step
  • Better for complex multi-step tasks with clear dependencies
  • Easier to debug, clearer accountability
  • Higher latency to first action; plan may go stale
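The ReAct loop reduces to a few lines. A sketch with hypothetical stand-ins (`llm_think` decides the next action, `run_tool` executes it); note the step cap, which guards against the infinite-loop failure mode mentioned above:

```python
# Minimal ReAct loop sketch. `llm_think` and `run_tool` are hypothetical
# stand-ins; a real agent would call an LLM and actual tools.

def react_loop(goal, llm_think, run_tool, max_steps=10):
    observations = []
    for _ in range(max_steps):
        # Think: decide the next action from the goal plus what we've seen so far
        thought, action = llm_think(goal, observations)
        if action is None:          # the model decided it is done
            return thought
        # Act + Observe: run the tool and feed the result back into the loop
        observations.append(run_tool(action))
    return None  # hit the step cap: the classic ReAct loop guard

# Toy usage: "search" twice, then answer
def fake_think(goal, obs):
    return ("done: 42", None) if len(obs) >= 2 else ("searching", "search")

print(react_loop("answer the question", fake_think, lambda a: f"result of {a}"))
# prints "done: 42"
```

A Plan-and-Execute agent would instead make one planning call up front, then run each step of the returned plan through the same tool interface.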
[Figure: ReAct vs. Plan-and-Execute. ReAct interleaves thinking and acting in rapid cycles. Plan-and-Execute separates strategy (planner) from implementation (executor).]

Test-Time Compute Scaling

More thinking time generally means better answers. The question is: how much is worth it?

Strategy | Compute Multiplier | When to Use
Single-pass | 1× | Simple tasks, latency-critical
Self-consistency (5 samples) | ~5× | Moderate complexity, need confidence
Tree search (beam=3, depth=5) | ~15× | Complex reasoning, quality critical
MCTS (100+ rollouts) | 50–100× | Optimization problems, decisions under uncertainty
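Self-consistency is the easiest of these to add: sample several reasoning paths at temperature above zero and majority-vote on the extracted final answer. A sketch, where `sample_answer` is a hypothetical stand-in for one sampled LLM call:

```python
# Self-consistency sketch: sample n reasoning paths, majority-vote on the
# final answer. `sample_answer` is a hypothetical stand-in for an LLM call
# at temperature > 0 that returns just the extracted final answer.
from collections import Counter

def self_consistency(question, sample_answer, n_samples=5):
    answers = [sample_answer(question) for _ in range(n_samples)]
    best, votes = Counter(answers).most_common(1)[0]
    return best, votes / n_samples   # answer plus a crude agreement score

# Toy usage with a deterministic fake sampler
samples = iter(["7", "7", "6", "7", "7"])
print(self_consistency("2+5?", lambda q: next(samples)))
# prints ('7', 0.8)
```

The agreement score doubles as a cheap confidence signal: low agreement is a hint the question deserves escalation to a more expensive strategy.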
💡
The 2025 research on Agentic Plan Caching shows a practical cost mitigation: reuse plan templates across similar tasks, reducing serving costs by 50% and latency by 27% while maintaining 96% of optimal performance.
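The core idea behind plan caching can be approximated with a template cache keyed on a normalized task string. This is an illustrative sketch, not the paper's implementation; the normalization rule and `planner` callable are assumptions:

```python
# Illustrative plan-template cache (not the cited paper's implementation):
# key tasks by a normalized form so similar queries reuse a stored plan.
import re

class PlanCache:
    def __init__(self, planner):
        self.planner = planner       # expensive plan generator (e.g., an LLM call)
        self.cache = {}

    def _template_key(self, task: str) -> str:
        # Crude normalization: lowercase, mask numbers and quoted literals
        return re.sub(r'"[^"]*"|\d+', "<X>", task.lower()).strip()

    def plan(self, task: str):
        key = self._template_key(task)
        if key not in self.cache:
            self.cache[key] = self.planner(task)   # cache miss: pay full cost
        return self.cache[key]

calls = []
pc = PlanCache(lambda t: calls.append(t) or ["step 1", "step 2"])
pc.plan('Refund order 123')
pc.plan('Refund order 456')   # same template: served from cache
print(len(calls))  # planner was invoked once
```

Production systems would need cache invalidation and a safety check that the cached plan actually fits the new task, but the cost structure is the same: one planning call amortized over many similar queries.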
Key Points
  • Chain-of-Thought forces step-by-step reasoning but gains are sensitive to prompt example quality
  • Tree of Thought explores multiple paths with evaluation and backtracking — 10–50× more expensive than single-pass
  • MCTS improves success rates from 72% to 98% on optimization problems by focusing compute on promising paths
  • ReAct is agile but loses long-horizon goals; Plan-and-Execute maintains strategy at the cost of latency to first action
  • o1 is 6× more expensive than GPT-4o but scores 74% vs. 9% on math olympiad problems — the tradeoff is explicit

Implementation Decisions That Shape Your Agent

As a software engineer building agentic systems, you're writing the code that decides how much your agent thinks before acting. Get this wrong and you either ship an agent that makes shallow mistakes, or one that burns through your API budget on problems that didn't need deep reasoning.

Evaluator Design Patterns

Your evaluator function determines which reasoning paths survive in tree search and MCTS. A bad evaluator means the agent spends compute exploring dead ends.

Python · Evaluator Factory

from enum import Enum

class EvaluatorType(Enum):
    LLM_JUDGE  = "llm_judge"   # Use LLM to score reasoning quality
    EXECUTION  = "execution"   # Run code, score by test results
    HEURISTIC  = "heuristic"   # Domain-specific rules
    COMPOSITE  = "composite"   # Combine multiple evaluators

def create_evaluator(config: EvaluatorConfig):
    if config.type == EvaluatorType.LLM_JUDGE:
        return lambda path: llm_score(path, config.llm_model)
    elif config.type == EvaluatorType.EXECUTION:
        return lambda path: run_tests(extract_code(path), config.test_suite)
    elif config.type == EvaluatorType.HEURISTIC:
        return config.heuristic_fn  # assumes config carries a domain-rule callable
    elif config.type == EvaluatorType.COMPOSITE:
        evaluators = [create_evaluator(sub) for sub in config.sub_evaluators]
        return lambda path: sum(
            w * e(path) for e, w in zip(evaluators, config.weights.values())
        )
    raise ValueError(f"Unknown evaluator type: {config.type}")

Cost Monitoring Is Non-Negotiable

A depth-5 tree with beam width 3 generates up to 3⁵ = 243 reasoning paths. At $0.01 per path, that's $2.43 per query before you've executed a single action. Build cost tracking in from the start:

Python · Search Cost Tracking

from dataclasses import dataclass

@dataclass
class SearchMetrics:
    nodes_expanded:     int   = 0
    llm_calls:          int   = 0
    total_tokens:       int   = 0
    estimated_cost:     float = 0.0
    wall_clock_seconds: float = 0.0

    def add_llm_call(self, input_tokens: int, output_tokens: int, model: str):
        self.llm_calls      += 1
        self.total_tokens   += input_tokens + output_tokens
        self.estimated_cost += estimate_cost(input_tokens, output_tokens, model)
💡
Cursor's hybrid approach: a planner agent decomposes goals into session-sized chunks; executor agents make incremental progress. Short tasks get ReAct. Multi-session tasks get Plan-and-Execute with replanning triggers. The key insight: match planning algorithm to task horizon.
Key Points
  • Algorithm selection should vary by task — style checks get single-pass, architectural decisions get tree search
  • Evaluator quality determines search efficiency — bad evaluators waste compute on dead-end paths
  • Build cost tracking with hard limits into every search implementation to prevent budget blowouts

Why Engineers Get This Wrong

Mistake 1: Using Tree Search for Everything

⚠️
After seeing ToT's impressive benchmark results, engineers apply it everywhere. A customer service agent doesn't need to explore 5 phrasings of "your order shipped." The result: agents that are accurate but unusably slow, or that burn through API budgets in days.
Profile first. Run your task set with single-pass, measure quality. Add CoT, measure again. Only escalate to tree search if you have measurable quality gaps that justify the compute.
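The "profile first, escalate only on measured gaps" advice can be expressed as a ladder. A sketch, where `run_eval` is a hypothetical function that returns measured accuracy for a strategy on your task set:

```python
# Sketch of "escalate only when cheaper strategies measurably fall short".
# `run_eval` is a hypothetical function returning accuracy on your task set.

ESCALATION_LADDER = ["single_pass", "cot", "self_consistency", "tree_search"]

def cheapest_sufficient_strategy(run_eval, target_accuracy=0.9):
    for strategy in ESCALATION_LADDER:
        if run_eval(strategy) >= target_accuracy:
            return strategy          # stop at the cheapest strategy that clears the bar
    return ESCALATION_LADDER[-1]     # nothing cleared it: use the strongest

# Toy usage: pretend these accuracies were measured offline
measured = {"single_pass": 0.71, "cot": 0.92, "self_consistency": 0.94, "tree_search": 0.97}
print(cheapest_sufficient_strategy(measured.get))  # prints "cot"
```

The point is that the ladder stops at "cot" even though tree search scores higher: the extra 5 points do not justify a ~15× compute multiplier once the target is met.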

Mistake 2: Evaluators That Mirror the Generator

⚠️
Using the same model and similar prompts for both generation and evaluation creates a feedback loop — the model approves its own outputs. You get confident wrong answers instead of correct ones. Tree search that costs more but doesn't improve accuracy: the worst of both worlds.
Use orthogonal evaluation signals. For code generation, evaluate by execution results. For factual reasoning, evaluate against retrieved sources. If you must use LLM evaluation, use a different model or significantly different prompt framing.
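For code generation, the orthogonal signal is execution. A minimal sketch of an execution-based scorer; note that real systems must sandbox untrusted code rather than `exec` it in-process as this illustration does:

```python
# Orthogonal-signal sketch: score a candidate code path by executing its
# tests instead of asking the generator model to judge itself.
# WARNING: illustration only; production systems must sandbox untrusted code.

def execution_score(code: str, test_cases: list[tuple[str, object]]) -> float:
    """Fraction of (expression, expected) pairs the candidate code passes."""
    namespace = {}
    try:
        exec(code, namespace)        # run the candidate in an isolated namespace
    except Exception:
        return 0.0                   # code that doesn't even run scores zero
    passed = 0
    for expr, expected in test_cases:
        try:
            if eval(expr, namespace) == expected:
                passed += 1
        except Exception:
            pass                     # a failing case simply doesn't count
    return passed / len(test_cases)

candidate = "def add(a, b):\n    return a + b"
print(execution_score(candidate, [("add(2, 3)", 5), ("add(-1, 1)", 0)]))  # prints 1.0
```

An evaluator like this cannot be gamed by fluent-sounding reasoning, which is exactly what breaks the self-approval feedback loop.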

Mistake 3: Ignoring Latency-to-First-Token

⚠️
Plan-and-Execute architectures delay all output until planning completes. For complex tasks, users stare at a loading spinner for 30+ seconds. Engineers who test with scripts don't feel this pain — then users complain the agent "hangs" and close the tab.
Stream intermediate state. Show the plan as it's being generated. For tree search, show the best current answer while continuing to search in the background. Agentic Plan Caching reduces latency by 27% on repeated similar tasks through template reuse.
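Streaming the best-so-far answer fits naturally into a generator. A sketch, where `candidates` stands in for an anytime search emitting `(score, answer)` pairs as it explores:

```python
# Streaming sketch: yield the best answer found so far while search continues,
# so the UI can render progress instead of a spinner. `candidates` stands in
# for an anytime search producing (score, answer) pairs.

def stream_best(candidates):
    best_score = float("-inf")
    for score, answer in candidates:
        if score > best_score:
            best_score = score
            yield answer             # push each improvement to the client

search_results = [(0.4, "draft A"), (0.3, "draft B"), (0.8, "draft C")]
print(list(stream_best(search_results)))  # prints ['draft A', 'draft C']
```

The consumer renders each yielded answer immediately; when the search budget runs out, whatever was last yielded is the final response.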
Key Points
  • Tree search isn't universally better — profile single-pass and CoT before escalating to expensive algorithms
  • Evaluators using the same model as the generator create feedback loops that amplify confident wrong answers
  • Plan-and-Execute must stream intermediate state or users abandon agents during long planning phases

Building an Adaptive Planning Algorithm Selector

The right planning algorithm depends on your task, not on a single upfront architectural decision. Here's an implementation that selects a strategy based on measured task characteristics.

Step 1: Define Task Characteristics

Python · Task Profile

from dataclasses import dataclass
from enum import Enum

class TaskComplexity(Enum):
    SIMPLE    = "simple"    # Direct answer, no reasoning needed
    MODERATE  = "moderate"  # Some reasoning, clear path
    COMPLEX   = "complex"   # Multiple valid approaches
    AMBIGUOUS = "ambiguous" # Solution path unclear

class LatencyRequirement(Enum):
    REALTIME    = "realtime"    # < 2 seconds
    INTERACTIVE = "interactive" # < 10 seconds
    BACKGROUND  = "background"  # < 60 seconds
    BATCH       = "batch"       # No limit

@dataclass
class TaskProfile:
    complexity:       TaskComplexity
    latency_req:      LatencyRequirement
    has_verification: bool    # Can we verify correctness programmatically?
    budget_per_query: float   # Max $ to spend on this query

Step 2: Implement the Algorithm Selector

Python · Algorithm Selector

def select_planning_algorithm(profile: TaskProfile) -> PlanningConfig:
    # Realtime always gets single-pass or CoT
    if profile.latency_req == LatencyRequirement.REALTIME:
        if profile.complexity in (TaskComplexity.SIMPLE, TaskComplexity.MODERATE):
            return PlanningConfig(algorithm="single_pass")
        return PlanningConfig(algorithm="cot")

    # Complex tasks with verification → tree search or MCTS
    if profile.complexity in (TaskComplexity.COMPLEX, TaskComplexity.AMBIGUOUS):
        if profile.has_verification and profile.latency_req == LatencyRequirement.BATCH:
            return PlanningConfig(
                algorithm="mcts",
                rollouts=min(100, int(profile.budget_per_query / 0.02)),
                evaluator_type="execution"
            )
        if profile.latency_req == LatencyRequirement.INTERACTIVE:
            return PlanningConfig(
                algorithm="tree_search",
                beam_width=2, max_depth=3,
                evaluator_type="execution" if profile.has_verification else "llm_judge"
            )

    return PlanningConfig(algorithm="cot")

Step 3: Add Cost Guards with Graceful Degradation

Python · Budget Guard

class CostGuardedPlanner(AdaptivePlanner):
    def _tree_search(self, task: str, config: PlanningConfig) -> PlanResult:
        def guarded_generator(prompt):
            if self.metrics.estimated_cost > self.max_cost:
                raise BudgetExceeded(
                    f"Exceeded budget: ${self.metrics.estimated_cost:.2f}"
                )
            return self.llm.generate_k(prompt, k=config.beam_width)

        tree = TreeOfThought(generator=guarded_generator, ...)
        try:
            result = tree.search(task)
            return PlanResult(answer=result, confidence=0.9)
        except BudgetExceeded:
            # Return the best partial result — never fail silently
            return PlanResult(answer=tree.best_so_far(), confidence=0.6, truncated=True)
[Figure: Adaptive planning algorithm selection flowchart. Task characteristics (latency requirement, complexity, and verification capability) drive algorithm selection automatically: single-pass, chain of thought, CoT + self-consistency, tree search (with execution or LLM-judge evaluation), or full-budget MCTS.]

Before vs. After: Adaptive vs. One-Size-Fits-All

Aspect | ❌ One-Size-Fits-All | ✅ Adaptive Selection
Simple queries | 10s latency, $0.50 (tree search overkill) | 1s latency, $0.02 (single-pass)
Complex reasoning | Shallow mistakes (single-pass insufficient) | Thorough exploration (tree search applied)
Budget management | Unpredictable costs, occasional blowouts | Hard limits, graceful degradation
Observability | "It took 30 seconds" | Per-algorithm metrics, cost breakdown
Key Points
  • Task profiling — complexity, latency, verification capability, budget — determines optimal algorithm selection
  • Cost guards with graceful degradation prevent budget blowouts while still returning best-effort results
  • Adaptive selection can reduce costs by 25× on simple queries while improving quality on complex ones

Key Takeaways

1. Planning algorithms trade compute for quality: single-pass is fast and cheap; tree search and MCTS find solutions that single-pass misses.
2. Chain-of-Thought forces step-by-step reasoning. Tree of Thought extends this with multi-path exploration and backtracking.
3. MCTS achieves dramatic improvements (72% → 98% success) by focusing compute on promising paths through its four-phase cycle.
4. ReAct is agile but loses long-horizon goals. Plan-and-Execute maintains strategy but risks stale plans — hybrid escalation often wins.
5. Evaluator quality determines search efficiency. Use orthogonal signals — execution results, different models — to avoid self-approval feedback loops.

This article is part of the AI Fluens AI agent engineering track.