Planning Algorithms and Strategies for AI Agents
Planning algorithms determine how your agent explores solution paths when the answer isn't obvious. Chain-of-Thought, Tree of Thought, MCTS, ReAct, Plan-and-Execute — the key engineering challenge is balancing thoroughness against cost. More exploration means higher quality but exponential latency and token spend.
From Single-Step to Multi-Path Reasoning
Planning algorithms give agents the ability to reason about multiple possible paths before committing to action. Without them, agents make shallow decisions that fail on problems requiring deliberation.
Chain-of-Thought: Structured Step-by-Step Reasoning
Chain-of-Thought (CoT) prompting forces the model to externalize intermediate reasoning steps before arriving at an answer. Instead of jumping directly from question to answer, the model shows its work.
| Approach | Prompt Pattern | When to Use |
|---|---|---|
| Zero-shot CoT | "Let's think step by step" | Quick reasoning boost, no examples needed |
| Few-shot CoT | Provide examples with explicit reasoning chains | Domain-specific reasoning patterns |
| Self-consistency CoT | Generate multiple reasoning paths, vote on final answer | High-stakes decisions where you can afford extra compute |
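Self-consistency is straightforward to implement. A minimal sketch, assuming a `generate` callable that returns one sampled final answer per call (temperature > 0 so the chains actually differ):

```python
from collections import Counter

def self_consistency(generate, prompt: str, n: int = 5) -> str:
    """Sample n independent reasoning chains and majority-vote the final answer.

    `generate` stands in for your LLM call; it should return the final
    answer string extracted from one sampled reasoning chain.
    """
    answers = [generate(prompt) for _ in range(n)]
    # Majority vote: the most common final answer wins
    return Counter(answers).most_common(1)[0][0]
```

The vote is over final answers, not reasoning chains, so superficially different chains that reach the same conclusion reinforce each other.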
Tree of Thought: Multi-Path Exploration
Tree of Thought (ToT) extends CoT by exploring multiple reasoning branches in parallel, evaluating intermediate states, and backtracking when paths fail.
Python · Tree of Thought

```python
from dataclasses import dataclass, field

@dataclass
class ThoughtNode:
    content: str
    parent: 'ThoughtNode | None'
    score: float = 0.0
    children: list['ThoughtNode'] = field(default_factory=list)

class TreeOfThought:
    def __init__(self, generator, evaluator, max_depth=5, beam_width=3):
        self.generator = generator  # Generates k candidate next thoughts
        self.evaluator = evaluator  # Scores a thought path in [0, 1]
        self.max_depth = max_depth
        self.beam_width = beam_width

    def search(self, problem: str) -> str:
        root = ThoughtNode(content=problem, parent=None)
        frontier = [root]
        for depth in range(self.max_depth):
            candidates = []
            for node in frontier:
                next_thoughts = self.generator(self._path_to_prompt(node))
                for thought in next_thoughts:
                    child = ThoughtNode(content=thought, parent=node)
                    child.score = self.evaluator(self._path_to_prompt(child))
                    node.children.append(child)
                    candidates.append(child)
            # Keep top-k by score (beam search)
            candidates.sort(key=lambda n: n.score, reverse=True)
            frontier = candidates[:self.beam_width]
            # Early termination if best path is confident
            if frontier and frontier[0].score > 0.95:
                break
        return self._path_to_prompt(frontier[0]) if frontier else ""

    def _path_to_prompt(self, node: ThoughtNode) -> str:
        # Reconstruct the reasoning chain from the root down to this node
        parts = []
        while node is not None:
            parts.append(node.content)
            node = node.parent
        return "\n".join(reversed(parts))
```
Monte Carlo Tree Search for LLM Reasoning
MCTS brings game-playing algorithms to LLM reasoning. Instead of evaluating every possible path, it uses random rollouts to estimate which branches are most promising. The cycle runs four phases repeatedly: selection (descend the tree by an exploration score such as UCB1), expansion (add child nodes for unexplored continuations), simulation (a cheap rollout estimates the new node's value), and backpropagation (the rollout reward updates visit statistics back up to the root).
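A compact, generic MCTS sketch of those four phases, with `expand` and `rollout` supplied by the caller. This is an illustrative implementation, not code from a specific library:

```python
import math
import random

class MCTSNode:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0  # Sum of rollout rewards

    def ucb1(self, c=1.4):
        # Unvisited nodes are explored first
        if self.visits == 0:
            return float("inf")
        return self.value / self.visits + c * math.sqrt(
            math.log(self.parent.visits) / self.visits
        )

def mcts(root_state, expand, rollout, n_rollouts=100):
    """expand(state) -> list of next states; rollout(state) -> reward in [0, 1]."""
    root = MCTSNode(root_state)
    for _ in range(n_rollouts):
        # 1. Selection: descend by UCB1 until reaching a leaf
        node = root
        while node.children:
            node = max(node.children, key=MCTSNode.ucb1)
        # 2. Expansion: add children for the leaf's successor states
        if node.visits > 0:
            for s in expand(node.state):
                node.children.append(MCTSNode(s, parent=node))
            if node.children:
                node = random.choice(node.children)
        # 3. Simulation: a cheap rollout estimates the node's value
        reward = rollout(node.state)
        # 4. Backpropagation: update statistics up to the root
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Recommend the most-visited child (the robust choice)
    best = max(root.children, key=lambda n: n.visits)
    return best.state
```

For LLM reasoning, `expand` would sample candidate next thoughts from the model and `rollout` would complete a path cheaply (e.g. with a smaller model) and score the result.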
ReAct vs. Plan-and-Execute
Two dominant paradigms structure how agents plan and act:
⚡ ReAct
- Interleaves reasoning and action in a tight loop: Think → Act → Observe → Think
- Agile, adapts quickly to new observations
- Lower latency per decision
- Can get stuck in loops, loses sight of long-horizon goals
🗺 Plan-and-Execute
- Generates a complete plan upfront, then executes step by step
- Better for complex multi-step tasks with clear dependencies
- Easier to debug, clearer accountability
- Higher latency to first action; plan may go stale
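The ReAct loop itself is short. The `llm` and `tools` interfaces below are assumptions for illustration, not any specific framework's API:

```python
def react_loop(llm, tools: dict, task: str, max_steps: int = 8) -> str:
    """Minimal ReAct: Think -> Act -> Observe until the model emits an answer.

    `llm(transcript)` is assumed to return either
    {"thought": ..., "action": ..., "action_input": ...} or {"answer": ...}.
    """
    transcript = f"Task: {task}"
    for _ in range(max_steps):
        step = llm(transcript)
        if "answer" in step:                      # Model decided it is done
            return step["answer"]
        tool = tools[step["action"]]              # Act
        observation = tool(step["action_input"])  # Observe
        transcript += (
            f"\nThought: {step['thought']}"
            f"\nAction: {step['action']}({step['action_input']})"
            f"\nObservation: {observation}"
        )
    return "Step limit reached without an answer"  # Guard against loops
```

The `max_steps` cap is the crude but essential defense against the looping failure mode noted above; production agents usually add loop detection on repeated identical actions as well.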
Test-Time Compute Scaling
More thinking time generally means better answers. The question is: how much is worth it?
| Strategy | Compute Multiplier | When to Use |
|---|---|---|
| Single-pass | 1× | Simple tasks, latency-critical |
| Self-consistency (5 samples) | 5× | Moderate complexity, need confidence |
| Tree search (beam=3, depth=5) | ~15× | Complex reasoning, quality critical |
| MCTS (100+ rollouts) | 50–100× | Optimization problems, decision under uncertainty |
- Chain-of-Thought forces step-by-step reasoning but gains are sensitive to prompt example quality
- Tree of Thought explores multiple paths with evaluation and backtracking — 10–50× more expensive than single-pass
- MCTS improves success rates from 72% to 98% on optimization problems by focusing compute on promising paths
- ReAct is agile but loses long-horizon goals; Plan-and-Execute maintains strategy at the cost of latency to first action
- o1 is 6× more expensive than GPT-4o but scores 74% vs. 9% on math olympiad problems — the tradeoff is explicit
Implementation Decisions That Shape Your Agent
As a software engineer building agentic systems, you're writing the code that decides how much your agent thinks before acting. Get this wrong and you either ship an agent that makes shallow mistakes, or one that burns through your API budget on problems that didn't need deep reasoning.
Evaluator Design Patterns
Your evaluator function determines which reasoning paths survive in tree search and MCTS. A bad evaluator means the agent spends compute exploring dead ends.
Python · Evaluator Factory

```python
from dataclasses import dataclass, field
from enum import Enum

class EvaluatorType(Enum):
    LLM_JUDGE = "llm_judge"  # Use LLM to score reasoning quality
    EXECUTION = "execution"  # Run code, score by test results
    HEURISTIC = "heuristic"  # Domain-specific rules
    COMPOSITE = "composite"  # Combine multiple evaluators

@dataclass
class EvaluatorConfig:  # Assumed shape, inferred from the factory below
    type: EvaluatorType
    llm_model: str = ""
    test_suite: str = ""
    heuristic_fn: object = None
    sub_evaluators: list = field(default_factory=list)
    weights: dict = field(default_factory=dict)

def create_evaluator(config: EvaluatorConfig):
    # llm_score, run_tests, extract_code are your own scoring helpers
    if config.type == EvaluatorType.LLM_JUDGE:
        return lambda path: llm_score(path, config.llm_model)
    elif config.type == EvaluatorType.EXECUTION:
        return lambda path: run_tests(extract_code(path), config.test_suite)
    elif config.type == EvaluatorType.HEURISTIC:
        return config.heuristic_fn
    elif config.type == EvaluatorType.COMPOSITE:
        evaluators = [create_evaluator(sub) for sub in config.sub_evaluators]
        return lambda path: sum(
            w * e(path) for e, w in zip(evaluators, config.weights.values())
        )
    raise ValueError(f"Unknown evaluator type: {config.type}")
```
Cost Monitoring Is Non-Negotiable
A depth-5 tree with branching factor 3 contains up to 3⁵ = 243 root-to-leaf reasoning paths. Beam search prunes most of them, but LLM calls still multiply at every level. At $0.01 per path, that's $2.43 per query before you've executed a single action. Build cost tracking in from the start:
Python · Search Cost Tracking

```python
from dataclasses import dataclass

@dataclass
class SearchMetrics:
    nodes_expanded: int = 0
    llm_calls: int = 0
    total_tokens: int = 0
    estimated_cost: float = 0.0
    wall_clock_seconds: float = 0.0

    def add_llm_call(self, input_tokens: int, output_tokens: int, model: str):
        self.llm_calls += 1
        self.total_tokens += input_tokens + output_tokens
        self.estimated_cost += estimate_cost(input_tokens, output_tokens, model)
```
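`estimate_cost` is left undefined above. One possible shape, with placeholder per-million-token prices (substitute your provider's actual price sheet; the model name and figures here are illustrative assumptions):

```python
# Placeholder prices per 1M tokens; keep this table in sync with your provider
PRICE_PER_MTOK = {
    "gpt-4o": {"input": 2.50, "output": 10.00},  # illustrative figures
}

def estimate_cost(input_tokens: int, output_tokens: int, model: str) -> float:
    prices = PRICE_PER_MTOK[model]
    return (input_tokens * prices["input"]
            + output_tokens * prices["output"]) / 1_000_000
```

Keeping the price table in one place makes it trivial to re-estimate historical search costs when a provider changes pricing.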
- Algorithm selection should vary by task — style checks get single-pass, architectural decisions get tree search
- Evaluator quality determines search efficiency — bad evaluators waste compute on dead-end paths
- Build cost tracking with hard limits into every search implementation to prevent budget blowouts
Why Engineers Get This Wrong
Mistake 1: Using Tree Search for Everything
Mistake 2: Evaluators That Mirror the Generator
Mistake 3: Ignoring Latency-to-First-Token
- Tree search isn't universally better — profile single-pass and CoT before escalating to expensive algorithms
- Evaluators using the same model as the generator create feedback loops that amplify confident wrong answers
- Plan-and-Execute must stream intermediate state or users abandon agents during long planning phases
Building an Adaptive Planning Algorithm Selector
The right planning algorithm depends on the task in front of the agent, so the choice shouldn't be a single upfront architectural decision. Here's a complete implementation that selects a strategy based on measured task characteristics.
Step 1: Define Task Characteristics
Python · Task Profile

```python
from dataclasses import dataclass
from enum import Enum

class TaskComplexity(Enum):
    SIMPLE = "simple"        # Direct answer, no reasoning needed
    MODERATE = "moderate"    # Some reasoning, clear path
    COMPLEX = "complex"      # Multiple valid approaches
    AMBIGUOUS = "ambiguous"  # Solution path unclear

class LatencyRequirement(Enum):
    REALTIME = "realtime"        # < 2 seconds
    INTERACTIVE = "interactive"  # < 10 seconds
    BACKGROUND = "background"    # < 60 seconds
    BATCH = "batch"              # No limit

@dataclass
class TaskProfile:
    complexity: TaskComplexity
    latency_req: LatencyRequirement
    has_verification: bool   # Can we verify correctness programmatically?
    budget_per_query: float  # Max $ to spend on this query
```
Step 2: Implement the Algorithm Selector
Python · Algorithm Selector

```python
def select_planning_algorithm(profile: TaskProfile) -> PlanningConfig:
    # Realtime always gets single-pass or CoT
    if profile.latency_req == LatencyRequirement.REALTIME:
        if profile.complexity in (TaskComplexity.SIMPLE, TaskComplexity.MODERATE):
            return PlanningConfig(algorithm="single_pass")
        return PlanningConfig(algorithm="cot")
    # Complex tasks with verification → tree search or MCTS
    if profile.complexity in (TaskComplexity.COMPLEX, TaskComplexity.AMBIGUOUS):
        if profile.has_verification and profile.latency_req == LatencyRequirement.BATCH:
            return PlanningConfig(
                algorithm="mcts",
                rollouts=min(100, int(profile.budget_per_query / 0.02)),
                evaluator_type="execution",
            )
        if profile.latency_req == LatencyRequirement.INTERACTIVE:
            return PlanningConfig(
                algorithm="tree_search",
                beam_width=2, max_depth=3,
                evaluator_type="execution" if profile.has_verification else "llm_judge",
            )
    return PlanningConfig(algorithm="cot")
```
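The selector returns a `PlanningConfig`, which isn't defined in this article. A minimal assumed shape, consistent with the constructor arguments used above:

```python
from dataclasses import dataclass

@dataclass
class PlanningConfig:  # Assumed shape; fields mirror the selector's arguments
    algorithm: str              # "single_pass" | "cot" | "tree_search" | "mcts"
    beam_width: int = 3
    max_depth: int = 5
    rollouts: int = 0
    evaluator_type: str = "llm_judge"

# e.g. a complex, verifiable, interactive task would come back as:
config = PlanningConfig(algorithm="tree_search", beam_width=2, max_depth=3,
                        evaluator_type="execution")
```

Keeping the config a plain dataclass makes it easy to log alongside the search metrics, so every query records which algorithm it ran with and why.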
Step 3: Add Cost Guards with Graceful Degradation
Python · Budget Guard

```python
class CostGuardedPlanner(AdaptivePlanner):
    def _tree_search(self, task: str, config: PlanningConfig) -> PlanResult:
        def guarded_generator(prompt):
            if self.metrics.estimated_cost > self.max_cost:
                raise BudgetExceeded(
                    f"Exceeded budget: ${self.metrics.estimated_cost:.2f}"
                )
            return self.llm.generate_k(prompt, k=config.beam_width)

        tree = TreeOfThought(generator=guarded_generator, ...)
        try:
            result = tree.search(task)
            return PlanResult(answer=result, confidence=0.9)
        except BudgetExceeded:
            # Return the best partial result — never fail silently
            return PlanResult(answer=tree.best_so_far(), confidence=0.6, truncated=True)
```
Before vs. After: Adaptive vs. One-Size-Fits-All
| Aspect | ❌ One-Size-Fits-All | ✅ Adaptive Selection |
|---|---|---|
| Simple queries | 10s latency, $0.50 (tree search overkill) | 1s latency, $0.02 (single-pass) |
| Complex reasoning | Shallow mistakes (single-pass insufficient) | Thorough exploration (tree search applied) |
| Budget management | Unpredictable costs, occasional blowouts | Hard limits, graceful degradation |
| Observability | "It took 30 seconds" | Per-algorithm metrics, cost breakdown |
- Task profiling — complexity, latency, verification capability, budget — determines optimal algorithm selection
- Cost guards with graceful degradation prevent budget blowouts while still returning best-effort results
- Adaptive selection can reduce costs by 25× on simple queries while improving quality on complex ones
