Core Concept: Agentic Tree Search

The fundamental algorithm driving AIDE ML is an agentic tree search. Instead of generating a single solution, the agent explores a tree of possible code solutions, iteratively refining and debugging its approach based on empirical feedback.

[Figure: Tree search visualization]

This concept is primarily implemented in aide/agent.py and aide/journal.py.
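At a high level, the loop is simple: pick a node to work from, have the LLM produce a new solution, run it, and record the feedback as a new node in the tree. The following is a conceptual sketch only, not the code in agent.py; the helper names generate_child and execute_and_review are illustrative stand-ins.

```python
# Conceptual outline of the agentic tree search loop (illustrative, not AIDE ML's code).
def agentic_tree_search(agent, journal, num_steps: int) -> None:
    for _ in range(num_steps):
        parent = agent.search_policy()       # None -> draft a new root; otherwise debug/improve
        node = agent.generate_child(parent)  # hypothetical: LLM writes a plan and code for the new node
        agent.execute_and_review(node)       # hypothetical: run in the sandbox, analyze output, set metric/is_buggy
        journal.append(node)                 # the tree grows by exactly one node per step
```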

The Solution Tree

The entire history of the agent's work is stored in a data structure called the Journal, which represents a tree. Each node in this tree is a Node object, which contains:

  • plan: A natural language description of the intended solution or improvement.
  • code: The full Python script generated by the LLM.
  • Execution Results: The terminal output (term_out), execution time, and any exceptions that occurred when the code was run in the sandboxed interpreter.
  • analysis: An LLM-generated review of the execution output, summarizing the findings or diagnosing a bug.
  • metric: The validation metric extracted from the output. This is a crucial feedback signal.
  • is_buggy: A boolean flag indicating whether the code failed to run, produced an error, or failed to report a valid metric.
  • Parent/Children: Pointers that structure the nodes into a tree, showing how one solution was derived from another.
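A minimal sketch of this structure is shown below. It mirrors the fields listed above but is heavily simplified; the real Node and Journal classes in aide/journal.py track additional details (such as exception info and node ids), and the metric is a richer object than a plain float.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    plan: str                              # natural-language description of the intended solution
    code: str                              # full Python script generated by the LLM
    term_out: str = ""                     # terminal output from the sandboxed run
    exec_time: float = 0.0                 # wall-clock execution time in seconds
    analysis: str = ""                     # LLM-written review of the execution output
    metric: Optional[float] = None         # validation metric parsed from the output, if any
    is_buggy: bool = True                  # failed to run, raised, or reported no valid metric
    parent: Optional["Node"] = None        # None for root drafts
    children: list["Node"] = field(default_factory=list)

@dataclass
class Journal:
    nodes: list[Node] = field(default_factory=list)

    def append(self, node: Node) -> None:
        if node.parent is not None:
            node.parent.children.append(node)   # link the child into the tree
        self.nodes.append(node)

    def good_nodes(self) -> list[Node]:
        return [n for n in self.nodes if not n.is_buggy and n.metric is not None]
```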

The Agent's Actions

At each step, the agent performs one of three actions to create a new node in the tree:

  1. Draft (_draft): If the tree does not yet contain enough root drafts, the agent writes a new, simple solution from scratch based on the task description. These draft nodes are the roots of the tree.

  2. Improve (_improve): The agent selects a promising, non-buggy node from the tree (usually the one with the best metric so far) and attempts to improve it. It is prompted with the parent node's code and analysis and asked to propose and implement a single, atomic improvement (e.g., trying a different model, adding a new feature, tuning a hyperparameter).

  3. Debug (_debug): If a node is flagged as buggy, the agent can choose to debug it. It is prompted with the buggy code and the full terminal output (including the traceback) and asked to generate a fix. This creates a child node on a "debug" path.
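Putting the three actions together, a single search step can be sketched as follows. This is a simplification: the real _draft, _improve, and _debug methods assemble detailed prompts and parse the LLM's plan and code, and execute_and_review is a hypothetical stand-in for the sandboxed run plus the LLM analysis of its output.

```python
# Simplified sketch of one search step (illustrative, not the actual Agent.step).
def step(agent, journal) -> None:
    parent = agent.search_policy()

    if parent is None:                 # not enough roots yet -> draft a fresh solution
        node = agent._draft()
    elif parent.is_buggy:              # the chosen parent failed -> attempt a fix
        node = agent._debug(parent)
    else:                              # the chosen parent works -> one atomic improvement
        node = agent._improve(parent)

    agent.execute_and_review(node)     # hypothetical: run the code, analyze output, set metric/is_buggy
    journal.append(node)               # the new node is attached as a child of parent
```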

Search Policy

How does the agent decide what to do next? The search_policy method in agent.py governs this decision:

  1. Initial Drafting: The agent first ensures a minimum number of root drafts are created (num_drafts in the configuration).
  2. Debugging vs. Improving: After drafting, the agent probabilistically decides whether to work on fixing a bug or improving a successful solution. The debug_prob configuration parameter controls this likelihood.
  3. Node Selection for Improvement: When improving, the default policy is greedy. The agent selects the best-performing non-buggy node from the entire journal as the parent for the next improvement step.
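This decision logic can be sketched as follows, reusing the simplified Node and Journal from earlier. It is an approximation of search_policy in agent.py: among other simplifications, it assumes a plain numeric metric where higher is better and does not limit how deep a debug chain may grow.

```python
import random
from typing import Optional

# Approximate sketch of the search policy (illustrative, not the exact agent.py logic).
def search_policy(journal, num_drafts: int, debug_prob: float) -> Optional["Node"]:
    roots = [n for n in journal.nodes if n.parent is None]
    if len(roots) < num_drafts:
        return None                                  # None signals: draft another root

    buggy = [n for n in journal.nodes if n.is_buggy]
    if buggy and random.random() < debug_prob:
        return random.choice(buggy)                  # work on fixing a bug

    good = journal.good_nodes()
    if not good:
        return None                                  # nothing works yet -> draft again
    return max(good, key=lambda n: n.metric)         # greedy: improve the best node so far
```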

This combination of drafting, iterative improvement, debugging, and a guiding search policy allows the agent to systematically explore the solution space, recover from errors, and build upon its successes to find high-performing code.