AIDE ML — The Machine Learning Engineering Agent
AIDE ML is the open-source reference implementation of the AIDE algorithm, a tree-search agent that autonomously drafts, debugs, and benchmarks code until a user-defined metric is maximized (or minimized). It's designed as a research-friendly Python package, complete with a command-line interface (CLI), visualization tools, and configuration presets. This allows academics and engineer-researchers to easily replicate the findings of our paper, experiment with new ideas, or prototype complete machine-learning pipelines.
Project Layers
The AIDE ecosystem consists of three distinct layers:
Layer | Description | Where to find it |
---|---|---|
AIDE algorithm | An LLM-guided agentic tree search in the space of code. | Described in our paper. |
AIDE ML repo (this repo) | A lean implementation for experimentation & extension. | `pip install aideml` |
Weco product | A platform that generalizes AIDE's capabilities to broader code optimization scenarios, providing experiment tracking and enhanced user control. | weco.ai |
Who is this for?
- Agent-Architecture Researchers: Swap in new search heuristics, evaluators, or LLM back-ends to test novel agent designs.
- ML Practitioners: Quickly build a high-performance ML pipeline for a given dataset.
Key Capabilities
- Natural Language Task Specification: Simply point the agent at a dataset and describe your goal and metric in plain English. There's no need for complex YAML grids or custom wrappers.
  `aide data_dir=… goal="Predict churn" eval="AUROC"`
- Iterative Agentic Tree Search: Each generated Python script becomes a node in a solution tree. LLM-generated patches spawn child nodes, and metric feedback from code execution is used to prune and guide the search process. This iterative approach allows the agent to build on previous successes and learn from its mistakes.
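
The search loop described above can be illustrated with a minimal sketch. It is not the code shipped in this repo: `Node`, `tree_search`, and the `draft`, `patch`, and `evaluate` callables are hypothetical names chosen for illustration, the LLM is replaced by toy functions, and the greedy "expand the best-scoring node" rule is an assumed stand-in for whatever selection policy the actual agent uses.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    """One candidate script in the solution tree (hypothetical structure)."""
    code: str
    metric: Optional[float] = None  # None means the script failed to run
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)


def tree_search(
    draft: Callable[[], str],
    patch: Callable[[Node], str],
    evaluate: Callable[[str], Optional[float]],
    steps: int,
    maximize: bool = True,
) -> Node:
    """Greedy tree search: always expand the best-scoring node found so far."""
    pick = max if maximize else min
    root = Node(code=draft())
    root.metric = evaluate(root.code)
    nodes = [root]
    for _ in range(steps):
        scored = [n for n in nodes if n.metric is not None]
        # If nothing has run successfully yet, try to repair the latest node.
        parent = pick(scored, key=lambda n: n.metric) if scored else nodes[-1]
        child = Node(code=patch(parent), parent=parent)
        child.metric = evaluate(child.code)  # metric feedback from execution
        parent.children.append(child)
        nodes.append(child)
    scored = [n for n in nodes if n.metric is not None]
    if not scored:
        raise RuntimeError("no candidate produced a valid metric")
    return pick(scored, key=lambda n: n.metric)


# Toy stand-ins for the LLM and the code executor (assumptions, not AIDE APIs).
def toy_draft() -> str:
    return "bug"  # the first draft is broken on purpose


def toy_patch(node: Node) -> str:
    if node.metric is None:
        return "0"  # "debug" step: replace the broken script
    return str(int(node.code) + 1)  # "improve" step: small edit to the parent


def toy_evaluate(code: str) -> Optional[float]:
    if not code.isdigit():
        return None  # simulate a script that crashes
    return float(-(int(code) - 5) ** 2)  # score peaks when the code is "5"


best = tree_search(toy_draft, toy_patch, toy_evaluate, steps=8)
print(best.code, best.metric)
```

In this toy run the broken draft is repaired first, later children improve on their parents, and candidates that score worse than the current best are simply never expanded again, which is one simple way metric feedback can steer the search.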
Featured Research
AIDE ML has been used and cited in research by several leading institutions:
Institution | Paper / Project Name | Links |
---|---|---|
OpenAI | MLE-bench: Evaluating Machine-Learning Agents on Machine-Learning Engineering | Paper, GitHub |
METR | RE-Bench: Evaluating frontier AI R&D capabilities of language-model agents against human experts | Paper, GitHub |
Sakana AI | The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search | Paper, GitHub |
Meta | The Automated LLM Speedrunning Benchmark: Reproducing NanoGPT Improvements | Paper, GitHub |
Meta | AI Research Agents for Machine Learning: Search, Exploration, and Generalization in MLE-bench | Paper, GitHub |
SJTU | ML-Master: Towards AI-for-AI via Integration of Exploration and Reasoning | Paper, GitHub |
If you know of another public project that cites or forks AIDE, please open a pull request to add it to the table!