R&D-Agent: Automate End-to-End AI Development with a Dual-Agent Framework That Tops MLE-Bench

Paper & Code

R&D-Agent: Automating Data-Driven AI Solution Building Through LLM-Powered Automated Research, Development, and Evolution

2025 • microsoft/RD-Agent

Building high-performing, data-driven AI solutions remains a labor-intensive, iterative process—even for seasoned machine learning engineers. From brainstorming novel modeling ideas to debugging pipelines and tuning features, much of the work demands deep expertise and countless hours of trial and error. Enter R&D-Agent, an open-source, dual-agent framework developed by Microsoft that automates the full cycle of AI research and development using large language models (LLMs).

R&D-Agent pairs two complementary roles: a Researcher agent that proposes hypotheses and new ideas based on performance feedback, and a Developer agent that translates those ideas into executable code and refines it using error signals. This closed-loop, feedback-driven architecture enables continuous, autonomous improvement, mimicking how human experts iterate in real-world R&D.

Critically, R&D-Agent isn’t a lab curiosity. At the time of its reported evaluation, it was the top-performing machine learning engineering agent on MLE-Bench, a benchmark built from 75 real-world Kaggle competitions spanning low-, medium-, and high-complexity tasks. By automating both ideation and implementation, R&D-Agent narrows the gap between raw data and a working AI solution, reducing manual effort while improving solution quality.

How R&D-Agent Works: Dual Agents, Real-World Feedback

At its core, R&D-Agent operationalizes the scientific method for AI development. The Researcher formulates hypotheses—such as “adding temporal convolution may improve time-series forecasting” or “a new factor derived from financial reports could enhance alpha generation.” The Developer then implements these ideas in code, runs experiments, and collects metrics.

Crucially, both agents learn from feedback:

If a model underperforms, the Researcher refines or replaces the hypothesis.
If code fails or produces suboptimal results, the Developer iterates on the implementation.
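The two feedback paths above can be sketched as a nested loop: an outer round where the Researcher proposes a hypothesis from past results, and an inner retry loop where the Developer fixes its code using the last error. This is a minimal illustration, not R&D-Agent's actual implementation; `Experiment`, `run_loop`, `best_experiment`, and the toy `toy_propose` / `toy_implement` functions are hypothetical names that stand in for LLM-backed components.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Experiment:
    hypothesis: str
    score: Optional[float] = None  # None means no attempt ran successfully
    error: Optional[str] = None    # last error message seen by the Developer


def run_loop(
    propose: Callable[[List[Experiment]], str],
    implement: Callable[[str, Optional[str]], Tuple[Optional[float], Optional[str]]],
    max_rounds: int = 5,
    max_fix_attempts: int = 3,
) -> List[Experiment]:
    """Outer loop: Researcher proposes. Inner loop: Developer implements and fixes."""
    history: List[Experiment] = []
    for _round in range(max_rounds):
        # Researcher: pick the next hypothesis using all earlier feedback.
        exp = Experiment(hypothesis=propose(history))
        for _attempt in range(max_fix_attempts):
            # Developer: (re)implement, passing the previous error as a hint.
            exp.score, exp.error = implement(exp.hypothesis, exp.error)
            if exp.error is None:
                break
        history.append(exp)
    return history


def best_experiment(history: List[Experiment]) -> Optional[Experiment]:
    """Return the highest-scoring experiment that ran without error, if any."""
    ok = [e for e in history if e.error is None and e.score is not None]
    return max(ok, key=lambda e: e.score) if ok else None


# Toy stand-ins (hypothetical): a real system would call an LLM here.
def toy_propose(history: List[Experiment]) -> str:
    return f"idea-{len(history)}"


def toy_implement(hypothesis: str, last_error: Optional[str]):
    if last_error is None:
        # First attempt always fails, to exercise the fix loop.
        return None, "NameError: undefined variable"
    return 0.5 + 0.1 * int(hypothesis.split("-")[1]), None


if __name__ == "__main__":
    for e in run_loop(toy_propose, toy_implement, max_rounds=3):
        print(e)
```

Keeping the full history and handing it to `propose` is what lets a weak result steer the next hypothesis, while the inner loop keeps code-level failures from consuming a whole research round.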

This loop isn’t linear. R&D-Agent supports multiple parallel exploration traces, allowing diverse ideas to evolve simultaneously. Promising paths can merge and share insights, which speeds convergence toward high-quality solutions in a way that a single sequential search cannot.
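One way to picture multi-trace exploration is as several independent lists of scored steps, followed by a merge that seeds a new trace from the best step of each top-ranked trace. The sketch below is an assumption-laden illustration: `Trace`, `explore`, `merge`, and the toy scoring function are hypothetical, and the traces are advanced one after another here, whereas a real system could run them concurrently.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, float]  # (description of the idea tried, resulting score)


@dataclass
class Trace:
    name: str
    steps: List[Step] = field(default_factory=list)

    def best(self) -> Step:
        """Highest-scoring step recorded on this trace."""
        return max(self.steps, key=lambda s: s[1])


def explore(traces: List[Trace], step_fn: Callable[[Trace, int], Step], rounds: int) -> None:
    """Advance every trace by one step per round (sequential stand-in for parallel runs)."""
    for r in range(rounds):
        for t in traces:
            t.steps.append(step_fn(t, r))


def merge(traces: List[Trace], top_k: int = 2) -> Trace:
    """Start a new trace seeded with the best step from each of the top_k traces."""
    ranked = sorted(traces, key=lambda t: t.best()[1], reverse=True)[:top_k]
    merged = Trace(name="+".join(t.name for t in ranked))
    merged.steps.extend(t.best() for t in ranked)
    return merged


# Toy scoring (hypothetical): each trace starts from a base score and improves per round.
BASE: Dict[str, float] = {"cnn": 0.60, "gbdt": 0.70, "linear": 0.40}


def toy_step(trace: Trace, round_index: int) -> Step:
    return (f"{trace.name}-r{round_index}", BASE[trace.name] + 0.05 * round_index)


if __name__ == "__main__":
    traces = [Trace(n) for n in BASE]
    explore(traces, toy_step, rounds=2)
    print(merge(traces, top_k=2))
```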

Proven Performance on Real-World Benchmarks

R&D-Agent’s effectiveness is validated on MLE-Bench, a benchmark that evaluates AI agents on practical ML engineering tasks derived from Kaggle, where an attempt counts as a success when the submission reaches a medal-level score. As of its latest reported evaluation, R&D-Agent leads all public agents:

51.5% success rate on “Lite” tasks (low complexity, under 2 hours of expert effort)
19.3% on Medium tasks (2–10 hours of expert effort)
26.7% on High tasks (>10 hours of expert effort)

These results exceed those reported for prior state-of-the-art systems such as AIDE, which suggests R&D-Agent can handle not just simple automation but complex, open-ended engineering challenges.
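The per-tier rates quoted above can be combined into a single overall figure by weighting each tier by its number of competitions. The tier sizes used below (22 Lite, 38 Medium, 15 High) are an assumption recalled from the MLE-Bench paper and should be checked against it; they do sum to the 75 competitions mentioned earlier. The function name `overall_rate` is illustrative.

```python
from typing import Dict

# Per-tier success rates quoted in this article, in percent.
RATES: Dict[str, float] = {"lite": 51.5, "medium": 19.3, "high": 26.7}

# ASSUMPTION: competitions per tier, recalled from the MLE-Bench paper.
COUNTS: Dict[str, int] = {"lite": 22, "medium": 38, "high": 15}


def overall_rate(rates: Dict[str, float], counts: Dict[str, int]) -> float:
    """Competition-weighted average of the per-tier success rates."""
    total = sum(counts.values())
    return sum(rates[tier] * counts[tier] for tier in rates) / total


if __name__ == "__main__":
    print(f"{sum(COUNTS.values())} competitions, overall {overall_rate(RATES, COUNTS):.1f}%")
```

Under these assumed tier sizes the overall rate comes to roughly 30%. Because the Medium tier holds about half of the competitions, its rate pulls the aggregate down more than the Lite figure pulls it up.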

Practical Use Cases: From Finance to Kaggle

R&D-Agent is designed for real-world applicability, with pre-built scenarios across multiple domains:

Automated Quantitative Trading (R&D-Agent-Quant)

In finance, R&D-Agent automates the co-optimization of predictive factors and trading models. In experiments on real stock-market data, the authors report roughly 2× higher annualized return than standard factor libraries while using 70% fewer factors and costing under $10 in compute. This addresses a key pain point: the manual, intuition-driven process of factor discovery and model tuning.

Research Paper & Financial Report Implementation

Feed R&D-Agent a paper (e.g., an arXiv link) or a folder of financial reports, and it extracts key models or features and implements them as runnable code. This turns dense technical documents into executable prototypes—without weeks of manual coding.

End-to-End Kaggle Competition Participation

R&D-Agent autonomously handles both feature engineering and model tuning for Kaggle competitions. It downloads data, iterates on pipeline design, and submits predictions—ideal for teams looking to accelerate experimentation or individuals learning competitive ML.

Medical Prediction Model Development

In healthcare, it iteratively proposes and implements models for clinical prediction tasks (e.g., acute kidney injury risk), adapting to domain-specific constraints and data structures.

Getting Started Is Simple (If You Meet the Requirements)

R&D-Agent prioritizes usability without sacrificing flexibility:

Install via PyPI (pip install rdagent) or from source.
Configure your LLM and embedding models in a .env file—using LiteLLM as the default backend, which supports OpenAI, Azure OpenAI, DeepSeek, and more.
Run a scenario with a single CLI command, like rdagent fin_quant for quant trading or rdagent data_science --competition <name> for Kaggle.
Monitor progress through a built-in web UI (rdagent ui --port 19899).

The system abstracts away much of the orchestration complexity, letting users focus on high-level goals rather than infrastructure.
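For scripted runs, the CLI invocations listed above can be assembled as argument lists and handed to `subprocess.run`. The sketch below only builds and prints the commands; the helper names `build_command` and `ui_command` are hypothetical, and the only subcommands and flags used are the ones quoted in this article (`fin_quant`, `data_science --competition`, `ui --port`).

```python
import shlex
import shutil
from typing import List, Optional


def build_command(scenario: str, competition: Optional[str] = None) -> List[str]:
    """Build the argv for an rdagent scenario, e.g. fin_quant or data_science."""
    argv = ["rdagent", scenario]
    if competition is not None:
        argv += ["--competition", competition]
    return argv


def ui_command(port: int = 19899) -> List[str]:
    """Build the argv for the built-in web UI on the given port."""
    return ["rdagent", "ui", "--port", str(port)]


if __name__ == "__main__":
    commands = [
        build_command("fin_quant"),
        build_command("data_science", competition="<name>"),  # placeholder, as in the text
        ui_command(),
    ]
    for argv in commands:
        print(shlex.join(argv))
    # Only report whether the CLI is installed; nothing is executed here.
    print("rdagent on PATH:", shutil.which("rdagent") is not None)
```

Passing a list rather than a single shell string avoids quoting problems if a competition name contains unusual characters.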

Key Limitations to Keep in Mind

While powerful, R&D-Agent has clear boundaries:

Linux-only: No support for Windows or macOS.
Docker required: Needed for environment isolation; the current user must be able to run Docker commands without sudo.
LLM dependency: Requires access to capable models (e.g., GPT-4o, o1-preview, or DeepSeek).
Scenario-specific setup: Kaggle integration, for example, needs a valid kaggle.json token.
Human oversight recommended: Especially in high-stakes domains like finance or medicine, results should be validated before deployment.
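Several of these requirements can be checked before a first run. The following preflight sketch uses only the Python standard library; the function name `preflight` is hypothetical, and the default token path `~/.kaggle/kaggle.json` is the Kaggle API's usual location rather than something this article specifies. It does not verify that Docker works without sudo or that an LLM key is valid.

```python
import platform
import shutil
from pathlib import Path
from typing import Dict, Optional


def preflight(kaggle_token: Optional[Path] = None) -> Dict[str, bool]:
    """Report which basic environment requirements appear to be satisfied."""
    if kaggle_token is None:
        # ASSUMPTION: default location used by the Kaggle API client.
        kaggle_token = Path.home() / ".kaggle" / "kaggle.json"
    return {
        "linux": platform.system() == "Linux",
        # Presence on PATH only; does not prove docker runs without sudo.
        "docker_on_path": shutil.which("docker") is not None,
        # Only needed for the Kaggle scenario.
        "kaggle_token": kaggle_token.is_file(),
    }


if __name__ == "__main__":
    for check, ok in preflight().items():
        print(f"{check:15s} {'ok' if ok else 'MISSING'}")
```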

Summary

R&D-Agent represents a significant step toward autonomous, data-driven R&D. By unifying idea generation and code implementation in a feedback-rich, multi-trace framework, it reduces reliance on scarce ML expertise while posting leading results on a real-world benchmark. Whether you’re building trading strategies, competing on Kaggle, or turning research papers into models, R&D-Agent offers a benchmark-tested, open-source foundation to accelerate your AI development lifecycle without reinventing the wheel.

For teams and individuals looking to scale ML innovation while cutting manual toil, R&D-Agent isn’t just another agent framework—it’s a proven engine for AI-powered engineering.