AI-Scientist-v2: Automate End-to-End Scientific Discovery with Agentic Tree Search

Paper & Code

The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search

2025 • SakanaAI/AI-Scientist-v2

AI-Scientist-v2 is a system that autonomously carries out the full research cycle, from idea generation through experimentation to a written manuscript. Developed by Sakana AI, it is the first AI system to produce a workshop paper entirely on its own that passed human peer review, at an ICLR 2025 workshop.

For technical decision-makers, research leads, and engineering teams exploring automation in early-stage R&D, AI-Scientist-v2 offers a practical path to scale exploratory science without manual scaffolding. Unlike prior approaches that depend on human-written code templates, this version operates with true autonomy across diverse machine learning domains, making it a compelling tool for rapid hypothesis testing and proof-of-concept generation.

Core Innovations and Differentiators

Beyond Templates: Truly Autonomous Research Workflows

AI-Scientist-v2 eliminates the reliance on pre-defined human-authored code templates that constrained its predecessor (v1). Instead, it starts from a high-level research topic and builds experiments from scratch using LLM-generated code, enabling broader exploration in open-ended scientific problems.

While v1 excels at structured tasks with clear objectives (e.g., benchmarking known methods), v2 is built for discovery-driven scenarios where the solution path is unknown. The trade-off is a lower success rate than v1 in exchange for broader exploration, which suits teams looking for novel angles in crowded research areas.

Progressive Agentic Tree Search

At the heart of AI-Scientist-v2 is a novel progressive agentic tree search mechanism, coordinated by a dedicated experiment manager agent. This approach:

Explores multiple experimental paths in parallel (controlled via num_workers).
Dynamically debugs failed experiments up to a configurable depth (max_debug_depth).
Grows independent "trees" of research trajectories (num_drafts) to increase the chance of viable outcomes.

The search process is guided by iterative reflection and pruning, mimicking how human researchers pivot when initial hypotheses fail. The result is a more robust exploration of the scientific solution space without human intervention.
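
To make the roles of num_drafts, num_workers, and max_debug_depth concrete, here is a toy best-first tree search in Python. It is a minimal sketch, not the project's implementation: the Node class, the run_experiment stand-in (which only simulates success or failure with random numbers), and the steps parameter are all hypothetical names introduced for illustration.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """One experiment attempt in a search tree (hypothetical structure)."""
    score: float                      # higher is better; unused when buggy
    buggy: bool                       # True if the generated code "crashed"
    debug_depth: int = 0              # consecutive debug attempts so far
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)


def run_experiment(parent: Optional[Node], rng: random.Random) -> Node:
    """Stand-in for 'LLM writes code, code runs'. Purely simulated."""
    buggy = rng.random() < 0.3
    base = parent.score if parent and not parent.buggy else 0.0
    score = 0.0 if buggy else base + rng.random()
    # A child of a buggy node counts as one more debug attempt.
    depth = parent.debug_depth + 1 if parent and parent.buggy else 0
    node = Node(score=score, buggy=buggy, debug_depth=depth, parent=parent)
    if parent:
        parent.children.append(node)
    return node


def tree_search(num_drafts: int = 3, num_workers: int = 2,
                max_debug_depth: int = 3, steps: int = 5,
                seed: int = 0) -> Optional[Node]:
    rng = random.Random(seed)
    # Each draft is the root of an independent tree.
    nodes = [run_experiment(None, rng) for _ in range(num_drafts)]
    for _ in range(steps):
        # Expandable leaves: working nodes, or buggy nodes that still have
        # debug budget. Buggy nodes past the budget are pruned.
        frontier = [n for n in nodes if not n.children
                    and (not n.buggy or n.debug_depth < max_debug_depth)]
        if not frontier:
            break
        # Best-first ordering: working nodes by score, buggy nodes last.
        frontier.sort(key=lambda n: (not n.buggy, n.score), reverse=True)
        # num_workers nodes are expanded per step (sequentially here;
        # a real system would run them concurrently).
        for parent in frontier[:num_workers]:
            nodes.append(run_experiment(parent, rng))
    working = [n for n in nodes if not n.buggy]
    return max(working, key=lambda n: n.score) if working else None


best = tree_search()
print("best score:", None if best is None else round(best.score, 3))
```

In this sketch, raising num_drafts widens the search at the root, raising num_workers expands more nodes per step, and max_debug_depth caps how long a failing branch is retried before it is abandoned.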

Integrated Scientific Writing and Visual Refinement

Once experiments conclude, the system autonomously:

Analyzes results and generates publication-ready visualizations.
Writes a complete scientific manuscript (including abstract, methodology, results, and discussion).
Refines figures and narrative using a Vision-Language Model (VLM) feedback loop during internal review.

This end-to-end capability ensures that even complex findings are communicated clearly—critical for internal reports or submission-ready drafts.

Practical Use Cases for Technical Teams

AI-Scientist-v2 is not a replacement for domain experts but a force multiplier for them. Key applications include:

Rapid prototyping of ML research directions: Test dozens of ideas in days rather than months.
Baseline study generation: Automatically produce initial analyses for new datasets or architectures.
Idea expansion under resource constraints: When human bandwidth is limited, use v2 to explore fringe hypotheses that might otherwise be deprioritized.
Workshop paper drafting: Accelerate submission cycles for academic or industrial R&D teams targeting fast-turnaround venues.

Crucially, it performs best in open-ended, exploratory settings, not in highly constrained engineering tasks, where v1's template-based approach remains more reliable.

Getting Started: A Streamlined Workflow

The system follows a two-stage pipeline:

Stage 1: Ideation from a Research Topic

Create a Markdown file (e.g., my_topic.md) with sections: Title, Keywords, TL;DR, and Abstract.
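
As a rough illustration, the following Python sketch renders a topic file with the four sections named above. The heading levels and the placeholder text are assumptions, and render_topic is a hypothetical helper; compare the output with the example topic file shipped in the repository before relying on it.

```python
from typing import List


def render_topic(title: str, keywords: List[str], tldr: str, abstract: str) -> str:
    """Render the four sections as Markdown (heading levels are assumed)."""
    return (
        f"# Title: {title}\n\n"
        f"## Keywords\n{', '.join(keywords)}\n\n"
        f"## TL;DR\n{tldr}\n\n"
        f"## Abstract\n{abstract}\n"
    )


# Placeholder content, purely illustrative.
topic = render_topic(
    title="Example Workshop Topic",
    keywords=["keyword one", "keyword two"],
    tldr="One-sentence summary of the research direction.",
    abstract="A paragraph describing the scope and the questions of interest.",
)
print(topic)

# To save it where the ideation command expects it:
# from pathlib import Path
# Path("ai_scientist/ideas/my_topic.md").write_text(topic, encoding="utf-8")
```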

Run the ideation script:

python ai_scientist/perform_ideation_temp_free.py --workshop-file "ai_scientist/ideas/my_topic.md" --model gpt-4o-2024-05-13 --max-num-generations 20

This outputs a JSON file (my_topic.json) containing vetted, novel research ideas.
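
Before launching the expensive second stage, it can help to look at what the ideation step produced. The sketch below makes no assumption about the schema beyond the file being valid JSON: it only lists the keys of each entry. summarize_ideas is a hypothetical helper, and the path is the one used in the command above.

```python
import json
from pathlib import Path
from typing import List


def summarize_ideas(json_text: str) -> List[List[str]]:
    """Return the sorted key names of each idea, without assuming a schema."""
    data = json.loads(json_text)
    ideas = data if isinstance(data, list) else [data]
    return [sorted(idea) if isinstance(idea, dict) else [] for idea in ideas]


ideas_path = Path("ai_scientist/ideas/my_topic.json")
if ideas_path.exists():
    text = ideas_path.read_text(encoding="utf-8")
    for index, keys in enumerate(summarize_ideas(text)):
        print(f"idea {index}: fields = {keys}")
else:
    print(f"{ideas_path} not found; run the ideation step first")
```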

Stage 2: Full Paper Generation via Tree Search

Launch the main pipeline using the generated JSON:

python launch_scientist_bfts.py --load_ideas "ai_scientist/ideas/my_topic.json" --model_writeup o1-preview-2024-09-12 --model_review gpt-4o-2024-11-20 --model_agg_plots o3-mini-2025-01-31

Requirements:

Linux with NVIDIA GPU (CUDA + PyTorch).
API keys for LLMs (OpenAI, Gemini, or Claude via AWS Bedrock).
Optional: Semantic Scholar API key for citation novelty checks.
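
A small pre-flight check can catch missing requirements before a multi-hour run. The sketch below is an assumption-laden illustration: the environment variable names (OPENAI_API_KEY, GEMINI_API_KEY, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, S2_API_KEY) are the names commonly used for these services and should be checked against the repository's setup instructions, and check_keys and check_host are hypothetical helpers.

```python
import os
import shutil
import sys
from typing import List, Mapping

# Assumed variable names; verify against the project's documentation.
LLM_KEY_GROUPS = {
    "OpenAI": ["OPENAI_API_KEY"],
    "Gemini": ["GEMINI_API_KEY"],
    "Claude via AWS Bedrock": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"],
}
OPTIONAL_KEYS = ["S2_API_KEY"]  # Semantic Scholar (assumed name)


def check_keys(env: Mapping[str, str]) -> List[str]:
    """Report missing credentials; at least one provider must be complete."""
    problems = []
    usable = [name for name, keys in LLM_KEY_GROUPS.items()
              if all(env.get(k) for k in keys)]
    if not usable:
        problems.append("no LLM provider credentials found")
    for key in OPTIONAL_KEYS:
        if not env.get(key):
            problems.append(f"optional: {key} not set")
    return problems


def check_host() -> List[str]:
    """Report obvious host mismatches (OS and NVIDIA driver tooling)."""
    problems = []
    if not sys.platform.startswith("linux"):
        problems.append("not running on Linux")
    if shutil.which("nvidia-smi") is None:
        problems.append("nvidia-smi not found on PATH")
    return problems


for problem in check_keys(os.environ) + check_host():
    print("WARNING:", problem)
```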

Runtime & Cost:

Ideation: ~$2–5.
Full experiment + writing: ~$20–25 (using Claude 3.5 Sonnet for experiments).
Total time: Several hours per run.
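
For budgeting several runs, the figures above can be turned into a simple range. This sketch reads them as US dollars and assumes one ideation pass feeds multiple full runs, each costing the full experiment-plus-writing amount; both are assumptions, and budget_range is a hypothetical helper.

```python
from typing import Tuple


def budget_range(num_ideas: int,
                 ideation: Tuple[int, int] = (2, 5),
                 per_idea: Tuple[int, int] = (20, 25)) -> Tuple[int, int]:
    """Return (low, high) cost for one ideation pass plus num_ideas full runs."""
    low = ideation[0] + num_ideas * per_idea[0]
    high = ideation[1] + num_ideas * per_idea[1]
    return low, high


# Three ideas: 2 + 3*20 = 62 at the low end, 5 + 3*25 = 80 at the high end.
print(budget_range(3))  # (62, 80)
```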

Security Note: Since the system executes LLM-written code, always run in a sandboxed environment (e.g., Docker) to prevent unintended system access or package installation.
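
One way to follow this advice is to wrap the launch command in a container invocation. The sketch below only builds the argument list for docker run; it does not execute anything. The image name, the mounted directory, and the docker_command helper are hypothetical. The --gpus flag requires the NVIDIA Container Toolkit on the host, and network access is left enabled here because the pipeline needs to reach LLM APIs.

```python
import shlex
from typing import List, Sequence


def docker_command(image: str, workdir: str, inner_cmd: Sequence[str],
                   env_vars: Sequence[str] = ("OPENAI_API_KEY",)) -> List[str]:
    """Build a 'docker run' argument list that isolates the pipeline."""
    cmd = ["docker", "run", "--rm", "--gpus", "all",
           "-v", f"{workdir}:/workspace",   # only this directory is shared
           "-w", "/workspace"]
    for name in env_vars:
        # '-e NAME' forwards the host's value without writing it in the command.
        cmd += ["-e", name]
    cmd.append(image)
    cmd += list(inner_cmd)
    return cmd


cmd = docker_command(
    image="ai-scientist-sandbox:latest",    # hypothetical image name
    workdir="/srv/ai-scientist-run",        # hypothetical host directory
    inner_cmd=["python", "launch_scientist_bfts.py",
               "--load_ideas", "ai_scientist/ideas/my_topic.json"],
)
print(shlex.join(cmd))
```

Keeping the command as a list rather than a single string avoids shell quoting mistakes if it is later passed to subprocess.run.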

Limitations and Operational Considerations

While powerful, AI-Scientist-v2 comes with important caveats:

Lower success rate: Due to its exploratory nature, not every run yields a valid paper. Success heavily depends on the LLM’s reasoning quality (Claude 3.5 Sonnet is recommended for experiments).
ML-domain focus: Generalization is strongest within machine learning; adaptation to biology, chemistry, or physics may require significant prompt engineering.
Hardware demands: May trigger CUDA out-of-memory errors on smaller GPUs. Mitigate by constraining model size in the ideation prompt.
No built-in safety guarantees: Autonomous code execution poses risks, including unintended web access and package installation. Never run it in production or shared environments without isolation.

Summary

AI-Scientist-v2 represents a significant leap toward autonomous scientific discovery. By combining agentic tree search, template-free experimentation, and integrated writing with visual refinement, it enables technical teams to systematically explore the unknown at scale. While not a silver bullet, it offers a unique advantage for organizations seeking to accelerate early-stage research, validate unconventional ideas, or augment human creativity with AI-driven exploration. With its code publicly available and design optimized for real-world ML domains, it’s a compelling tool for forward-looking R&D teams ready to pilot the future of automated science.