AI-Trader: Benchmark Autonomous LLM Agents in Real Financial Markets with Zero Human Intervention

Paper & Code

AI-Trader: Benchmarking Autonomous Agents in Real-Time Financial Markets

2025 • HKUDS/AI-Trader

Evaluating whether large language models (LLMs) can truly function as autonomous decision-makers in dynamic, real-world environments remains a fundamental challenge in AI research. While LLMs excel in static or simulated settings, their performance in live, information-rich, and rapidly evolving domains—like financial markets—is far less understood.

Enter AI-Trader: the first fully automated, live, and data-uncontaminated benchmark designed specifically to evaluate LLM agents in real-time financial decision-making across three major markets: U.S. stocks (NASDAQ 100), Chinese A-shares (SSE 50), and cryptocurrencies (BITWISE10). Unlike synthetic or backfilled benchmarks, AI-Trader enforces a “minimal information paradigm”: agents receive only essential context and must independently search, verify, and synthesize live market data through standardized tool calls, without any human guidance, pre-programmed rules, or access to future information.

This project isn’t just another trading simulator. It’s a rigorously controlled arena where AI agents compete under identical conditions to reveal how well autonomous reasoning translates into strategic financial action—exposing both capabilities and critical gaps in current agent architectures.

Why Real Financial Benchmarking Matters

Most LLM agent benchmarks operate in closed or static environments—like text-based games, coding challenges, or curated datasets. These settings lack the unpredictability, latency sensitivity, and risk consequences inherent in real markets. As a result, high scores in those benchmarks rarely predict real-world competence.

AI-Trader closes this gap by providing:

Live data integration via APIs (Alpha Vantage, Tushare, Jina AI)
Strict anti-look-ahead controls ensuring agents never see future prices or news
Identical starting conditions (capital, tools, data feeds) for fair model comparison
Multi-market coverage to test cross-environment robustness

This allows researchers to ask and answer concrete questions: Does a model that excels in reasoning also manage risk well? Can agents adapt strategies between volatile crypto markets and policy-driven A-shares? Early findings from the AI-Trader team show that general intelligence does not automatically translate to profitable or robust trading—highlighting the need for specialized evaluation frameworks like this one.

Core Capabilities That Set AI-Trader Apart

Fully Autonomous, Zero-Human-Intervention Design

Every AI agent in AI-Trader operates with complete independence. There is no human-in-the-loop, no manual override, and no hand-coded trading logic. Agents must:

Decide what to trade, when to trade, and how much
Use built-in tools to fetch prices, search news, and execute trades
Log their full reasoning chain for transparency and auditability

This mirrors real-world deployment scenarios where reliability and self-sufficiency are non-negotiable.
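To make the idea of tool use plus an auditable reasoning log concrete, here is a minimal sketch in Python. It is not AI-Trader's actual code: the class names, the `get_price` and `buy` tools, and the prices are all hypothetical stand-ins chosen for illustration. It only shows how every tool call an agent makes can be routed through one dispatcher that records the call, its arguments, and its result.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class ToolCall:
    """One logged step: which tool ran, with what arguments, and what it returned."""
    tool: str
    args: Dict[str, Any]
    result: Any


@dataclass
class AgentSession:
    """Holds the tools an agent may call and an audit log of every call."""
    tools: Dict[str, Callable[..., Any]]
    log: List[ToolCall] = field(default_factory=list)

    def call(self, tool: str, **args: Any) -> Any:
        if tool not in self.tools:
            raise KeyError(f"unknown tool: {tool}")
        result = self.tools[tool](**args)
        self.log.append(ToolCall(tool, args, result))
        return result

    def dump_log(self) -> str:
        """Serialize the trace as JSON lines for later inspection."""
        return "\n".join(json.dumps(vars(step)) for step in self.log)


# Hypothetical stand-in tools; a real run would query live services instead.
def get_price(symbol: str) -> float:
    return {"AAPL": 190.0, "MSFT": 410.0}[symbol]


def buy(symbol: str, shares: int) -> str:
    return f"bought {shares} {symbol}"


session = AgentSession(tools={"get_price": get_price, "buy": buy})
price = session.call("get_price", symbol="AAPL")
shares = int(1000 // price)  # spend at most $1,000 on this position
session.call("buy", symbol="AAPL", shares=shares)
print(session.dump_log())
```

Because the agent can only act through `call`, the log is complete by construction, which is the property that makes after-the-fact auditing possible.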

Multi-Market, Multi-Granularity Trading Environment

AI-Trader supports three distinct financial ecosystems:

U.S. Stocks: NASDAQ 100 components, $10,000 starting capital, daily or hourly trading
A-Shares: SSE 50 constituents, ¥100,000 capital, respecting T+1 settlement and lot-size rules (100-share minimums)
Cryptocurrencies: Top 10 digital assets (BTC, ETH, etc.), 50,000 USDT capital, tradable 24/7
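The A-share constraints are the least familiar of the three, so a small sketch may help. The Python below is an illustration under stated assumptions, not the project's implementation: the `AShareAccount` class, the symbol, and the prices are hypothetical; fees are ignored; and the 100-share lot rule is applied to buy orders only, which is an assumption of this sketch. Shares bought today are held in a locked bucket and become sellable only after the trading day rolls over (T+1).

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict

LOT_SIZE = 100  # minimum trading unit mentioned above


@dataclass
class AShareAccount:
    cash: float
    today: date = date(2025, 1, 2)
    # Shares that may be sold today, keyed by symbol.
    sellable: Dict[str, int] = field(default_factory=dict)
    # Shares bought today; locked until the next trading day (T+1).
    locked: Dict[str, int] = field(default_factory=dict)

    def buy(self, symbol: str, shares: int, price: float) -> None:
        if shares <= 0 or shares % LOT_SIZE != 0:
            raise ValueError(f"buy quantity must be a positive multiple of {LOT_SIZE}")
        cost = shares * price
        if cost > self.cash:
            raise ValueError("insufficient cash")
        self.cash -= cost
        self.locked[symbol] = self.locked.get(symbol, 0) + shares

    def sell(self, symbol: str, shares: int, price: float) -> None:
        if shares <= 0 or shares > self.sellable.get(symbol, 0):
            raise ValueError("not enough sellable shares (T+1 lock or no position)")
        self.sellable[symbol] -= shares
        self.cash += shares * price

    def next_trading_day(self, new_day: date) -> None:
        """Roll the calendar forward and unlock yesterday's purchases."""
        if new_day <= self.today:
            raise ValueError("new_day must be after the current trading day")
        for symbol, shares in self.locked.items():
            self.sellable[symbol] = self.sellable.get(symbol, 0) + shares
        self.locked.clear()
        self.today = new_day


acct = AShareAccount(cash=100_000.0)
acct.buy("600519", 100, 50.0)        # illustrative symbol and price
try:
    acct.sell("600519", 100, 51.0)   # same day: rejected under T+1
except ValueError as err:
    print("rejected:", err)
acct.next_trading_day(date(2025, 1, 3))
acct.sell("600519", 100, 51.0)       # allowed on the next trading day
print(acct.cash)
```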

Hourly trading modes (for both U.S. and A-share markets) enable fine-grained strategy testing, moving beyond simplistic daily close-to-close assumptions.

Transparent Reasoning & Live Performance Dashboard

Unlike black-box trading systems, AI-Trader records every step of an agent’s decision process. Users can inspect:

The sequence of tool calls (e.g., “search for Apple earnings,” “check Microsoft price,” “calculate portfolio risk”)
The final trade action and its justification
Real-time profit/loss, positions, and risk metrics
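As a rough illustration of the kind of numbers behind the last item, the sketch below marks a portfolio to market and computes a maximum drawdown. The function names and all figures are hypothetical, and the source does not say which risk metrics AI-Trader actually reports, so treat this as one plausible pair of calculations rather than the platform's definition.

```python
from typing import Dict, Sequence


def portfolio_value(cash: float, positions: Dict[str, int],
                    prices: Dict[str, float]) -> float:
    """Cash plus each position valued at its latest price."""
    return cash + sum(qty * prices[sym] for sym, qty in positions.items())


def total_return(start_capital: float, current_value: float) -> float:
    """Fractional return since the start of the run."""
    return (current_value - start_capital) / start_capital


def max_drawdown(equity_curve: Sequence[float]) -> float:
    """Largest peak-to-trough decline, as a fraction of the peak."""
    peak = equity_curve[0]
    worst = 0.0
    for value in equity_curve:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


# Illustrative numbers only.
value = portfolio_value(
    cash=4_000.0,
    positions={"AAPL": 20, "MSFT": 5},
    prices={"AAPL": 200.0, "MSFT": 420.0},
)
print(value, total_return(10_000.0, value))
print(max_drawdown([10_000.0, 10_400.0, 9_880.0, 10_100.0]))
```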

A live dashboard at ai4trade.ai visualizes ongoing competitions, making it easy to compare agent performance across models and markets.

Scientific Replay with Historical Fidelity

Thanks to its historical replay architecture, any trading period can be re-simulated with perfect consistency:

Data access is strictly limited to what was available at or before the simulation timestamp
News, financial reports, and price data are chronologically filtered
Results are fully reproducible—a rarity in live-agent evaluation
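The chronological filtering described in this list can be sketched in a few lines of Python. This is an assumption-laden illustration, not the project's replay engine: the `Record` type, its field names, and the sample data are hypothetical. The point is simply that every query is answered from records stamped at or before the simulation clock, so a replay at the same timestamp always sees the same inputs.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List


@dataclass(frozen=True)
class Record:
    published_at: datetime  # when the item became publicly available
    kind: str               # e.g. "price", "news", "report"
    payload: str


def visible_at(records: Iterable[Record], now: datetime) -> List[Record]:
    """Return only records available at or before `now`, oldest first."""
    allowed = [r for r in records if r.published_at <= now]
    return sorted(allowed, key=lambda r: r.published_at)


# Illustrative data only.
feed = [
    Record(datetime(2025, 1, 2, 16, 0), "price", "AAPL close"),
    Record(datetime(2025, 1, 2, 9, 30), "news", "pre-market headline"),
    Record(datetime(2025, 1, 3, 8, 0), "report", "next-day filing"),
]

sim_clock = datetime(2025, 1, 2, 16, 0)
for rec in visible_at(feed, sim_clock):
    print(rec.published_at.isoformat(), rec.kind, rec.payload)
```

Filtering on the publication time rather than the event time matters here: a report describing an earlier period still counts as future information until it has actually been released.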

This makes AI-Trader suitable not just for competition, but for rigorous academic research on agent behavior, market efficiency, and adaptive learning.

Real Problems AI-Trader Solves

For AI Researchers

Lack of live, uncontaminated benchmarks: Most datasets leak future information or use stale snapshots. AI-Trader’s anti-look-ahead design ensures temporal integrity.
Opaque agent behavior: The platform logs full reasoning traces, enabling analysis of how agents fail (e.g., overconfidence, poor risk assessment) rather than just that they fail.

For Fintech Prototypers & Strategy Developers

Inconsistent cross-market testing: Previously, testing a strategy across U.S., Chinese, and crypto markets required building three separate systems. AI-Trader unifies them under one extensible framework.
No standardized tool interface: The built-in MCP (Model Context Protocol) toolchain provides a consistent API for price lookup, trading, search, and math—regardless of asset class.

For Educators & Students

Hands-on agent training: Learners can deploy their own agents, observe failures in real market contexts, and iterate—turning abstract LLM concepts into tangible financial decision experiments.

Who Should Use AI-Trader—and When

AI-Trader is ideal for:

AI/ML researchers studying autonomous agent behavior in dynamic environments
Fintech teams prototyping LLM-driven investment assistants or risk monitors
Quantitative educators demonstrating the gap between theoretical reasoning and real-world execution
Open-source contributors looking to submit novel agent strategies for community evaluation

Important: AI-Trader is not a production trading system. It does not connect to live brokerage accounts. All trades are simulated for research purposes only. The project includes a clear disclaimer: results must not be interpreted as investment advice.

Getting Started: A Practical Walkthrough

Setting up AI-Trader is streamlined through modular scripts and clear configuration:

Step 1: Installation & Configuration

git clone https://github.com/HKUDS/AI-Trader.git
cd AI-Trader
pip install -r requirements.txt
cp .env.example .env

Then populate .env with API keys for:

OpenAI (or compatible LLM provider)
Alpha Vantage (U.S. stocks & crypto data)
Jina AI (market news search)
Tushare (optional, for A-share data)
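Before launching a run, it can save time to confirm that the keys are actually set. The Python sketch below is a hypothetical pre-flight check, not part of AI-Trader: the variable names are placeholders invented for illustration, and the real names are whatever .env.example defines. It also assumes the .env values have already been exported into the process environment.

```python
import os
from typing import List, Mapping, Sequence

# Placeholder names for illustration; use the names from .env.example.
REQUIRED_KEYS = ["OPENAI_API_KEY", "ALPHA_VANTAGE_API_KEY", "JINA_API_KEY"]
OPTIONAL_KEYS = ["TUSHARE_TOKEN"]  # only needed for A-share data


def missing_keys(env: Mapping[str, str],
                 required: Sequence[str] = REQUIRED_KEYS) -> List[str]:
    """Return the required keys that are absent or blank in `env`."""
    return [key for key in required if not env.get(key, "").strip()]


missing = missing_keys(os.environ)
if missing:
    print("Missing required keys:", ", ".join(missing))
else:
    print("All required keys are set.")

for key in missing_keys(os.environ, OPTIONAL_KEYS):
    print(f"Optional key not set: {key}")
```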

Step 2: Run with One-Click Scripts

For U.S. stocks:

bash scripts/main.sh

For A-shares:

bash scripts/main_a_stock_step1.sh  # Data
bash scripts/main_a_stock_step2.sh  # MCP services
bash scripts/main_a_stock_step3.sh  # Agent

For crypto:

bash scripts/main_crypto_step1.sh
bash scripts/main_crypto_step2.sh
bash scripts/main_crypto_step3.sh

Step 3: Extend with Custom Agents

To add your own strategy:

Create agent/custom/your_agent.py inheriting from BaseAgent
Register it in AGENT_REGISTRY in main.py
Add a config file in configs/
Submit a PR—the AI-Trader team will run it live and publish results

The architecture cleanly separates U.S., A-share, and crypto logic, so a custom agent for one market can be changed without touching the other two.
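The subclass-and-register pattern from the steps above might look roughly like the following. This is a sketch under heavy assumptions: the `BaseAgent` defined here is a local stand-in, since the real base class in the repository may expose different methods and constructor arguments, and the `decide` method, `EqualWeightAgent`, and the registry contents are hypothetical. The equal-weight rule is only a placeholder for where an LLM-driven decision would go.

```python
from abc import ABC, abstractmethod
from typing import Dict, Type


class BaseAgent(ABC):
    """Stand-in for the project's BaseAgent; the real interface may differ."""

    def __init__(self, name: str) -> None:
        self.name = name

    @abstractmethod
    def decide(self, prices: Dict[str, float], cash: float) -> Dict[str, int]:
        """Return the number of shares to buy, keyed by symbol."""


class EqualWeightAgent(BaseAgent):
    """Placeholder strategy: split cash evenly across the given symbols."""

    def decide(self, prices: Dict[str, float], cash: float) -> Dict[str, int]:
        if not prices:
            return {}
        budget = cash / len(prices)
        return {sym: int(budget // price) for sym, price in prices.items()}


# Hypothetical registry mapping a config name to an agent class.
AGENT_REGISTRY: Dict[str, Type[BaseAgent]] = {
    "EqualWeightAgent": EqualWeightAgent,
}

agent = AGENT_REGISTRY["EqualWeightAgent"](name="demo")
orders = agent.decide({"AAPL": 200.0, "MSFT": 400.0}, cash=10_000.0)
print(agent.name, orders)
```

A registry keyed by name lets a config file select the agent class as a plain string, which is presumably why registration is a separate step from writing the class.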

Key Limitations and Considerations

While powerful, AI-Trader has important constraints:

API dependencies: Every run depends on external services (Alpha Vantage, Tushare, etc.), which may impose rate limits or usage costs.
No live execution: All trades are simulated; the system cannot place real orders.
Model-dependent results: Agent performance varies significantly across LLMs—what works for Claude may fail for GPT or Qwen.
Research-only scope: The project explicitly discourages using outputs for real investment decisions.

Users should treat AI-Trader as a scientific instrument, not a trading product.

How AI-Trader Enables Future Innovation

AI-Trader is built for community collaboration:

Open-source code and monthly runtime data on Hugging Face
Extensible MCP toolchain—new tools (e.g., sentiment analysis, volatility forecasts) can be added as MCP-compatible services
Strategy marketplace vision—future updates will support a public repository of third-party agents

By standardizing evaluation in one of the most demanding real-world domains—finance—AI-Trader creates a shared foundation for advancing truly autonomous, risk-aware AI agents.

Summary

AI-Trader redefines how we evaluate LLM agents in live, high-stakes environments. By enforcing zero human intervention, strict temporal integrity, and multi-market realism, it reveals the true gap between general reasoning and strategic financial competence. For researchers, developers, and educators seeking a rigorous, transparent, and extensible platform to test autonomous agents under real-world pressure, AI-Trader offers an unprecedented—and open—arena for discovery.