Evaluating whether large language models (LLMs) can truly function as autonomous decision-makers in dynamic, real-world environments remains a fundamental challenge in AI research. While LLMs excel in static or simulated settings, their performance in live, information-rich, and rapidly evolving domains—like financial markets—is far less understood.
Enter AI-Trader, the first fully automated, live, and data-uncontaminated benchmark designed specifically to evaluate LLM agents in real-time financial decision-making across three major markets: U.S. stocks (NASDAQ 100), Chinese A-shares (SSE 50), and cryptocurrencies (BITWISE10). Unlike synthetic or backfilled benchmarks, AI-Trader enforces a “minimal information paradigm”: agents receive only essential context and must independently search, verify, and synthesize live market data through standardized tool calls, without any human guidance, pre-programmed rules, or access to future information.
This project isn’t just another trading simulator. It’s a rigorously controlled arena where AI agents compete under identical conditions to reveal how well autonomous reasoning translates into strategic financial action—exposing both capabilities and critical gaps in current agent architectures.
Why Real Financial Benchmarking Matters
Most LLM agent benchmarks operate in closed or static environments—like text-based games, coding challenges, or curated datasets. These settings lack the unpredictability, latency sensitivity, and risk consequences inherent in real markets. As a result, high scores in those benchmarks rarely predict real-world competence.
AI-Trader closes this gap by providing:
- Live data integration via APIs (Alpha Vantage, Tushare, Jina AI)
- Strict anti-look-ahead controls ensuring agents never see future prices or news
- Identical starting conditions (capital, tools, data feeds) for fair model comparison
- Multi-market coverage to test cross-environment robustness
This allows researchers to ask and answer concrete questions: Does a model that excels in reasoning also manage risk well? Can agents adapt strategies between volatile crypto markets and policy-driven A-shares? Early findings from the AI-Trader team show that general intelligence does not automatically translate to profitable or robust trading—highlighting the need for specialized evaluation frameworks like this one.
Core Capabilities That Set AI-Trader Apart
Fully Autonomous, Zero-Human-Intervention Design
Every AI agent in AI-Trader operates with complete independence. There is no human-in-the-loop, no manual override, and no hand-coded trading logic. Agents must:
- Decide what to trade, when to trade, and how much
- Use built-in tools to fetch prices, search news, and execute trades
- Log their full reasoning chain for transparency and auditability
This mirrors real-world deployment scenarios where reliability and self-sufficiency are non-negotiable.
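The loop described above (call tools, decide, log the reasoning) can be sketched roughly as follows. This is an illustration only: `ToolCall`, `DecisionLog`, `run_step`, and the two stand-in tools are hypothetical names, not AI-Trader's actual classes, and the fixed plan stands in for choices an LLM would make at runtime.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class ToolCall:
    """One recorded tool invocation: name, arguments, and result."""
    name: str
    args: Dict[str, Any]
    result: Any


@dataclass
class DecisionLog:
    """Audit trail for a single decision step (hypothetical structure)."""
    calls: List[ToolCall] = field(default_factory=list)
    action: str = ""
    justification: str = ""


def run_step(
    tools: Dict[str, Callable[..., Any]],
    plan: List[Tuple[str, Dict[str, Any]]],
) -> DecisionLog:
    """Run each (tool_name, kwargs) pair in order and record every call.

    In a real agent the plan would be produced by the LLM; here it is fixed.
    """
    log = DecisionLog()
    for name, kwargs in plan:
        result = tools[name](**kwargs)
        log.calls.append(ToolCall(name, kwargs, result))
    return log


# Stand-in tools with canned answers; real tools would query live services.
tools = {
    "get_price": lambda symbol: {"AAPL": 190.0}.get(symbol),
    "search_news": lambda query: [f"headline about {query}"],
}

log = run_step(
    tools,
    [("get_price", {"symbol": "AAPL"}), ("search_news", {"query": "Apple earnings"})],
)
log.action = "hold"
log.justification = "No new information that changes the current view."
```

Keeping the tool calls and the final justification in one record is what makes a decision auditable after the fact.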
Multi-Market, Multi-Granularity Trading Environment
AI-Trader supports three distinct financial ecosystems:
- U.S. Stocks: NASDAQ 100 components, $10,000 starting capital, daily or hourly trading
- A-Shares: SSE 50 constituents, ¥100,000 capital, respecting T+1 settlement and lot-size rules (100-share minimums)
- Cryptocurrencies: Top 10 digital assets (BTC, ETH, etc.), 50,000 USDT capital, tradable 24/7
Hourly trading modes (for both U.S. and A-share markets) enable fine-grained strategy testing, moving beyond simplistic daily close-to-close assumptions.
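To make the A-share constraints concrete, here is a minimal sketch of how a simulator might enforce them. It assumes buy quantities are rounded down to whole 100-share lots and that T+1 means shares bought on a given day become sellable on a later trading day. The names `Lot`, `round_to_lot`, and `sellable_shares` are hypothetical and not taken from the AI-Trader codebase.

```python
from dataclasses import dataclass
from datetime import date
from typing import List

LOT_SIZE = 100  # 100-share minimum lot, as stated in the text


@dataclass
class Lot:
    """A batch of shares together with its purchase date."""
    shares: int
    bought_on: date


def round_to_lot(shares: int) -> int:
    """Round a desired buy quantity down to a whole number of lots."""
    return (shares // LOT_SIZE) * LOT_SIZE


def sellable_shares(lots: List[Lot], today: date) -> int:
    """Under T+1, only shares bought before today can be sold today."""
    return sum(lot.shares for lot in lots if lot.bought_on < today)


holdings = [Lot(300, date(2025, 1, 2)), Lot(200, date(2025, 1, 3))]
buy_qty = round_to_lot(257)                              # 200
can_sell = sellable_shares(holdings, date(2025, 1, 3))   # 300
```

The 200 shares bought on 3 January are excluded from `can_sell` on that same day, which is the T+1 rule in action.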
Transparent Reasoning & Live Performance Dashboard
Unlike black-box trading systems, AI-Trader records every step of an agent’s decision process. Users can inspect:
- The sequence of tool calls (e.g., “search for Apple earnings,” “check Microsoft price,” “calculate portfolio risk”)
- The final trade action and its justification
- Real-time profit/loss, positions, and risk metrics
A live dashboard at ai4trade.ai visualizes ongoing competitions, making it easy to compare agent performance across models and markets.
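The source does not say which risk metrics the dashboard reports. As an assumption-laden illustration, the sketch below computes portfolio value, total return, and maximum drawdown from simulated positions; the function names and the choice of drawdown as the risk metric are mine.

```python
from typing import Dict, List


def portfolio_value(
    cash: float, positions: Dict[str, float], prices: Dict[str, float]
) -> float:
    """Cash plus the mark-to-market value of every open position."""
    return cash + sum(qty * prices[sym] for sym, qty in positions.items())


def total_return(start_value: float, current_value: float) -> float:
    """Fractional gain or loss relative to the starting capital."""
    return current_value / start_value - 1.0


def max_drawdown(values: List[float]) -> float:
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak = values[0]
    worst = 0.0
    for v in values:
        peak = max(peak, v)
        worst = max(worst, (peak - v) / peak)
    return worst


value = portfolio_value(
    4000.0, {"AAPL": 20, "MSFT": 5}, {"AAPL": 210.0, "MSFT": 400.0}
)
ret = total_return(10_000.0, value)          # roughly +2%
dd = max_drawdown([10_000.0, 11_000.0, 9_900.0, 10_500.0])  # roughly 10%
```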
Scientific Replay with Historical Fidelity
Thanks to its historical replay architecture, any trading period can be re-simulated with perfect consistency:
- Data access is strictly limited to what was available at or before the simulation timestamp
- News, financial reports, and price data are chronologically filtered
- Results are fully reproducible—a rarity in live-agent evaluation
This makes AI-Trader suitable not just for competition, but for rigorous academic research on agent behavior, market efficiency, and adaptive learning.
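The chronological filtering described above amounts to a point-in-time query. A minimal sketch, assuming each data item carries a publication timestamp (the `Record` type and `visible_at` function are hypothetical, not the project's API):

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Record:
    """A news item, report, or price bar with its publication time."""
    published_at: datetime
    payload: str


def visible_at(records: List[Record], sim_time: datetime) -> List[Record]:
    """Return only records published at or before sim_time, oldest first."""
    return sorted(
        (r for r in records if r.published_at <= sim_time),
        key=lambda r: r.published_at,
    )


records = [
    Record(datetime(2025, 1, 3, 9, 30), "earnings beat"),
    Record(datetime(2025, 1, 2, 16, 0), "closing price"),
    Record(datetime(2025, 1, 4, 8, 0), "guidance cut"),
]
seen = visible_at(records, datetime(2025, 1, 3, 12, 0))
# seen holds the closing price and the earnings item; the later item is hidden
```

Because the filter depends only on the simulation timestamp, re-running the same period yields the same visible data, which is what makes replays reproducible.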
Real Problems AI-Trader Solves
For AI Researchers
- Lack of live, uncontaminated benchmarks: Many existing datasets leak future information or rely on stale snapshots. AI-Trader’s anti-look-ahead design ensures temporal integrity.
- Opaque agent behavior: The platform logs full reasoning traces, enabling analysis of how agents fail (e.g., overconfidence, poor risk assessment) rather than just that they fail.
For Fintech Prototypers & Strategy Developers
- Inconsistent cross-market testing: Previously, testing a strategy across U.S., Chinese, and crypto markets required building three separate systems. AI-Trader unifies them under one extensible framework.
- No standardized tool interface: The built-in MCP (Model Context Protocol) toolchain provides a consistent API for price lookup, trading, search, and math—regardless of asset class.
For Educators & Students
- Hands-on agent training: Learners can deploy their own agents, observe failures in real market contexts, and iterate—turning abstract LLM concepts into tangible financial decision experiments.
Who Should Use AI-Trader—and When
AI-Trader is ideal for:
- AI/ML researchers studying autonomous agent behavior in dynamic environments
- Fintech teams prototyping LLM-driven investment assistants or risk monitors
- Quantitative educators demonstrating the gap between theoretical reasoning and real-world execution
- Open-source contributors looking to submit novel agent strategies for community evaluation
Important: AI-Trader is not a production trading system. It does not connect to live brokerage accounts. All trades are simulated for research purposes only. The project includes a clear disclaimer: results must not be interpreted as investment advice.
Getting Started: A Practical Walkthrough
Setting up AI-Trader is streamlined through modular scripts and clear configuration:
Step 1: Installation & Configuration
git clone https://github.com/HKUDS/AI-Trader.git
cd AI-Trader
pip install -r requirements.txt
cp .env.example .env
Then populate .env with API keys for:
- OpenAI (or compatible LLM provider)
- Alpha Vantage (U.S. stocks & crypto data)
- Jina AI (market news search)
- Tushare (optional, for A-share data)
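Before launching a run, it can help to confirm that the required keys are actually set. The sketch below is a generic check; the variable names are placeholders I chose for illustration, so consult `.env.example` for the names the project really uses.

```python
import os
from typing import List, Mapping

# Placeholder names: check .env.example for the actual variable names.
REQUIRED_KEYS = ["OPENAI_API_KEY", "ALPHA_VANTAGE_API_KEY", "JINA_API_KEY"]
OPTIONAL_KEYS = ["TUSHARE_TOKEN"]  # only needed for A-share data


def missing_keys(env: Mapping[str, str], required: List[str]) -> List[str]:
    """Return the required variables that are unset or empty."""
    return [key for key in required if not env.get(key)]


missing = missing_keys(os.environ, REQUIRED_KEYS)
if missing:
    print("Missing API keys:", ", ".join(missing))
else:
    print("All required API keys are set.")
```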
Step 2: Run with One-Click Scripts
For U.S. stocks:
bash scripts/main.sh
For A-shares:
bash scripts/main_a_stock_step1.sh  # Data
bash scripts/main_a_stock_step2.sh  # MCP services
bash scripts/main_a_stock_step3.sh  # Agent
For crypto:
bash scripts/main_crypto_step1.sh
bash scripts/main_crypto_step2.sh
bash scripts/main_crypto_step3.sh
Step 3: Extend with Custom Agents
To add your own strategy:
- Create agent/custom/your_agent.py inheriting from BaseAgent
- Register it in AGENT_REGISTRY in main.py
- Add a config file in configs/
- Submit a PR; the AI-Trader team will run it live and publish results
The architecture cleanly separates U.S., A-share, and crypto logic, making customization safe and scalable.
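As a rough picture of what a custom agent might look like, the sketch below defines a stand-in `BaseAgent` so that it runs on its own. The real `BaseAgent` interface and the real shape of `AGENT_REGISTRY` may differ; the `decide` method, its signature, and the toy strategy are assumptions made purely for illustration.

```python
from abc import ABC, abstractmethod
from typing import Dict, Type


class BaseAgent(ABC):
    """Stand-in for the project's BaseAgent; the real interface may differ."""

    def __init__(self, name: str) -> None:
        self.name = name

    @abstractmethod
    def decide(self, prices: Dict[str, float]) -> Dict[str, int]:
        """Return share changes per symbol (hypothetical method)."""


class CheapestFirstAgent(BaseAgent):
    """Toy strategy: buy one share of the lowest-priced symbol."""

    def decide(self, prices: Dict[str, float]) -> Dict[str, int]:
        if not prices:
            return {}
        cheapest = min(prices, key=lambda sym: prices[sym])
        return {cheapest: 1}


# Assumed registry shape: agent name mapped to its class.
AGENT_REGISTRY: Dict[str, Type[BaseAgent]] = {
    "CheapestFirstAgent": CheapestFirstAgent,
}

agent = AGENT_REGISTRY["CheapestFirstAgent"](name="demo")
orders = agent.decide({"AAPL": 190.0, "INTC": 30.0})  # {"INTC": 1}
```

Looking agents up by name in a registry lets a config file select the strategy without any change to the runner code.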
Key Limitations and Considerations
While powerful, AI-Trader has important constraints:
- API dependencies: Performance relies on external services (Alpha Vantage, Tushare, etc.), which may have rate limits or costs.
- No live execution: All trades are simulated; the system cannot place real orders.
- Model-dependent results: Agent performance varies significantly across LLMs—what works for Claude may fail for GPT or Qwen.
- Research-only scope: The project explicitly discourages using outputs for real investment decisions.
Users should treat AI-Trader as a scientific instrument, not a trading product.
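On the rate-limit point, one common mitigation is to retry throttled requests with exponential backoff. The sketch below is generic and not taken from AI-Trader: `RateLimited` is a hypothetical exception, and `fetch` stands in for whatever data call is being throttled.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error a data client might raise when throttled."""


def with_backoff(
    fetch: Callable[[], T],
    retries: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch, doubling the wait after each RateLimited error.

    Re-raises the error once the final attempt has failed.
    """
    for attempt in range(retries):
        try:
            return fetch()
        except RateLimited:
            if attempt == retries - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise ValueError("retries must be at least 1")
```

Passing `sleep` as a parameter keeps the helper testable, since a test can record the delays instead of actually waiting.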
How AI-Trader Enables Future Innovation
AI-Trader is built for community collaboration:
- Open-source code and monthly runtime data on Hugging Face
- Extensible MCP toolchain—new tools (e.g., sentiment analysis, volatility forecasts) can be added as MCP-compatible services
- Strategy marketplace vision—future updates will support a public repository of third-party agents
By standardizing evaluation in one of the most demanding real-world domains—finance—AI-Trader creates a shared foundation for advancing truly autonomous, risk-aware AI agents.
Summary
AI-Trader redefines how we evaluate LLM agents in live, high-stakes environments. By enforcing zero human intervention, strict temporal integrity, and multi-market realism, it reveals the true gap between general reasoning and strategic financial competence. For researchers, developers, and educators seeking a rigorous, transparent, and extensible platform to test autonomous agents under real-world pressure, AI-Trader offers an unprecedented—and open—arena for discovery.