Fin-R1: A 7B Financial Reasoning LLM That Outperforms Larger Models on Complex Finance Tasks

Paper: Fin-R1: A Large Language Model for Financial Reasoning through Reinforcement Learning (2025)
Code: SUFE-AIFLM-Lab/Fin-R1

Fin-R1 is a purpose-built reasoning large language model (LLM) designed specifically for the financial domain. Despite having only 7 billion parameters—significantly smaller than many frontier models—it achieves state-of-the-art (SOTA) results on key financial reasoning benchmarks like FinQA and ConvFinQA, even surpassing models with tens of billions more parameters. Developed by the SUFE-AIFLM Lab at Shanghai University of Finance and Economics, Fin-R1 leverages a two-stage training pipeline: supervised fine-tuning (SFT) followed by reinforcement learning (RL) using Group Relative Policy Optimization (GRPO). Built on the Qwen2.5-7B-Instruct base, it demonstrates that domain specialization, high-quality reasoning data, and targeted training can yield exceptional performance without massive scale.

For project and technical decision-makers in banking, fintech, insurance, or quantitative research, Fin-R1 offers a lightweight, deployable solution that delivers expert-level financial reasoning—without the infrastructure costs of hundred-billion-parameter models.

Core Capabilities That Address Real Financial AI Challenges

High-Accuracy Reasoning in a Compact 7B Footprint

Financial tasks often require precise numerical computation, logical consistency, and regulatory awareness—areas where general-purpose LLMs frequently fail. Fin-R1 tackles these challenges head-on. On the FinQA dataset (which tests numerical reasoning over financial reports) and ConvFinQA (which evaluates multi-turn financial Q&A), Fin-R1 scores 76.0 and 85.0 respectively—topping all evaluated models, including much larger ones like DeepSeek-R1-Distill-Llama-70B (68.0 on FinQA).

This performance isn’t accidental. It stems from a deliberate design philosophy: instead of scaling up, Fin-R1 scales smart—focusing exclusively on financial reasoning fidelity through curated data and advanced training techniques.

Finance-Specific Reasoning Data Distilled from DeepSeek-R1

Fin-R1’s training data—named Fin-R1-Data—comprises ~60,000 high-quality chain-of-thought (CoT) examples distilled from DeepSeek-R1 across 10 financial sources, including FinQA, ConvFinQA, Ant_Finance, FinCorpus, and FinanceIQ. Crucially, every example undergoes a two-round quality filter:

  • Answer validation: Answers are scored for correctness using rule-based matching or Qwen2.5-72B-Instruct.
  • Reasoning validation: The CoT logic is evaluated on seven dimensions, including internal consistency, step count (≥3 steps), alignment with task instructions, and relevance to financial domains.

Only samples labeled “good” in both rounds are used for SFT; “bad” samples are repurposed for RL training—teaching the model not just what’s right, but what reasoning patterns to avoid.
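
This routing logic can be sketched in a few lines of Python. The function and field names below are illustrative assumptions, not the paper's actual implementation; `judge_grade` stands in for the LLM judge (Qwen2.5-72B-Instruct), and only two of the seven reasoning dimensions are shown:

```python
def route_sample(sample, judge_grade):
    """Route a distilled CoT sample to SFT or RL data via two quality checks.

    Hypothetical sketch: field names and the judge interface are assumptions.
    """
    # Round 1: answer validation -- rule-based match, else fall back to the judge.
    answer_ok = sample["predicted"] == sample["gold"] or judge_grade(sample)
    # Round 2: reasoning validation -- step count and domain relevance shown.
    steps = [line for line in sample["cot"].split("\n") if line.strip()]
    reasoning_ok = len(steps) >= 3 and sample["domain"] == "finance"
    # "Good" samples feed SFT; "bad" ones are repurposed for RL training.
    return "sft" if answer_ok and reasoning_ok else "rl"
```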

Dual-Stage Training: SFT + GRPO-Based Reinforcement Learning

Fin-R1’s training unfolds in two stages:

  1. Stage 1 (SFT): The base Qwen2.5-7B-Instruct model is fine-tuned on high-quality financial CoT data to acquire foundational reasoning skills.
  2. Stage 2 (RL): Using GRPO, the model is further optimized with dual reward signals—format compliance and answer accuracy—guided by a model-based verifier (Qwen2.5-Max) to reduce bias in rule-based rewards.

This combination enables Fin-R1 to generate responses that are not only correct but also logically structured, compliant, and interpretable—critical for auditability in regulated environments.
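
As a rough sketch, the Stage 2 dual reward might be composed as follows. The reward shaping, weighting, and verifier interface here are our assumptions for illustration, not the published implementation; `verify` stands in for the model-based verifier (Qwen2.5-Max):

```python
import re

def dual_reward(completion, gold_answer, verify):
    """Illustrative GRPO reward combining format compliance and answer accuracy."""
    # Format reward: reasoning must sit in <think> tags, the result in <answer> tags.
    fmt = 1.0 if re.fullmatch(
        r"\s*<think>.*</think>\s*<answer>.*</answer>\s*", completion, re.DOTALL
    ) else 0.0
    # Accuracy reward: extract the answer span and check it with the verifier.
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    acc = 1.0 if (match and verify(match.group(1).strip(), gold_answer)) else 0.0
    return fmt + acc
```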

Practical Use Cases Where Fin-R1 Adds Immediate Value

Fin-R1 is engineered for real-world financial workflows. Key applications include:

Financial Code Generation

Automatically generate Python or R scripts for risk modeling, portfolio optimization, or derivative pricing—complete with comments and error handling.

Quantitative Financial Calculations

Solve complex valuation problems (e.g., NPV, IRR, option Greeks) with step-by-step numerical reasoning, reducing manual errors.
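
For a sense of the computations involved, the reference answers to such problems can be checked against plain Python. These helper functions are our own illustration, not part of the model or its repository:

```python
def npv(rate, cashflows):
    """Net present value; cashflows[0] occurs at t=0."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cashflows))

def sharpe_ratio(portfolio_return, risk_free_rate, stdev):
    """Sharpe ratio = (Rp - Rf) / sigma.

    E.g. a 12% return, 3% risk-free rate, and 10% stdev give
    (0.12 - 0.03) / 0.10 = 0.9.
    """
    return (portfolio_return - risk_free_rate) / stdev
```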

Cross-Lingual Financial Reporting

Support multilingual financial analysis by generating or translating reports in English and Chinese while preserving technical accuracy.

Regulatory Compliance Screening

Evaluate whether a proposed action (e.g., client advice, product design) adheres to financial regulations, flagging potential compliance breaches.

Intelligent Risk Control

Identify anomalous transactions or portfolio behaviors by reasoning over historical patterns and risk thresholds—enhancing fraud and operational risk systems.

ESG (Environmental, Social, Governance) Analysis

Assess corporate sustainability disclosures, score ESG performance, and link findings to investment theses using structured reasoning.

These capabilities allow institutions to embed expert-level financial logic into internal tools, customer-facing chatbots, or research pipelines—without relying on opaque or oversized models.

Getting Started with Fin-R1: Deployment in Minutes

Fin-R1 is designed for ease of adoption:

  1. Download the model from Hugging Face:

    git lfs install
    git clone https://huggingface.co/SUFE-AIFLM-Lab/Fin-R1
    
  2. Deploy with vLLM for high-throughput, low-latency inference:

    pip install vllm
    vllm serve "/path/Fin-R1" --host 0.0.0.0 --port 8000 --gpu-memory-utilization 0.9 --max-model-len 16384 --tensor-parallel-size 2 --served-model-name "Fin-R1"
    
  3. Query via OpenAI-compatible API:

    from openai import OpenAI
    client = OpenAI(api_key="EMPTY", base_url="http://0.0.0.0:8000/v1")
    response = client.chat.completions.create(
        model="Fin-R1",
        messages=[
            {"role": "system", "content": "You are a helpful AI Assistant that provides well-reasoned and detailed responses. You first think about the reasoning process as an internal monologue and then provide the user with the answer. Respond in the following format: <think>...\n</think>\n<answer>...\n</answer>"},
            {"role": "user", "content": "How to calculate the Sharpe ratio for a portfolio with annual return of 12%, risk-free rate of 3%, and standard deviation of 10%?"},
        ],
        temperature=0.7,
        max_tokens=4000,
    )
    

The 7B parameter size allows deployment on a single or dual consumer-grade GPU, making it accessible for teams without large-scale AI infrastructure.
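
Because Fin-R1 wraps its reasoning in `<think>` tags and its result in `<answer>` tags, client code typically extracts just the final answer from the response text. A minimal helper (our own sketch, not part of the repository):

```python
import re

def extract_answer(text):
    """Return the <answer>...</answer> span of a Fin-R1-style response,
    falling back to the whole text if the tags are missing."""
    match = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return match.group(1).strip() if match else text.strip()
```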

Important Limitations and Responsible Use

While Fin-R1 excels in financial reasoning, users should be aware of its boundaries:

  • Not a substitute for professional advice: All outputs are for reference only. Final decisions must be validated by qualified financial, legal, or compliance professionals.
  • Performance gap on non-reasoning tasks: On some non-reasoning financial tasks (e.g., Ant_Finance), DeepSeek-R1 (671B) still leads by ~9 points. Fin-R1 specializes in reasoning, not general financial knowledge recall.
  • Coverage limitations: The model may underperform on highly niche instruments (e.g., exotic derivatives) not represented in its training data. Custom fine-tuning may be required for specialized domains.
  • Human-in-the-loop required: Always cross-check critical outputs. Fin-R1 enhances—does not replace—human expertise.

Summary

Fin-R1 proves that domain-focused architecture, high-fidelity reasoning data, and intelligent training strategies can outperform brute-force scaling in specialized fields like finance. For technical decision-makers seeking an efficient, accurate, and deployable LLM for financial reasoning—without the cost or complexity of trillion-parameter systems—Fin-R1 offers a compelling, production-ready solution that delivers measurable value across banking, investment, compliance, and research workflows.