YuLan is an open-source large language model (LLM) series developed by the Gaoling School of Artificial Intelligence (GSAI) at Renmin University of China. Unlike many open-weight models that offer limited insight into their training process, YuLan—particularly its latest 12B-parameter variant—was trained entirely from scratch and accompanied by a detailed technical report. This commitment to transparency directly addresses a critical pain point in the LLM community: the black-box nature of model development that hinders reproducibility, trust, and further innovation.
Designed with strong bilingual capabilities in both English and Chinese, YuLan delivers competitive performance across standard benchmarks while offering practical features like extended context length and lightweight variants. All code, models, and training methodologies are publicly released under the MIT License, though usage is restricted to academic purposes, making YuLan a valuable asset for researchers, educators, and developers working on multilingual or Chinese-centric natural language processing tasks.
Strong Bilingual Performance Across Key Benchmarks
One of YuLan’s standout strengths is its balanced proficiency in English and Chinese—a rarity among open-source LLMs, which often prioritize English at the expense of other languages. This dual-language competence isn’t just theoretical; it’s validated through rigorous evaluations on widely recognized benchmarks:
- On MMLU, a comprehensive English-language test of multitask knowledge, YuLan-Chat-3-12B achieves an average score of 55.7, outperforming several comparable LLaMA-2-based Chinese chat models.
- On C-Eval, a challenging Chinese knowledge benchmark, the same model scores 50.5 overall and 37.7 on the “Hard” subset—demonstrating robust understanding of complex domain-specific content in Chinese.
- In the AGI-Eval Gaokao (China’s national college entrance exam) challenge, YuLan-Chat-3-12B reaches 49.5 average, with particularly strong results in history (69.4) and geography (57.3), showing its ability to reason over culturally and linguistically nuanced material.
These results confirm that YuLan isn’t merely “Chinese-friendly”—it’s genuinely competitive in both linguistic spheres. For teams building applications targeting Chinese-speaking users or managing bilingual workflows, this eliminates the need to maintain separate models for each language.
Practical Features for Real-World Use
Beyond raw performance, YuLan incorporates design choices that directly address deployment challenges:
Expanded Chinese Vocabulary and 4K Context
The model vocabulary has been extended to 51,190 tokens, with dedicated inclusion of high-frequency Chinese characters and words. Combined with a 4,096-token context window, this enables more accurate tokenization and generation for longer Chinese inputs—such as technical documents, customer service transcripts, or academic essays—where shorter contexts or poor tokenization often degrade performance in standard LLMs.
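To see why the extended vocabulary matters, consider a toy token-count comparison. This is a purely illustrative sketch, not YuLan's actual tokenizer: it contrasts a worst-case byte-level fallback (up to three tokens per Chinese character under UTF-8) with dedicated whole-word vocabulary entries like those added to YuLan's 51,190-token vocabulary.

```python
# Illustrative sketch only (not YuLan's real tokenizer): why dedicated
# Chinese vocabulary entries shrink sequence length. A byte-level fallback
# can spend 3 tokens per Chinese character (one per UTF-8 byte), while a
# dedicated vocabulary entry covers a whole word in a single token.

def byte_fallback_token_count(text: str) -> int:
    """Worst case: one token per UTF-8 byte."""
    return len(text.encode("utf-8"))

def dedicated_vocab_token_count(words: list[str], vocab: set[str]) -> int:
    """One token per word found in the vocabulary; byte fallback otherwise."""
    return sum(1 if w in vocab else byte_fallback_token_count(w)
               for w in words)

# Hypothetical pre-segmented sentence: "模型 生成 文本" ("the model generates text")
words = ["模型", "生成", "文本"]
vocab = {"模型", "生成", "文本"}  # pretend these words are in the extended vocabulary

fallback = byte_fallback_token_count("".join(words))   # 6 chars * 3 bytes = 18 tokens
extended = dedicated_vocab_token_count(words, vocab)   # 3 tokens
print(fallback, extended)
```

A 6x reduction of this kind is an upper bound, but it captures the mechanism: fewer tokens per Chinese word means more of the 4,096-token window is available for actual content.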
Lightweight Option: YuLan-Mini
Recognizing that not all projects require a 12B-parameter model, the team released YuLan-Mini (2.4B parameters) in December 2024. Trained on 1T tokens, it offers a nimble alternative for resource-constrained environments, edge deployment, or rapid prototyping—without sacrificing core bilingual functionality.
Easy Integration and Developer-Friendly Usage
YuLan significantly lowers the barrier to entry for experimentation and integration:
- Hugging Face Compatibility: Models like `YuLan-Chat-3-12B` are available on the Hugging Face Hub and loadable with just a few lines of code using `transformers`, mirroring the standard LLaMA workflow.
- Command-Line Inference: A simple `inference.py` script allows immediate testing without complex pipelines.
- 8-Bit Quantization Support: With the `--load_in_8bit` flag, the 13B model runs on a single RTX 3090 (24GB), and the 65B version fits on an A100 (80GB), making powerful inference accessible even without multi-GPU setups.
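The Hugging Face loading pattern described above can be sketched roughly as follows. This is a minimal sketch, not the project's official snippet: the repository id `yulan-team/YuLan-Chat-3-12b` is an assumption and should be verified on the Hub, and the `load_in_8bit` keyword mirrors the `--load_in_8bit` CLI flag (it requires `bitsandbytes` and may be superseded by a quantization-config argument in newer `transformers` releases).

```python
# Hedged sketch of loading a YuLan chat model via Hugging Face transformers.
# The model id below is an assumption; check the Hub for the exact repo name.

def load_yulan(model_id: str = "yulan-team/YuLan-Chat-3-12b",
               load_in_8bit: bool = False):
    """Return (tokenizer, model). Setting load_in_8bit=True quantizes
    weights to 8 bits, letting larger variants fit on a single 24 GB GPU
    (requires the bitsandbytes package)."""
    # Imported lazily so the function can be defined without the heavy deps.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",          # spread layers across available devices
        load_in_8bit=load_in_8bit,  # mirrors the --load_in_8bit CLI flag
    )
    return tokenizer, model
```

Usage would then follow the standard LLaMA-style generate loop: tokenize a prompt, call `model.generate`, and decode the output.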
This ease of use contrasts sharply with many open-source LLMs that require custom loaders, patching, or undocumented preprocessing steps.
Clear Limitations and Responsible Use
The YuLan team is transparent about the model’s constraints:
- Like all probabilistic language models, YuLan may generate biased, inaccurate, or harmful content, despite extensive alignment training via curriculum learning and human preference data.
- The MIT License restricts usage to academic purposes only. Commercial deployment is not permitted under current terms.
This upfront disclosure helps technical decision-makers assess risk and compliance—especially important in research or educational contexts where ethical AI use is paramount.
When (and When Not) to Choose YuLan
Ideal for:
- Academic research requiring full training transparency and reproducibility
- Bilingual (English–Chinese) chatbots, tutoring systems, or content generation
- Chinese NLP tasks needing long-context understanding (e.g., document summarization, QA)
- Lightweight LLM experimentation via YuLan-Mini
Not recommended for:
- Commercial products (due to academic-use-only licensing)
- Safety-critical applications (e.g., medical diagnosis, legal advice) without rigorous downstream safeguards
- Purely English-only projects where smaller or more specialized models (e.g., Mistral, Llama-3) might offer better efficiency
Summary
YuLan stands out in the crowded open-source LLM landscape by combining full-from-scratch training transparency, strong bilingual English–Chinese performance, and practical deployment features, all under an openly published (though academically restricted) license. For researchers, educators, and developers working on multilingual AI, especially those focused on Chinese language processing, YuLan offers a rare blend of capability, clarity, and accessibility that empowers informed, responsible innovation.