PaperCodex

MiniRAG: Enable Small Language Models to Deliver Powerful RAG with Minimal Resources 1605

Retrieval-Augmented Generation (RAG) has become a cornerstone technique for grounding language models in factual knowledge. However, traditional RAG pipelines struggle…

12/15/2025 Knowledge Graph Reasoning, On-Device AI, Retrieval-Augmented Generation

EASYTOOL: Streamline LLM Agent Tool Usage with Concise, Unified Instructions 24492

Building capable AI agents that interact with real-world tools—like APIs, software libraries, or external services—is a core challenge in deploying…

12/15/2025 Multi-tool Agent Coordination, Task Automation, Tool-augmented Reasoning

AutoGL: Automate Graph Learning Pipelines Without Manual Tuning or Expert GNN Knowledge 1131

Graph-based machine learning has become essential across domains—from social network analysis and fraud detection to drug discovery and recommendation systems.…

12/15/2025 Graph Classification, Link Prediction, Node Classification

AutoAgents: Automatically Generate Specialized AI Teams for Complex Real-World Tasks 1444

Building intelligent systems that can handle open-ended, multi-step problems has long been a challenge in AI development. Traditional multi-agent frameworks…

12/15/2025 AI Task Automation, Dynamic Role Generation, Multi-agent Systems

Amphion: A Unified Open-Source Toolkit for Zero-Shot Speech, Singing, and Audio Generation 9539

Amphion is an open-source toolkit purpose-built for audio, music, and speech generation that dramatically lowers the entry barrier for junior…

12/15/2025 Singing Voice Conversion, Text-to-Speech, Voice Conversion

PP-FormulaNet: High-Accuracy and High-Speed Math Formula Recognition for Document Intelligence 5930

In the world of scientific publishing, academic research, and educational technology, one persistent bottleneck remains: converting handwritten or printed mathematical…

12/13/2025 Document Intelligence, Formula Recognition, Optical Character Recognition (OCR)

SGLang: High-Performance LLM Serving for Structured, Multi-Step, and Multimodal AI Applications 21238

Large language models (LLMs) are no longer just tools for answering questions—they power agents, structured data pipelines, multi-turn conversations, and…

12/13/2025 Multi-step Agent Execution, Multimodal LLM Serving, Structured Output Generation

PP-OCR: Ultra-Lightweight, Multilingual OCR and Document AI for Real-World Applications 66154

In today’s AI-driven world, turning unstructured visual data—like scanned invoices, handwritten notes, or multilingual PDFs—into structured, machine-readable formats is a…

12/12/2025 Document Parsing, Optical Character Recognition, Vision-Language Modeling

MinerU: High-Precision Open-Source Document Parsing for Real-World PDFs, Tables, and Formulas 50296

Converting real-world documents—especially PDFs containing mixed content like equations, tables, multi-column layouts, and scanned text—into clean, structured, machine-readable formats remains…

12/12/2025 Document Parsing, Multimodal Understanding, Optical Character Recognition (OCR)