PaperCodex

AutoAgents: Automatically Generate Specialized AI Teams for Complex Real-World Tasks

Building intelligent systems that can handle open-ended, multi-step problems has long been a challenge in AI development. Traditional multi-agent frameworks…

12/15/2025 · AI Task Automation, Dynamic Role Generation, Multi-agent Systems

Amphion: A Unified Open-Source Toolkit for Zero-Shot Speech, Singing, and Audio Generation

Amphion is an open-source toolkit purpose-built for audio, music, and speech generation that dramatically lowers the entry barrier for junior…

12/15/2025 · Singing Voice Conversion, Text-to-Speech, Voice Conversion

PP-FormulaNet: High-Accuracy and High-Speed Math Formula Recognition for Document Intelligence

In the world of scientific publishing, academic research, and educational technology, one persistent bottleneck remains: converting handwritten or printed mathematical…

12/13/2025 · Document Intelligence, Formula Recognition, Optical Character Recognition (OCR)

SGLang: High-Performance LLM Serving for Structured, Multi-Step, and Multimodal AI Applications

Large language models (LLMs) are no longer just tools for answering questions—they power agents, structured data pipelines, multi-turn conversations, and…

12/13/2025 · Multi-step Agent Execution, Multimodal LLM Serving, Structured Output Generation

PP-OCR: Ultra-Lightweight, Multilingual OCR and Document AI for Real-World Applications

In today’s AI-driven world, turning unstructured visual data—like scanned invoices, handwritten notes, or multilingual PDFs—into structured, machine-readable formats is a…

12/12/2025 · Document Parsing, Optical Character Recognition, Vision-Language Modeling

MinerU: High-Precision Open-Source Document Parsing for Real-World PDFs, Tables, and Formulas

Converting real-world documents—especially PDFs containing mixed content like equations, tables, multi-column layouts, and scanned text—into clean, structured, machine-readable formats remains…

12/12/2025 · Document Parsing, Multimodal Understanding, Optical Character Recognition (OCR)

MetaGPT: Automate Full Software Development with AI Agents That Work Like a Real Engineering Team

Building reliable software from natural language prompts remains a major challenge—even for today’s most capable large language models (LLMs). While…

12/12/2025 · Automated Software Engineering, LLM-based Workflow Automation, Multi-agent Systems

LlamaFactory: Fine-Tune 100+ Language Models Effortlessly—No Coding Required

Fine-tuning large language models (LLMs) used to be a complex, time-consuming endeavor—requiring deep expertise in deep learning frameworks, custom code…

12/12/2025 · Multimodal Learning, Preference Alignment, Supervised Fine-tuning

llama.cpp: Run Large Language Models Anywhere—Fast, Lightweight, and Offline

In an era where large language models (LLMs) power everything from chatbots to code assistants, deploying them outside of cloud…

12/12/2025 (updated 12/15/2025) · Multimodal Inference, Offline LLM Deployment, Text Generation

BitNet: Run 1.58-Bit LLMs Locally on CPUs with 6x Speedup and 82% Less Energy

Running large language models (LLMs) used to require powerful GPUs, expensive cloud infrastructure, or specialized hardware—until BitNet changed the game.…

12/12/2025 · Efficient LLM Deployment, On-device Inference, Text Generation

Copyright © 2026 PaperCodex.