PaperCodex

Decoupled DMD: Unlock Ultra-Fast, High-Quality Image Generation with 8-Step Distillation

If you’re building or evaluating text-to-image systems that demand both speed and visual fidelity, Decoupled DMD offers a breakthrough in…

01/04/2026 | Diffusion Model Distillation, Few-step Image Synthesis, Text-to-Image Generation

Causal-Learn: Discover True Cause-and-Effect Relationships from Observational Data in Python

In many real-world scenarios—whether you’re analyzing patient outcomes in healthcare, consumer behavior in economics, or system failures in engineering—you can’t…

01/04/2026 | Causal Discovery, Causal Inference, Structure Learning

Kimi-Dev: Solve Real Software Bugs with a Test-Passing, Open-Source Coding LLM

Kimi-Dev is a state-of-the-art open-source large language model (LLM) purpose-built for software engineering tasks. Unlike generic coding assistants that generate…

01/04/2026 | Automated Bug Fixing, Software Engineering Agents, Test-aware Code Generation

Vizier: Production-Grade Black-Box Optimization for Reliable Hyperparameter Tuning and System Configuration

Optimizing complex systems—whether machine learning models, database configurations, or compiler flags—often feels like navigating a dark room: you know the…

01/04/2026 | Automated Machine Learning, Black-box Optimization, Hyperparameter Optimization

AgentBench: Objectively Evaluate LLMs as Real-World Agents Across 8 Practical Environments

As large language models (LLMs) increasingly power autonomous agents—from customer service bots to system administration tools—a critical question arises: Can…

01/04/2026 | Agent Evaluation, Interactive Reasoning, Tool-Use Benchmarking

FlipVQA-Miner: Automatically Extract High-Quality Visual QA Pairs from Textbooks for Reliable LLM Training

Large Language Models (LLMs) and multimodal systems increasingly demand high-quality, human-authored supervision data—especially for tasks requiring reasoning, visual understanding, and…

01/04/2026 | Educational Data Mining, Instruction Tuning, Visual Question Answering

Omnilingual ASR: Open-Source Speech Recognition for 1,600+ Languages—Including 500 Never Before Supported

For decades, automatic speech recognition (ASR) has flourished in high-resource languages like English, Spanish, or Mandarin. But for the vast…

01/04/2026 | Automatic Speech Recognition, Multilingual Speech Processing, Zero-Shot Language Generalization

PokeeResearch: Open-Source, High-Accuracy Deep Research Agent with Self-Verification and RL-Optimized Reasoning

In today’s fast-moving technical and research environments, teams need reliable, up-to-date answers to complex questions—without the black-box limitations or high…

01/04/2026 | Deep Research Agent, Reinforcement Learning From AI Feedback, Retrieval-augmented Reasoning

MimicKit: Train Physics-Based Character Controllers with Motion Imitation and Reinforcement Learning

Imagine needing realistic, physics-compliant character movement for a game, simulation, or robotics project—but without the months of trial, error, and…

01/04/2026 | Character Animation, Motion Imitation, Reinforcement Learning

DocLayout-YOLO: Real-Time, High-Accuracy Document Layout Detection Without the Speed-Accuracy Trade-Off

Document layout analysis (DLA) is a foundational task in building real-world document understanding systems—whether you’re extracting structured data from invoices,…

01/04/2026 | Document Layout Analysis, Document Understanding, Object Detection