PaperCodex

Lumina-Image 2.0: High-Quality, Efficient Text-to-Image Generation with Unified Architecture and Strong Open-Source Support

Lumina-Image 2.0 is a state-of-the-art open-source text-to-image (T2I) generation framework that delivers exceptional visual fidelity and prompt adherence while maintaining…

01/09/2026 | Controllable Image Synthesis, Multimodal Generative Modeling, Text-to-Image Generation

Video-R1: Boost Video Reasoning in MLLMs with Efficient RL—Outperforming GPT-4o on Spatial Tasks

Video understanding has long been a bottleneck for multimodal large language models (MLLMs). While models can recognize objects or scenes…

01/09/2026 | Multimodal Reinforcement Learning, Temporal Modeling, Video Reasoning

PharMolixFM: High-Accuracy, All-Atom Molecular Modeling for Real-World Drug Discovery and Structural Biology

PharMolixFM is an all-atom foundation model purpose-built for molecular modeling and generation, jointly developed by PharMolix Inc. and the Institute…

01/09/2026 | Molecular Conformation Generation, Molecular Docking, Structure-based Drug Design

ActionStudio: Unify, Train, and Deploy Large Action Models 9x Faster for Autonomous Agents

As autonomous AI agents become central to real-world applications—from customer service bots to robotic process automation—the demand for Large Action…

01/09/2026 | Autonomous Agents, Function Calling, Multi-turn Reasoning

Text-to-LoRA: Instantly Customize LLMs with Plain English—No Training or Datasets Required

Large language models (LLMs) are powerful, but adapting them to specific tasks often demands significant effort: collecting labeled data, tuning…

01/09/2026 | Low-rank Adaptation, Parameter-Efficient Fine-Tuning, Zero-shot Task Adaptation

MemoryOS: Give Your AI Agent Long-Term Memory and Personalized Context with an OS-Inspired Architecture

Most AI agents powered by Large Language Models (LLMs) struggle with a fundamental limitation: their fixed context windows. Once a…

01/09/2026 | Contextual Conversation Management, Long-term Memory For LLMs, Personalized AI Agents

DeepEyes: Enable Vision-Language Models to “Think with Images” and Solve Complex Visual Reasoning Tasks

Most modern Vision-Language Models (VLMs) treat images as static inputs—processed once, then reasoned about using purely text-based logic. But humans…

01/09/2026 | Multimodal Reinforcement Learning, Vision-Language Modeling, Visual Reasoning

OpenGait: High-Accuracy, Open-Source Gait Recognition That Filters Out Clothing, Backgrounds, and Noise

If you’re evaluating biometric identification systems that work at a distance—without requiring cooperation, contact, or even clear facial visibility—gait recognition…

01/09/2026 | Biometric Identification, Gait Recognition, Video Denoising

LLMC+: Plug-and-Play Compression for Vision-Language and Large Language Models Without Retraining

Deploying large vision-language models (VLMs) and large language models (LLMs) in real-world applications is often bottlenecked by their massive size,…

01/09/2026 | Efficient Inference, Model Compression, Vision-Language Modeling

app.build: Generate Production-Ready, Validated Full-Stack Apps from a Single Prompt

Imagine turning a simple idea—like “a task manager with user authentication, real-time updates, and a clean UI”—into a fully working,…

01/09/2026 | Agentic Software Development, AI-powered Application Generation, Automated Full-stack Code Generation