PaperCodex

DreamTalk: Generate Emotionally Expressive Talking Heads from Audio Using Diffusion Models 1767

Creating lifelike digital avatars that speak naturally with accurate lip movements and rich emotional expression has long been a challenge…

12/18/2025 | Audio-to-Video Generation, Emotion-Aware Synthesis, Talking Head Generation

SkyThought: Boost Code Generation Accuracy Without Retraining—Even Small Models Beat GPT-4o-mini 3358

SkyThought is an open-source framework built around S*—a breakthrough test-time scaling approach designed specifically to elevate code generation performance in…

12/18/2025 | Code Generation, Program Synthesis, Test-time Scaling

AniTalker: Generate Lifelike, Expressive Talking Faces from a Single Image and Audio Clip 1588

Imagine turning a static portrait—like the Mona Lisa or a headshot from your LinkedIn profile—into a vivid, talking avatar that…

12/18/2025 | Facial Animation, Identity-Decoupled Motion Modeling, Talking Face Generation

Colossal-Auto: Automate Large Model Training with Zero Expertise in Parallelization or Checkpointing 41290

Training large-scale AI models—whether language models like LLaMA or video generators like Open-Sora—has become increasingly common, yet remains bottlenecked by…

12/18/2025 | Distributed Deep Learning, Large Language Model Training, Video Generation Model Training

EasyVolcap: Streamline Neural Volumetric Video Research with a Unified, Real-Time PyTorch Framework 1508

Neural volumetric video—capturing and rendering dynamic 3D scenes that can be viewed from any angle and time—is no longer just…

12/18/2025 | 4D Reconstruction, Dynamic Scene Modeling, Neural Rendering

AnyText: Generate and Edit Multilingual Text in AI Images with Pixel-Perfect Accuracy 4822

If you’ve ever tried using a standard AI image generator to create a poster, product mockup, or social media banner…

12/18/2025 | Multilingual Image Synthesis, Text-to-Image Generation, Visual Text Editing

DreamCraft3D: Generate Photorealistic, View-Consistent 3D Assets from a Single Image 2989

Creating high-quality 3D assets has traditionally required expert modeling skills, extensive manual labor, or expensive capture setups—barriers that limit accessibility…

12/18/2025 | 3D Generation, Diffusion Models, View-Consistent Rendering

S1: Boost Reasoning Performance with Just 1,000 Examples and Smart Test-Time Scaling 6613

In the rapidly evolving landscape of large language models (LLMs), achieving strong reasoning capabilities often comes at the cost of…

12/18/2025 | Mathematical Reasoning, Structured Reasoning, Test-time Scaling

SwinIR: State-of-the-Art Image Restoration with Fewer Parameters and Higher Quality 5230

Image quality degradation—whether from compression, noise, or low resolution—is a persistent challenge across industries ranging from medical imaging to consumer…

12/18/2025 | Compression Artifact Reduction, Image Denoising, Image Super-resolution

PP-PicoDet: Real-Time Object Detection with SOTA Accuracy on Mobile and Edge Devices 13974

In today’s era of intelligent edge computing, deploying high-performance computer vision models on resource-constrained devices like smartphones, embedded sensors, and…

12/18/2025 | Edge AI, Object Detection, Real-Time Inference