PaperCodex

RCG: High-Fidelity Unconditional Image Generation Without Labels 929

For years, unconditional image generation—creating realistic images without relying on human-provided class labels—has lagged significantly behind its class-conditional counterpart in…

01/13/2026 | Diffusion Models, Self-Supervised Representation Learning, Unconditional Image Generation

JoyVASA: Animate Human and Animal Portraits from Audio with Diffusion-Based Lip Sync and Head Motion 840

Audio-driven facial animation has long been a challenging yet highly valuable capability—from building expressive virtual agents to creating personalized pet…

01/13/2026 | Audio-driven Facial Animation, Cross-species Avatar Synthesis, Diffusion-based Video Generation

Stereo Anything: Zero-Shot Stereo Matching That Works Across Any Domain Without Retraining 803

Stereo matching—the task of finding corresponding pixels between left and right images to infer depth—is foundational to 3D vision systems…

01/13/2026 | 3D Vision, Stereo Matching, Zero-shot Generalization

SQuARE: Boost LLM Reasoning with Self-Generated Questions—No Retraining Needed 757

Solving complex reasoning problems is a persistent challenge for Large Language Models (LLMs). While techniques like Chain-of-Thought (CoT) prompting have…

01/13/2026 | Chain-of-thought Prompting, Question Answering, Reasoning Enhancement

FoleyCrafter: Generate Lifelike, Synchronized Sound Effects for Silent Videos—Automatically 630

Silent videos—whether from AI-generated content, archival footage, gameplay recordings, or unfinished film prototypes—often lack the immersive quality that sound brings.…

01/13/2026 | Multimodal Synthesis, Neural Foley, Video-to-audio Generation

SAMMO: Optimize LLM Prompt Programs Like Code—Structure-Aware, Compile-Time Tuning for RAG, Instruction Refinement, and Prompt Compression 731

Modern LLM applications increasingly rely on complex, structured prompts—especially in scenarios like Retrieval-Augmented Generation (RAG), instruction-based tasks, and data labeling…

01/13/2026 | Instruction Tuning, Prompt Optimization, Retrieval-Augmented Generation (RAG)

ChangeMamba: High-Accuracy, Low-Cost Change Detection for Remote Sensing Without CNN or Transformer Trade-Offs 515

DaGAN++: Generate Realistic Talking Head Videos with Depth-Aware AI—No 3D Labels Needed 995

OCRBench: The Definitive Benchmark for Evaluating Real-World OCR Capabilities in Large Multimodal Models 726

ULIP-2: Scalable Multimodal 3D Understanding Without Manual Annotations 547