PaperCodex

DRIT: Generate Diverse, Realistic Image Translations from Unpaired Data (No Paired Examples Needed)

Image-to-image translation is a powerful capability in computer vision—but real-world applications often face two stubborn roadblocks: the absence of aligned…

01/09/2026 · Disentangled Representation, Image-to-image Translation, Unpaired Image Translation

string2string: Unified Python Library for String Alignment, Search, Similarity & Evaluation Across NLP and Bioinformatics

If you’re building applications that involve comparing, aligning, searching, or evaluating text—whether in natural language processing (NLP), bioinformatics, or computational…

01/09/2026 · Semantic Search, Sequence Alignment, Text Similarity

Light-R1: Train High-Performance Math Reasoning Models from Public Data in Under 6 Hours

If you’re building AI systems that require reliable, step-by-step mathematical reasoning—but don’t have access to proprietary datasets, massive compute budgets,…

01/09/2026 · Chain-of-thought Reasoning, Mathematical Reasoning, Model Distillation

R1-Onevision: Solve Complex Visual Reasoning Problems with Step-by-Step Multimodal AI

In today’s AI landscape, most multimodal models can describe what’s in an image—but few can reason through it. If your…

01/09/2026 · Multimodal Reasoning, Scientific Diagram Understanding, Visual Question Answering

PaperBench: Benchmark AI Agents’ Ability to Replicate Cutting-Edge Research from Paper to Code

In an era where AI systems are increasingly tasked with more than just answering questions—writing code, debugging, and even conducting…

01/09/2026 · AI Engineering Evaluation, End-to-end AI Benchmarking, Research Replication

Neutone SDK: Run PyTorch Audio Models in Your DAW (Zero C++, Real-Time, Python-First)

Bringing neural audio models from research notebooks into real-world creative environments has long been a bottleneck for AI audio developers.…

01/09/2026 · Audio Effect Processing, Neural Audio Synthesis, Timbre Transfer

VITA-Audio: Real-Time Speech Generation with Ultra-Low Latency for End-to-End Voice AI

Voice interaction is becoming a cornerstone of modern human-computer interfaces—whether through smart assistants, customer service bots, or real-time translation tools.…

01/09/2026 · Real-time TTS, Speech Language Modeling, Spoken Question Answering

Depth-supervised NeRF: Achieve High-Quality 3D Reconstruction from Fewer Views and Faster Training Using Only “Free” Depth from Standard Photogrammetry Pipelines

Neural Radiance Fields (NeRF) have revolutionized photorealistic 3D scene reconstruction—but they come with well-known limitations. One major pain point: when…

01/09/2026 · 3D Reconstruction, Geometric Deep Learning, Neural Rendering

VLog: Generate Concise, Structured Video Narrations Using Event-Based Vocabulary Instead of Generic Tokens

Understanding what happens in videos—especially those capturing everyday human activities—is a core challenge in AI. Most existing video-language models generate…

01/09/2026 · Event-based Video Understanding, Video Narration, Video-language Modeling

TinyLVLM-eHub: Fast, Lightweight Evaluation for Large Vision-Language Models Without Heavy Compute

As Large Vision-Language Models (LVLMs) grow increasingly capable—and increasingly complex—evaluating their multimodal reasoning, perception, and reliability has become a significant…

01/09/2026 · Model Evaluation, Multimodal Reasoning, Visual Question Answering