Image-to-image translation is a powerful capability in computer vision—but real-world applications often face two stubborn roadblocks: the absence of aligned…
string2string: Unified Python Library for String Alignment, Search, Similarity & Evaluation Across NLP and Bioinformatics
If you’re building applications that involve comparing, aligning, searching, or evaluating text—whether in natural language processing (NLP), bioinformatics, or computational…
Light-R1: Train High-Performance Math Reasoning Models from Public Data in Under 6 Hours
If you’re building AI systems that require reliable, step-by-step mathematical reasoning—but don’t have access to proprietary datasets, massive compute budgets,…
R1-Onevision: Solve Complex Visual Reasoning Problems with Step-by-Step Multimodal AI
In today’s AI landscape, most multimodal models can describe what’s in an image—but few can reason through it. If your…
PaperBench: Benchmark AI Agents’ Ability to Replicate Cutting-Edge Research from Paper to Code
In an era where AI systems are increasingly tasked with more than just answering questions—writing code, debugging, and even conducting…
Neutone SDK: Run PyTorch Audio Models in Your DAW—Zero C++, Real-Time, Python-First
Bringing neural audio models from research notebooks into real-world creative environments has long been a bottleneck for AI audio developers.…
VITA-Audio: Real-Time Speech Generation with Ultra-Low Latency for End-to-End Voice AI
Voice interaction is becoming a cornerstone of modern human-computer interfaces—whether through smart assistants, customer service bots, or real-time translation tools.…
Depth-supervised NeRF: Achieve High-Quality 3D Reconstruction from Fewer Views and Faster Training—Using Only “Free” Depth from Standard Photogrammetry Pipelines
Neural Radiance Fields (NeRF) have revolutionized photorealistic 3D scene reconstruction—but they come with well-known limitations. One major pain point: when…
VLog: Generate Concise, Structured Video Narrations Using Event-Based Vocabulary Instead of Generic Tokens
Understanding what happens in videos—especially those capturing everyday human activities—is a core challenge in AI. Most existing video-language models generate…
TinyLVLM-eHub: Fast, Lightweight Evaluation for Large Vision-Language Models Without Heavy Compute
As Large Vision-Language Models (LVLMs) grow increasingly capable—and increasingly complex—evaluating their multimodal reasoning, perception, and reliability has become a significant…