Reconstructing accurate 3D geometry from 2D images has long been a fragmented, multi-step process—requiring separate models for camera pose estimation,…
VBench: The Definitive Benchmark Suite for Evaluating Realism and Faithfulness in AI-Generated Videos
As AI-generated videos grow increasingly convincing—featuring smooth motion, vivid aesthetics, and coherent narratives—a critical question emerges: How do we reliably…
Visual-RFT: Boost Vision-Language Model Performance with Minimal Data Using Reinforcement Fine-Tuning
When labeled visual data is scarce—think dozens or hundreds of examples per category—traditional supervised fine-tuning (SFT) often falls short. Enter…
EAGLE-3: Accelerate LLM Inference Up to 6.5× Without Sacrificing Output Quality
For teams deploying large language models (LLMs) in production—whether for chatbots, reasoning APIs, or batch processing—latency and inference cost are…
AI-Researcher: Automate End-to-End AI Research from Idea to Publication
Scientific research in artificial intelligence is increasingly complex, time-consuming, and resource-intensive. From synthesizing hundreds of papers to prototyping novel algorithms…
Less-to-More Generalization: Unlock Controllable, Consistent Multi-Subject Image Generation with UNO
Subject-driven image generation—where users provide one or more reference images of specific objects to guide the creation of new scenes—is…
Flow-GRPO: Boost Text-to-Image Accuracy with Online RL—Without Sacrificing Quality or Diversity
If you’ve ever struggled with diffusion models failing to follow detailed prompts—like “a golden retriever sitting to the left of…
Memento: Build Smarter LLM Agents That Learn from Experience—Without Fine-Tuning
In today’s fast-paced AI landscape, teams building intelligent agents face a persistent dilemma: how to make large language models (LLMs)…
Matrix-Game: Controllable, Real-Time Game World Generation with Pixel-Perfect Action Responsiveness
Matrix-Game is an open-source interactive world foundation model developed by Skywork AI, specifically designed for real-time, controllable generation of game…
FlowTok: Unified Text-to-Image and Image-to-Text Generation with Compact 1D Tokens
FlowTok reimagines cross-modal generation by collapsing the traditionally complex boundary between text and images into a streamlined, efficient process. Unlike…