PaperCodex

VGGT: One Model to Reconstruct 3D Scenes Instantly—No Post-Processing Required

Reconstructing accurate 3D geometry from 2D images has long been a fragmented, multi-step process—requiring separate models for camera pose estimation,…

12/19/2025 · 3D Reconstruction, Camera Pose Estimation, Multi-view Geometry
VBench: The Definitive Benchmark Suite for Evaluating Realism and Faithfulness in AI-Generated Videos

As AI-generated videos grow increasingly convincing—featuring smooth motion, vivid aesthetics, and coherent narratives—a critical question emerges: How do we reliably…

12/19/2025 · Intrinsic Faithfulness Benchmarking, Multimodal Model Assessment, Video Generation Evaluation
Visual-RFT: Boost Vision-Language Model Performance with Minimal Data Using Reinforcement Fine-Tuning

When labeled visual data is scarce—think dozens or hundreds of examples per category—traditional supervised fine-tuning (SFT) often falls short. Enter…

12/19/2025 · Few-shot Object Detection, Fine-grained Image Classification, Visual Reasoning Grounding
EAGLE-3: Accelerate LLM Inference Up to 6.5× Without Sacrificing Output Quality

For teams deploying large language models (LLMs) in production—whether for chatbots, reasoning APIs, or batch processing—latency and inference cost are…

12/19/2025 · Efficient Language Model Serving, LLM Inference Acceleration, Speculative Decoding
AI-Researcher: Automate End-to-End AI Research from Idea to Publication

Scientific research in artificial intelligence is increasingly complex, time-consuming, and resource-intensive. From synthesizing hundreds of papers to prototyping novel algorithms…

12/19/2025 · Algorithm Prototyping, Autonomous AI Research, Scientific Manuscript Generation
Less-to-More Generalization: Unlock Controllable, Consistent Multi-Subject Image Generation with UNO

Subject-driven image generation—where users provide one or more reference images of specific objects to guide the creation of new scenes—is…

12/19/2025 · Controllable Diffusion Models, Multi-subject Image Synthesis, Subject-driven Image Generation
Flow-GRPO: Boost Text-to-Image Accuracy with Online RL—Without Sacrificing Quality or Diversity

If you’ve ever struggled with diffusion models failing to follow detailed prompts—like “a golden retriever sitting to the left of…

12/19/2025 · Controllable Diffusion Models, Reinforcement Learning For Generative Models, Text-to-Image Generation
Memento: Build Smarter LLM Agents That Learn from Experience—Without Fine-Tuning

In today’s fast-paced AI landscape, teams building intelligent agents face a persistent dilemma: how to make large language models (LLMs)…

12/19/2025 · Agent-based Reasoning, Continual Learning, Memory-Augmented LLMs
Matrix-Game: Controllable, Real-Time Game World Generation with Pixel-Perfect Action Responsiveness

Matrix-Game is an open-source interactive world foundation model developed by Skywork AI, specifically designed for real-time, controllable generation of game…

12/19/2025 · Action-conditioned Simulation, Controllable Video Generation, Interactive World Modeling
FlowTok: Unified Text-to-Image and Image-to-Text Generation with Compact 1D Tokens

FlowTok reimagines cross-modal generation by collapsing the traditionally complex boundary between text and images into a streamlined, efficient process. Unlike…

12/19/2025 · Image-to-text Generation, Multimodal Representation Learning, Text-to-Image Generation

Copyright © 2026 PaperCodex.