Modern image generation is powerful—but fragmented. Depending on your goal—generating from text, editing existing images, preserving a person’s identity, or…
AniPortrait: Generate Photorealistic Talking-Head Videos from a Single Image and Audio Clip 5006
Creating lifelike, animated human faces used to require complex pipelines—motion capture rigs, professional voice actors, or hours of post-production. But…
GaussianObject: High-Quality 3D Reconstruction from Just Four Images—No COLMAP Required 1120
Creating photorealistic 3D models of real-world objects typically demands dozens—or even hundreds—of input images captured from carefully calibrated viewpoints. This…
AM-RADIO: Unify Vision Foundation Models into One High-Performance Backbone for Multimodal, Segmentation, and Detection Tasks 1357
In modern computer vision, practitioners often juggle multiple foundation models—CLIP for vision-language alignment, DINOv2 for dense feature extraction, and SAM…
Semantic Operators: Declarative, Fast, and Accurate AI-Powered Data Processing for Unstructured and Structured Data 1484
Processing unstructured data—like free-form text, documents, or multimodal inputs—with large language models (LLMs) has become essential across industries, from biomedical…
NeedleBench: Rigorously Evaluate LLM Retrieval and Reasoning in Long-Context Scenarios 6409
Evaluating how well large language models (LLMs) retrieve critical facts and perform reasoning over long documents remains a major challenge…
AgentVerse: Build Collaborative LLM Agent Teams for Real Tasks or Behavioral Simulation 4884
In today’s AI landscape, single-agent systems—powered by large language models (LLMs)—often hit a ceiling when tackling complex, multi-step problems. What…
Align Anything: The First Open Framework for Aligning Any-to-Any Multimodal Models with Human Intent 4562
As AI systems grow more capable across diverse data types—text, images, audio, and video—the challenge of aligning them with human…
SPHINX-X: Build Scalable Multimodal AI Faster with Unified Training, Diverse Data, and Flexible Model Sizes 2794
SPHINX-X is a next-generation family of Multimodal Large Language Models (MLLMs) designed to streamline the development, training, and deployment of…
Xorbits: Scale Pandas and NumPy Workflows to Clusters—With Just One Line of Code 1199
Data scientists and machine learning engineers routinely rely on pandas and NumPy for data wrangling, exploration, and modeling. These libraries…