Evaluating whether an AI agent can truly browse the web—navigating across pages, persisting through dead ends, and extracting entangled facts—is…
Perception Encoder: One Vision Model to Rule Image, Video, and Language Tasks – Without Task-Specific Training
Perception Encoder (PE) redefines what’s possible with a single vision encoder. Unlike legacy approaches that demand different pretraining strategies for…
Paper2Poster: Automate Scientific Poster Creation from PDFs—Editable, Accurate, and Under $0.01
Creating professional academic posters from dense, multi-page scientific papers is a universal pain point for researchers, PhD students, and lab…
RoboTwin 2.0: Scalable Simulation Platform for Robust Bimanual Robotic Manipulation with Strong Domain Randomization
Robotic manipulation—especially with two arms working in coordination—is essential for complex real-world tasks like assembling electronics, handling kitchenware, or performing…
MIEB: Benchmark 130 Image & Image-Text Tasks Across 38 Languages for Reliable Model Evaluation
Evaluating image embedding models has long been a fragmented and inconsistent process. Researchers and engineers often test models on narrow,…
AIBrix: Scalable, Cost-Effective LLM Inference Infrastructure for Enterprise-Grade GenAI Deployment
Deploying large language models (LLMs) at scale in production environments remains a significant challenge for engineering teams. High inference costs,…
Pre³: Accelerate Structured LLM Output Generation with Deterministic Grammar Control
Modern large language model (LLM) applications increasingly rely on structured outputs—think JSON responses for APIs, XML configuration files, or tool-call…
Lumina-mGPT 2.0: A Standalone Autoregressive Image Generator That Unifies Multimodal Tasks Without Diffusion Dependencies
In the ever-evolving landscape of generative AI, image synthesis has long been dominated by diffusion models—powerful, yet often complex, resource-intensive,…
USO: Unified Image Generation that Preserves Subjects and Applies Styles in One Framework
Generative AI has made remarkable strides in image synthesis, yet many tools force users to choose between style-driven and subject-driven…
GLM-4.5: Open-Source MoE LLM for High-Performance Agentic Reasoning and Coding
GLM-4.5 is an open-source, high-performance Mixture-of-Experts (MoE) large language model engineered specifically for intelligent agents that need to reason, code,…