PaperCodex

BrowseComp: A Focused Benchmark for Evaluating Web-Browsing Capabilities in AI Agents

Evaluating whether an AI agent can truly browse the web—navigating across pages, persisting through dead ends, and extracting entangled facts—is…

12/19/2025 | Information Retrieval, Tool-augmented Reasoning, Web Browsing Agents
Perception Encoder: One Vision Model to Rule Image, Video, and Language Tasks – Without Task-Specific Training

Perception Encoder (PE) redefines what’s possible with a single vision encoder. Unlike legacy approaches that demand different pretraining strategies for…

12/19/2025 | Dense Visual Prediction, Multimodal Visual Question Answering, Zero-shot Image Classification
Paper2Poster: Automate Scientific Poster Creation from PDFs—Editable, Accurate, and Under $0.01

Creating professional academic posters from dense, multi-page scientific papers is a universal pain point for researchers, PhD students, and lab…

12/19/2025 | Automated Academic Communication, Multimodal Document Understanding, Scientific Poster Generation
RoboTwin 2.0: Scalable Simulation Platform for Robust Bimanual Robotic Manipulation with Strong Domain Randomization

Robotic manipulation—especially with two arms working in coordination—is essential for complex real-world tasks like assembling electronics, handling kitchenware, or performing…

12/19/2025 | Bimanual Robotic Manipulation, Sim-to-real Transfer, Vision-language-action Learning
MIEB: Benchmark 130 Image & Image-Text Tasks Across 38 Languages for Reliable Model Evaluation

Evaluating image embedding models has long been a fragmented and inconsistent process. Researchers and engineers often test models on narrow,…

12/19/2025 | Cross-Modal Retrieval, Image Embedding Evaluation, Visual Representation Learning
AIBrix: Scalable, Cost-Effective LLM Inference Infrastructure for Enterprise-Grade GenAI Deployment

Deploying large language models (LLMs) at scale in production environments remains a significant challenge for engineering teams. High inference costs,…

12/19/2025 | GenAI Deployment, LLM Inference, Model Serving
Pre³: Accelerate Structured LLM Output Generation with Deterministic Grammar Control

Modern large language model (LLM) applications increasingly rely on structured outputs—think JSON responses for APIs, XML configuration files, or tool-call…

12/19/2025 | Constrained Decoding, Grammar-Guided LLM Inference, Structured Output Generation
Lumina-mGPT 2.0: A Standalone Autoregressive Image Generator That Unifies Multimodal Tasks Without Diffusion Dependencies

In the ever-evolving landscape of generative AI, image synthesis has long been dominated by diffusion models—powerful, yet often complex, resource-intensive,…

12/19/2025 | Controllable Image Synthesis, Image Editing, Text-to-Image Generation
USO: Unified Image Generation that Preserves Subjects and Applies Styles in One Framework

Generative AI has made remarkable strides in image synthesis, yet many tools force users to choose between style-driven and subject-driven…

12/19/2025 | Style Transfer, Subject-driven Generation, Unified Image Generation
GLM-4.5: Open-Source MoE LLM for High-Performance Agentic Reasoning and Coding

GLM-4.5 is an open-source, high-performance Mixture-of-Experts (MoE) large language model engineered specifically for intelligent agents that need to reason, code,…

12/19/2025 | Agentic Reasoning, Code Generation, Mixture-of-Experts

Copyright © 2026 PaperCodex.