PaperCodex

EliGen: Achieve Precise Entity-Level Control in AI Image Generation Without Retraining Models

Text-to-image diffusion models have revolutionized creative workflows, but they still struggle with a fundamental limitation: global prompts alone often fail…

12/18/2025 · Controllable Text-to-image Synthesis, Entity-level Image Generation, Region-guided Diffusion Models
Mini-InternVL: Achieve 90% of Multimodal Performance with Just 5% of Model Size for Edge and Consumer Deployments

In an era where multimodal large language models (MLLMs) are rapidly advancing, a critical barrier remains: most high-performing vision-language models…

12/18/2025 · Edge AI, Multimodal Reasoning, Vision-Language Modeling
AnimateDiff: Bring Your Custom AI Image Models to Life—Without Retraining

If you’ve spent time fine-tuning a Stable Diffusion model—perhaps with DreamBooth or LoRA—to generate your ideal character, product mockup, or…

12/18/2025 · Motion Priors Learning, Personalized Animation, Text-to-Video Generation
Seamless: Real-Time, Expressive, and Multilingual Speech Translation for Natural Cross-Language Communication

In today’s globalized world, real-time communication across languages remains a major bottleneck. Traditional speech translation systems often fall short—they output…

12/18/2025 · Multimodal Machine Translation, Speech-to-Speech Translation, Streaming Speech Translation
Tora: Precisely Control Motion in AI-Generated Videos with Trajectory Guidance

Creating videos with predictable, controllable motion has long been a major challenge in generative AI. While recent diffusion models produce…

12/18/2025 · Motion Control, Trajectory-guided Synthesis, Video Generation
Gymnasium: A Standardized, Reproducible Interface for Reinforcement Learning Environments

Reinforcement learning (RL) holds immense promise for solving complex decision-making problems—from robotics and game playing to resource optimization and autonomous…

12/18/2025 · Algorithm Benchmarking, Environment Simulation, Reinforcement Learning
Search-o1: Boost Large Reasoning Models with On-Demand Knowledge Retrieval for Complex Problem Solving

Large reasoning models (LRMs)—such as OpenAI’s o1—excel at multi-step logical reasoning, especially in science, math, and code-related tasks. But they…

12/18/2025 · Agentic Search, Complex Reasoning, Retrieval-Augmented Generation
ReCamMaster: Reshoot Any Video with New Camera Movements—No 3D Assets or Multi-Camera Setup Needed

Imagine being able to take a single, static video shot on your phone and instantly transform it into a cinematic…

12/18/2025 · Camera-controlled Video Generation, Generative Video Editing, Video Re-rendering
AgentCPM-GUI: On-Device AI Agent for Bilingual Mobile Automation with Reinforcement Fine-Tuning

AgentCPM-GUI is an open-source, on-device large language model (LLM) agent designed to understand smartphone screenshots and autonomously perform user-specified tasks…

12/18/2025 · GUI Agent, Mobile Automation, On-device Inference
UQLM: Detect LLM Hallucinations with Uncertainty Quantification—Confidence Scoring Made Practical

Large Language Models (LLMs) are transforming how we build intelligent applications—from customer service bots to clinical decision support tools. Yet…

12/18/2025 · Hallucination Detection, LLM Reliability, Uncertainty Quantification

Copyright © 2026 PaperCodex.