
PaperCodex

InstantStyle: Effortless, Tuning-Free Style Preservation for Text-to-Image Generation

InstantStyle is a breakthrough framework that enables high-fidelity, style-consistent image generation without requiring any model retraining or per-image tuning. Built…

12/19/2025 | Image Stylization, Style Transfer, Text-to-Image Generation
InternGPT: Solve Vision-Centric Tasks with Clicks, Scribbles, and ChatGPT-Level Reasoning

In today’s AI landscape, large language models (LLMs) like ChatGPT have transformed how we interact with software—through natural language. But…

12/19/2025 | Interactive Image Editing, Multimodal Reasoning, Vision-language Modeling
Marco-o1: Open-Source Reasoning Models That Reduce Hallucination and Over-Thinking in Complex Tasks

As large reasoning models (LRMs) like OpenAI’s o1 demonstrate unprecedented capabilities in math, code, and planning, a critical gap remains:…

12/19/2025 | Agentic Planning, Chain-of-thought Distillation, Reasoning Models
DragDiffusion: Precise, Interactive Image Editing for Real and AI-Generated Photos Using Diffusion Models

DragDiffusion is an open-source framework that brings pixel-precise, point-based image manipulation to both real-world photographs and AI-generated images—without requiring users…

12/19/2025 | Diffusion Models, Image Editing, Interactive Manipulation
OmniGen: One Unified Model for All Image Generation Tasks—No Plugins, No Preprocessing, Just Prompts

Modern image generation is powerful—but fragmented. Depending on your goal—generating from text, editing existing images, preserving a person’s identity, or…

12/19/2025 | Image Editing, Subject-driven Generation, Text-to-Image Generation
AniPortrait: Generate Photorealistic Talking-Head Videos from a Single Image and Audio Clip

Creating lifelike, animated human faces used to require complex pipelines—motion capture rigs, professional voice actors, or hours of post-production. But…

12/19/2025 | Audio-driven Animation, Face Reenactment, Portrait Animation
GaussianObject: High-Quality 3D Reconstruction from Just Four Images—No COLMAP Required

Creating photorealistic 3D models of real-world objects typically demands dozens—or even hundreds—of input images captured from carefully calibrated viewpoints. This…

12/19/2025 | 3D Object Reconstruction, Gaussian Splatting, Sparse-view Synthesis
AM-RADIO: Unify Vision Foundation Models into One High-Performance Backbone for Multimodal, Segmentation, and Detection Tasks

In modern computer vision, practitioners often juggle multiple foundation models—CLIP for vision-language alignment, DINOv2 for dense feature extraction, and SAM…

12/19/2025 | Object Detection, Semantic Segmentation, Vision-language Understanding
Semantic Operators: Declarative, Fast, and Accurate AI-Powered Data Processing for Unstructured and Structured Data

Processing unstructured data—like free-form text, documents, or multimodal inputs—with large language models (LLMs) has become essential across industries, from biomedical…

12/19/2025 | LLM-powered Analytics, Semantic Data Processing, Unstructured Data Transformation
NeedleBench: Rigorously Evaluate LLM Retrieval and Reasoning in Long-Context Scenarios

Evaluating how well large language models (LLMs) retrieve critical facts and perform reasoning over long documents remains a major challenge…

12/19/2025 | Complex Reasoning, Long-context Retrieval, Synthetic Benchmarking

Copyright © 2026 PaperCodex.