PaperCodex

StableVideo: Text-Driven Video Editing with Frame-to-Frame Consistency

Editing objects in existing videos while preserving their appearance across time has long been a challenge for diffusion-based models. While…

12/18/2025 | Temporal Consistency, Text-to-Video Generation, Video Editing
ElizaOS: The Web3-Friendly AI Agent Framework That Just Works

In today’s fast-evolving landscape of artificial intelligence and decentralized systems, developers increasingly need tools that bridge the gap between large…

12/18/2025 | Autonomous AI Agents, Multi-agent Systems, Web3 Integration
ComfyUI-R1: Automate Complex AI Art Workflows with Reasoning-Powered Generation and Debugging

Building visual AI workflows in ComfyUI offers immense creative flexibility—but mastering its node-based interface demands significant expertise. Users often struggle…

12/18/2025 | Automated Debugging, Parameter Optimization, Workflow Generation
Paper2Video: Automatically Turn Scientific Papers into Ready-to-Use Presentation Videos

Creating high-quality academic presentation videos is notoriously time-consuming. Researchers often spend hours designing slides, recording voiceovers, editing footage, and syncing…

12/17/2025 | Academic Video Automation, Automatic Video Generation, Multimodal Presentation Synthesis
DB-GPT: Secure, AI-Native Database Interaction with Private LLMs and Natural Language Queries

In today’s data-driven world, organizations are drowning in information—but starving for insights. Traditional database interfaces demand technical SQL knowledge, creating…

12/17/2025 | Generative Business Intelligence, Retrieval-Augmented Generation (RAG), Text-to-SQL
LivePortrait: Real-Time, Controllable Portrait Animation Without Diffusion Models

Animating a static portrait—whether a photo of a person or a pet—into a lifelike, expressive video has long been a…

12/17/2025 | Facial Reenactment, Motion Retargeting, Portrait Animation
FaceChain: Generate Identity-Preserving AI Portraits in Seconds—No Training Required

Creating realistic, personalized human portraits with AI has long been plagued by distorted features, poor identity retention, and complex workflows…

12/17/2025 | Identity-Preserving Image Generation, Personalized Text-to-Image Synthesis, Train-Free Face Adaptation
ScreenCoder: Automate UI-to-Code Conversion from Screenshots with Modular Multimodal Agents

Transforming visual UI designs into functional front-end code has long been a bottleneck in software development. Designers craft mockups in…

12/17/2025 | Front-end Automation, Multimodal UI Understanding, Visual-to-code Generation
MiMo: High-Performance Reasoning in a 7B Model—Outperforming 32B Models and Matching o1-mini

MiMo is a 7-billion-parameter language model purpose-built for reasoning-intensive tasks—spanning mathematics, code generation, and STEM problem solving—without the computational overhead…

12/17/2025 | Code Generation, Mathematical Reasoning, Reinforcement Learning For Reasoning
OmniGen2: Unified Open-Source Multimodal Generation for Text-to-Image, Editing, and In-Context Creation

OmniGen2 is an open-source, unified generative model that seamlessly bridges text and vision in a single architecture. Unlike many multimodal…

12/17/2025 | In-context Generation, Instruction-guided Image Editing, Text-to-Image Generation

Copyright © 2026 PaperCodex.