PaperCodex

DB-GPT: Secure, AI-Native Database Interaction with Private LLMs and Natural Language Queries

In today’s data-driven world, organizations are drowning in information—but starving for insights. Traditional database interfaces demand technical SQL knowledge, creating…

12/17/2025 | Generative Business Intelligence, Retrieval-Augmented Generation (RAG), Text-to-SQL

LivePortrait: Real-Time, Controllable Portrait Animation Without Diffusion Models

Animating a static portrait—whether a photo of a person or a pet—into a lifelike, expressive video has long been a…

12/17/2025 | Facial Reenactment, Motion Retargeting, Portrait Animation

FaceChain: Generate Identity-Preserving AI Portraits in Seconds—No Training Required

Creating realistic, personalized human portraits with AI has long been plagued by distorted features, poor identity retention, and complex workflows…

12/17/2025 | Identity-Preserving Image Generation, Personalized Text-to-Image Synthesis, Train-Free Face Adaptation

ScreenCoder: Automate UI-to-Code Conversion from Screenshots with Modular Multimodal Agents

Transforming visual UI designs into functional front-end code has long been a bottleneck in software development. Designers craft mockups in…

12/17/2025 | Front-end Automation, Multimodal UI Understanding, Visual-to-code Generation

MiMo: High-Performance Reasoning in a 7B Model—Outperforming 32B Models and Matching o1-mini

MiMo is a 7-billion-parameter language model purpose-built for reasoning-intensive tasks—spanning mathematics, code generation, and STEM problem solving—without the computational overhead…

12/17/2025 | Code Generation, Mathematical Reasoning, Reinforcement Learning For Reasoning

OmniGen2: Unified Open-Source Multimodal Generation for Text-to-Image, Editing, and In-Context Creation

OmniGen2 is an open-source, unified generative model that seamlessly bridges text and vision in a single architecture. Unlike many multimodal…

12/17/2025 | In-context Generation, Instruction-guided Image Editing, Text-to-Image Generation

Ovis: Align Vision and Language Embeddings for Superior Multimodal Reasoning Without Proprietary Lock-in

Multimodal Large Language Models (MLLMs) are increasingly vital for tasks that bridge vision and language—yet many struggle to truly fuse…

12/17/2025 | Multimodal Fine-tuning, Multimodal Reasoning, Vision-language Alignment

Parallax: Run LLMs on Decentralized Devices Without Costly GPU Clusters

Deploying large language models (LLMs) today often means relying on expensive, centralized infrastructure—specialized GPU clusters, high-bandwidth data centers, and recurring…

12/17/2025 | Decentralized Inference, Edge AI, Large Language Model Serving

Video-ChatGPT: Enable Accurate, Detailed Video Understanding with Multimodal Conversational AI

Video-ChatGPT is a state-of-the-art multimodal AI system that bridges the gap between video content and human-like conversation. Built by researchers…

12/17/2025 | Multimodal Dialogue, Video Question Answering, Video Understanding

UFO: Automate Multi-App Windows Workflows with Natural Language and Zero Human Intervention

Imagine telling your computer what you want it to do—like “Summarize this PDF, email the summary to my manager, and…

12/17/2025 | Cross-application Task Execution, GUI Automation, Multimodal Reasoning

Copyright © 2026 PaperCodex.