Editing objects in existing videos while preserving their appearance across time has long been a challenge for diffusion-based models. While…
ElizaOS: The Web3-Friendly AI Agent Framework That Just Works 17177
In today’s fast-evolving landscape of artificial intelligence and decentralized systems, developers increasingly need tools that bridge the gap between large…
ComfyUI-R1: Automate Complex AI Art Workflows with Reasoning-Powered Generation and Debugging 3890
Building visual AI workflows in ComfyUI offers immense creative flexibility—but mastering its node-based interface demands significant expertise. Users often struggle…
Paper2Video: Automatically Turn Scientific Papers into Ready-to-Use Presentation Videos 1860
Creating high-quality academic presentation videos is notoriously time-consuming. Researchers often spend hours designing slides, recording voiceovers, editing footage, and syncing…
DB-GPT: Secure, AI-Native Database Interaction with Private LLMs and Natural Language Queries 17786
In today’s data-driven world, organizations are drowning in information—but starving for insights. Traditional database interfaces demand technical SQL knowledge, creating…
LivePortrait: Real-Time, Controllable Portrait Animation Without Diffusion Models 17443
Animating a static portrait—whether a photo of a person or a pet—into a lifelike, expressive video has long been a…
FaceChain: Generate Identity-Preserving AI Portraits in Seconds—No Training Required 9493
Creating realistic, personalized human portraits with AI has long been plagued by distorted features, poor identity retention, and complex workflows…
ScreenCoder: Automate UI-to-Code Conversion from Screenshots with Modular Multimodal Agents 2516
Transforming visual UI designs into functional front-end code has long been a bottleneck in software development. Designers craft mockups in…
MiMo: High-Performance Reasoning in a 7B Model—Outperforming 32B Models and Matching o1-mini 1637
MiMo is a 7-billion-parameter language model purpose-built for reasoning-intensive tasks—spanning mathematics, code generation, and STEM problem solving—without the computational overhead…
OmniGen2: Unified Open-Source Multimodal Generation for Text-to-Image, Editing, and In-Context Creation 3962
OmniGen2 is an open-source, unified generative model that seamlessly bridges text and vision in a single architecture. Unlike many multimodal…