In today’s data-driven world, organizations are drowning in information—but starving for insights. Traditional database interfaces demand technical SQL knowledge, creating…
LivePortrait: Real-Time, Controllable Portrait Animation Without Diffusion Models
Animating a static portrait—whether a photo of a person or a pet—into a lifelike, expressive video has long been a…
FaceChain: Generate Identity-Preserving AI Portraits in Seconds—No Training Required
Creating realistic, personalized human portraits with AI has long been plagued by distorted features, poor identity retention, and complex workflows…
ScreenCoder: Automate UI-to-Code Conversion from Screenshots with Modular Multimodal Agents
Transforming visual UI designs into functional front-end code has long been a bottleneck in software development. Designers craft mockups in…
MiMo: High-Performance Reasoning in a 7B Model—Outperforming 32B Models and Matching o1-mini
MiMo is a 7-billion-parameter language model purpose-built for reasoning-intensive tasks—spanning mathematics, code generation, and STEM problem solving—without the computational overhead…
OmniGen2: Unified Open-Source Multimodal Generation for Text-to-Image, Editing, and In-Context Creation
OmniGen2 is an open-source, unified generative model that seamlessly bridges text and vision in a single architecture. Unlike many multimodal…
Ovis: Align Vision and Language Embeddings for Superior Multimodal Reasoning Without Proprietary Lock-in
Multimodal Large Language Models (MLLMs) are increasingly vital for tasks that bridge vision and language—yet many struggle to truly fuse…
Parallax: Run LLMs on Decentralized Devices Without Costly GPU Clusters
Deploying large language models (LLMs) today often means relying on expensive, centralized infrastructure—specialized GPU clusters, high-bandwidth data centers, and recurring…
Video-ChatGPT: Enable Accurate, Detailed Video Understanding with Multimodal Conversational AI
Video-ChatGPT is a state-of-the-art multimodal AI system that bridges the gap between video content and human-like conversation. Built by researchers…
UFO: Automate Multi-App Windows Workflows with Natural Language and Zero Human Intervention
Imagine telling your computer what you want it to do—like “Summarize this PDF, email the summary to my manager, and…