Multimodal Large Language Models (MLLMs) are increasingly vital for tasks that bridge vision and language—yet many struggle to truly fuse…
Parallax: Run LLMs on Decentralized Devices Without Costly GPU Clusters 1004
Deploying large language models (LLMs) today often means relying on expensive, centralized infrastructure—specialized GPU clusters, high-bandwidth data centers, and recurring…
Video-ChatGPT: Enable Accurate, Detailed Video Understanding with Multimodal Conversational AI 1444
Video-ChatGPT is a state-of-the-art multimodal AI system that bridges the gap between video content and human-like conversation. Built by researchers…
UFO: Automate Multi-App Windows Workflows with Natural Language and Zero Human Intervention 7659
Imagine telling your computer what you want it to do—like “Summarize this PDF, email the summary to my manager, and…
SALMONN-omni: A Standalone Full-Duplex Speech LLM That Enables Natural, Codec-Free Voice Conversations 1366
Building truly natural voice interfaces has long been a holy grail in AI—yet most current systems fall short when it…
PaSa: Autonomous Academic Paper Search Agent That Finds More Relevant Papers Than Google Scholar or ChatGPT 1457
Searching for academic papers is a daily reality for researchers, engineers, and students—but traditional tools often fall short. Google Scholar…
HealthGPT: Unified Medical Vision-Language Understanding and Generation in a Single Model 1567
HealthGPT is a cutting-edge Medical Large Vision-Language Model (Med-LVLM) designed to tackle a long-standing challenge in AI for healthcare: the…
UNetFormer: Real-Time, High-Accuracy Semantic Segmentation for Urban Remote Sensing Imagery 1007
Semantic segmentation of urban remote sensing imagery—such as aerial photos from drones or satellites—is essential for applications like land cover…
Hunyuan3D 2.0: Open-Source High-Resolution Textured 3D Generation from Images and Text 12640
Hunyuan3D 2.0 is a powerful, open-source system developed by Tencent for generating high-resolution, textured 3D assets from either images or…
AniSora: The First Open-Source Animation Video Generator Built Specifically for Anime-Style Motion and Consistency 2283
While general-purpose video generation models like Sora, Kling, and CogVideoX have revolutionized photorealistic video synthesis, they consistently underperform when it…