Building speech language models (SpeechLMs)—systems that jointly understand and generate both speech and text—is rapidly becoming essential for next-generation voice…
FinGPT: Open-Source Financial LLMs with Transparent, Global Data Pipelines for Real-World Finance Applications
Large language models (LLMs) are transforming how we interact with data—but in finance, high-quality, domain-specific language models have largely remained…
DeepSeek-VL2: High-Performance Vision-Language Understanding with Efficient Mixture-of-Experts Architecture
DeepSeek-VL2 is an open-source, advanced vision-language model (VLM) built on a Mixture-of-Experts (MoE) architecture, engineered for robust multimodal understanding across…
OmniSafe: Accelerate Safe Reinforcement Learning Research with a Unified, Modular Framework
Reinforcement learning (RL) holds transformative potential for real-world applications—from autonomous vehicles and surgical robots to industrial control systems. Yet, one…
EliGen: Achieve Precise Entity-Level Control in AI Image Generation Without Retraining Models
Text-to-image diffusion models have revolutionized creative workflows, but they still struggle with a fundamental limitation: global prompts alone often fail…
Mini-InternVL: Achieve 90% of Multimodal Performance with Just 5% of Model Size for Edge and Consumer Deployments
In an era where multimodal large language models (MLLMs) are rapidly advancing, a critical barrier remains: most high-performing vision-language models…
AnimateDiff: Bring Your Custom AI Image Models to Life—Without Retraining
If you’ve spent time fine-tuning a Stable Diffusion model—perhaps with DreamBooth or LoRA—to generate your ideal character, product mockup, or…
Seamless: Real-Time, Expressive, and Multilingual Speech Translation for Natural Cross-Language Communication
In today’s globalized world, real-time communication across languages remains a major bottleneck. Traditional speech translation systems often fall short—they output…
Tora: Precisely Control Motion in AI-Generated Videos with Trajectory Guidance
Creating videos with predictable, controllable motion has long been a major challenge in generative AI. While recent diffusion models produce…
Gymnasium: A Standardized, Reproducible Interface for Reinforcement Learning Environments
Reinforcement learning (RL) holds immense promise for solving complex decision-making problems—from robotics and game playing to resource optimization and autonomous…