Imagine giving a natural language instruction like “Book a round-trip flight from Beijing to Paris on Skyscanner for September 18–21”…
Moshi: A Real-Time, Full-Duplex Speech-to-Speech Foundation Model for Natural Human-Like Dialogue
Traditional spoken dialogue systems—like those used in virtual assistants or customer service bots—rely on a cascade of disconnected components: voice…
Spark-TTS: Zero-Shot, Controllable Text-to-Speech with a Single LLM—No Vocoder, No Flow Matching
In the rapidly evolving landscape of AI-powered speech synthesis, complexity has long been the price of quality. Traditional text-to-speech…
Trae Agent: Resolve Real-World Software Issues with LLM-Powered, Repository-Aware AI Automation
Software engineering is increasingly becoming a collaboration between humans and intelligent tools. Yet many developers still face persistent challenges:…
Wan: Open-Source, High-Performance Video Generation That Runs on Consumer GPUs
Video content is no longer a luxury; it’s a necessity. From dynamic marketing campaigns and immersive educational materials to personalized…
Step1X-Edit: Open-Source Image Editing That Matches GPT-4o and Gemini2 Flash
Step1X-Edit is a state-of-the-art open-source framework for general-purpose image editing that delivers performance comparable to leading proprietary models like…
RLFactory: Plug-and-Play Reinforcement Learning for Multi-Turn LLM Tool Use Without the Complexity
Training large language models (LLMs) to reliably use external tools over multiple conversation turns is a persistent challenge in…
EvoAgentX: Automate, Evolve, and Scale Multi-Agent LLM Workflows Without Manual Orchestration
Building reliable, scalable systems with large language models (LLMs) often involves stitching together multiple agents, tools, and prompts, a process…
Agent-S: Automate Any Computer Task Like a Human—With Precision, Planning, and Cross-Platform Generalization
Imagine an AI agent that can sit at your computer, look at the screen, understand what it sees, and…
InstantCharacter: Generate Consistent, High-Fidelity Character Images from a Single Photo—No Fine-Tuning Required
Creating personalized, visually consistent characters is a common need across gaming, animation, virtual avatars, and digital storytelling—but until recently, doing…