PaperCodex

Moshi: A Real-Time, Full-Duplex Speech-to-Speech Foundation Model for Natural Human-Like Dialogue

Traditional spoken dialogue systems—like those used in virtual assistants or customer service bots—rely on a cascade of disconnected components: voice…

12/11/2025 | Full-duplex Dialogue, Speech-to-speech Generation, Spoken Language Modeling

Spark-TTS: Zero-Shot, Controllable Text-to-Speech with a Single LLM—No Vocoder, No Flow Matching

Overview: In the rapidly evolving landscape of AI-powered speech synthesis, complexity has long been the price of quality. Traditional text-to-speech…

12/11/2025 | Controllable Speech Generation, Text-to-Speech Synthesis, Zero-Shot Voice Cloning

Trae Agent: Resolve Real-World Software Issues with LLM-Powered, Repository-Aware AI Automation

Overview: Software engineering is increasingly becoming a collaboration between humans and intelligent tools. Yet, many developers still face persistent challenges:…

12/11/2025 | LLM-based Agent Reasoning, Repository-level Code Understanding, Software Issue Resolution

Wan: Open-Source, High-Performance Video Generation That Runs on Consumer GPUs

Overview: Video content is no longer a luxury—it’s a necessity. From dynamic marketing campaigns and immersive educational materials to personalized…

12/11/2025 | Image-to-Video Synthesis, Text-to-Video Generation, Video Editing

Step1X-Edit: Open-Source Image Editing That Matches GPT-4o and Gemini2 Flash

Overview: Step1X-Edit is a state-of-the-art open-source framework for general-purpose image editing that delivers performance comparable to leading proprietary models like…

12/11/2025 | Image Editing, Instruction-following Image Generation, Multimodal Reasoning

RLFactory: Plug-and-Play Reinforcement Learning for Multi-Turn LLM Tool Use Without the Complexity

Overview: Training large language models (LLMs) to reliably use external tools over multiple conversation turns is a persistent challenge in…

12/11/2025 | LLM Post-Training, Multi-Turn Agent Training, Reinforcement Learning for Tool Use

EvoAgentX: Automate, Evolve, and Scale Multi-Agent LLM Workflows Without Manual Orchestration

Overview: Building reliable, scalable systems with large language models (LLMs) often involves stitching together multiple agents, tools, and prompts—a process…

12/11/2025 | Agentic Workflows, Evolutionary Optimization, Multi-agent Systems

Agent-S: Automate Any Computer Task Like a Human—With Precision, Planning, and Cross-Platform Generalization

Overview: Imagine an AI agent that can sit at your computer, look at the screen, understand what it sees, and…

12/11/2025 | Computer Use Agent, GUI Automation, Multimodal Reasoning

InstantCharacter: Generate Consistent, High-Fidelity Character Images from a Single Photo—No Fine-Tuning Required

Creating personalized, visually consistent characters is a common need across gaming, animation, virtual avatars, and digital storytelling—but until recently, doing…

12/11/2025 | 12/15/2025 | Character Personalization, Diffusion Transformer Adaptation, Text-to-Image Generation

Kronos: The First Open-Source Foundation Model Built Specifically for Financial Candlestick Forecasting, Volatility Estimation, and Synthetic Market Generation

In the era of foundation models, most time series approaches have been adapted from general-purpose architectures originally designed for language…

12/11/2025 | 12/15/2025 | Financial Time Series Forecasting, Synthetic Financial Data Generation, Volatility Prediction

Copyright © 2026 PaperCodex.