PaperCodex

Wan: Open-Source, High-Performance Video Generation That Runs on Consumer GPUs

Overview: Video content is no longer a luxury; it’s a necessity. From dynamic marketing campaigns and immersive educational materials to personalized…

12/11/2025 | Image-to-Video Synthesis, Text-to-Video Generation, Video Editing
Step1X-Edit: Open-Source Image Editing That Matches GPT-4o and Gemini2 Flash

Overview: Step1X-Edit is a state-of-the-art open-source framework for general-purpose image editing that delivers performance comparable to leading proprietary models like…

12/11/2025 | Image Editing, Instruction-following Image Generation, Multimodal Reasoning
RLFactory: Plug-and-Play Reinforcement Learning for Multi-Turn LLM Tool Use Without the Complexity

Overview: Training large language models (LLMs) to reliably use external tools over multiple conversation turns is a persistent challenge in…

12/11/2025 | LLM Post-Training, Multi-Turn Agent Training, Reinforcement Learning for Tool Use
EvoAgentX: Automate, Evolve, and Scale Multi-Agent LLM Workflows Without Manual Orchestration

Overview: Building reliable, scalable systems with large language models (LLMs) often involves stitching together multiple agents, tools, and prompts, a process…

12/11/2025 | Agentic Workflows, Evolutionary Optimization, Multi-agent Systems
Agent-S: Automate Any Computer Task Like a Human—With Precision, Planning, and Cross-Platform Generalization

Overview: Imagine an AI agent that can sit at your computer, look at the screen, understand what it sees, and…

12/11/2025 | Computer Use Agent, GUI Automation, Multimodal Reasoning
InstantCharacter: Generate Consistent, High-Fidelity Character Images from a Single Photo—No Fine-Tuning Required

Creating personalized, visually consistent characters is a common need across gaming, animation, virtual avatars, and digital storytelling—but until recently, doing…

12/11/2025, 12/15/2025 | Character Personalization, Diffusion Transformer Adaptation, Text-to-Image Generation
Kronos: The First Open-Source Foundation Model Built Specifically for Financial Candlestick Forecasting, Volatility Estimation, and Synthetic Market Generation

In the era of foundation models, most time series approaches have been adapted from general-purpose architectures originally designed for language…

12/11/2025, 12/15/2025 | Financial Time Series Forecasting, Synthetic Financial Data Generation, Volatility Prediction
MonkeyOCR: High-Accuracy Document Parsing for Complex Layouts with Tables, Formulas, and Multilingual Text—Fast, Lightweight, and Deployable

Parsing complex documents—especially those containing tables, mathematical formulas, mixed layouts, or multilingual content—remains a persistent challenge in real-world AI applications.…

12/11/2025, 12/15/2025 | Document Parsing, Optical Character Recognition (OCR), Vision-Language Modeling
Easy Dataset: Turn PDFs, Docs, and Wikis into High-Quality LLM Fine-Tuning Data Visually and Efficiently

Large language models (LLMs) are remarkably capable—but they often stumble when applied to specialized domains like finance, legal, healthcare, or…

12/10/2025, 12/15/2025 | Domain-specific Question Answering, LLM Fine-Tuning Data Synthesis, Structured Dataset Generation
WebDancer: Build Autonomous Web Agents That Solve Complex, Multi-Step Research Tasks

Most large language models today give one-shot answers—but real-world problems rarely fit into a single prompt. Imagine trying to answer:…

12/10/2025, 12/15/2025 | Autonomous Information Seeking, Multi-step Research Automation, Web-based Reasoning Agents

Copyright © 2026 PaperCodex.