PaperCodex

IterResearch: Break Through Long-Horizon Reasoning Limits with Markovian State Reconstruction

Long-horizon reasoning is one of the toughest challenges in current AI agent development. Traditional agentic systems, which rely on steadily…

12/18/2025 | Agentic Search, Iterative Deep Research, Long-horizon Reasoning
CARAFE: Boost Dense Prediction Accuracy with Content-Aware, Lightweight Feature Upsampling

Feature upsampling is a critical but often overlooked component in modern computer vision pipelines. Whether you’re building an object detector,…

12/18/2025 | Instance Segmentation, Object Detection, Semantic Segmentation
MNN: Run Large Language Models and Vision AI Offline on Mobile with a Lightweight, High-Performance Inference Engine

Mobile Neural Network (MNN) is an open-source, lightweight deep learning inference engine developed by Alibaba Group to bring powerful AI…

12/18/2025 | Large Language Model Deployment, Multimodal AI, On-device Inference
The Well: 15TB of Diverse Physics Simulations for Training and Benchmarking Surrogate Models in Scientific Machine Learning

If you’re working on machine learning models that aim to emulate or accelerate physics-based simulations—whether in fluid dynamics, astrophysics, or…

12/18/2025 | Scientific Machine Learning, Spatiotemporal Physics Simulation, Surrogate Modeling
FastVLM: High-Resolution Vision-Language Inference with 85× Faster Time-to-First-Token and Minimal Compute Overhead

Vision Language Models (VLMs) are increasingly central to real-world applications—from mobile assistants that read documents to AI systems that interpret…

12/18/2025 | Document Understanding, On-Device Multimodal Inference, Vision-Language Modeling
Step-Audio: Unified Speech Understanding and Generation for Real-World Voice Applications

Building intelligent voice interfaces used to mean stitching together separate speech recognition (ASR), text generation, and text-to-speech (TTS) systems—each with…

12/18/2025 | Multimodal Language Modeling, Speech Generation, Speech Understanding
DreamTalk: Generate Emotionally Expressive Talking Heads from Audio Using Diffusion Models

Creating lifelike digital avatars that speak naturally with accurate lip movements and rich emotional expression has long been a challenge…

12/18/2025 | Audio-to-Video Generation, Emotion-Aware Synthesis, Talking Head Generation
SkyThought: Boost Code Generation Accuracy Without Retraining—Even Small Models Beat GPT-4o-mini

SkyThought is an open-source framework built around S*—a breakthrough test-time scaling approach designed specifically to elevate code generation performance in…

12/18/2025 | Code Generation, Program Synthesis, Test-time Scaling
AniTalker: Generate Lifelike, Expressive Talking Faces from a Single Image and Audio Clip

Imagine turning a static portrait—like the Mona Lisa or a headshot from your LinkedIn profile—into a vivid, talking avatar that…

12/18/2025 | Facial Animation, Identity-Decoupled Motion Modeling, Talking Face Generation
Colossal-Auto: Automate Large Model Training with Zero Expertise in Parallelization or Checkpointing

Training large-scale AI models—whether language models like LLaMA or video generators like Open-Sora—has become increasingly common, yet remains bottlenecked by…

12/18/2025 | Distributed Deep Learning, Large Language Model Training, Video Generation Model Training

Copyright © 2026 PaperCodex.