PaperCodex

TorchAO: Unified PyTorch-Native Optimization for Faster Training and Efficient LLM Inference

Deploying large AI models in production often involves a fragmented toolchain: one set of libraries for training, another for quantization,…

12/19/2025 | Efficient Inference, Large Language Model Optimization, Model Quantization

CodeGen: Open-Source LLMs That Generate Code from Natural Language—Smarter, Faster, and Free

In today’s fast-paced software development landscape, the ability to translate natural language instructions into functional code is no longer science…

12/19/2025 | Code Generation, Code Infilling, Program Synthesis

Attentive Reasoning Queries: Boost LLM Instruction-Following Accuracy in Business-Critical Applications

Large Language Models (LLMs) have demonstrated remarkable capabilities across a wide range of tasks—from answering questions to generating code. However,…

12/19/2025 | Instruction Following, Reliable LLM Agents, Structured Reasoning

MagicTime: Generate Realistic Time-Lapse Videos That Simulate Real-World Physical Transformations

Most text-to-video (T2V) models today excel at generating short clips of people walking, cars driving, or birds flying—but they struggle…

12/18/2025 | Physical Simulation, Text-to-Video Generation, Time-lapse Video Synthesis

YOLOE: Real-Time Open-Vocabulary Object Detection and Segmentation Without Compromise

Conventional object detectors like YOLOv8 are fast, reliable, and widely deployed—but they come with a critical limitation: they can only…

12/18/2025 | Open-vocabulary Object Detection, Prompt-based Vision, Real-time Instance Segmentation

CogAgent: Automate Any GUI with Vision—No Code or HTML Needed

Imagine giving a natural language instruction like “Mark all unread emails as read” or “Filter Amazon search results to show…

12/18/2025 | GUI Automation, Vision-based Agent, Visual Language Modeling

MobileSAM: Ultra-Fast, Lightweight Image Segmentation for Real-World Applications

MobileSAM is a streamlined, high-performance variant of Meta’s groundbreaking Segment Anything Model (SAM), engineered to deliver the same powerful segmentation…

12/18/2025 | Image Segmentation, Promptable Segmentation, Zero-shot Object Detection

Show-o: One Unified Transformer for Multimodal Understanding and Generation Across Text, Images, and Videos

In today’s AI landscape, developers and researchers often juggle separate models for vision, language, and video—each with its own architecture,…

12/18/2025 | Image Generation, Multimodal Understanding, Video Understanding

CleanRL: Readable, Reproducible, and Research-Ready Deep Reinforcement Learning in a Single File

If you’ve ever tried to understand how a deep reinforcement learning (DRL) algorithm truly works—only to get lost in layers…

12/18/2025 | Algorithm Prototyping, Deep Reinforcement Learning, Reproducible Research

AudioGPT: Build Spoken AI Experiences with Speech, Music, Sound, and Talking Head Generation in One Unified System

AudioGPT is a multimodal AI system that bridges the gap between large language models (LLMs) like ChatGPT and the rich…

12/18/2025 | Audio Generation, Multimodal AI, Speech Synthesis

Copyright © 2026 PaperCodex.
