PaperCodex

MARS: Accelerate Large Model Training with Variance-Reduced Optimization That Actually Works

Training large language models and vision architectures is notoriously slow, unstable, and expensive. Practitioners routinely face diminishing returns from standard…

01/13/2026 | Large Language Model Training, Variance-reduced Optimization, Vision Model Optimization

MarkLLM: Open-Source Toolkit for Detectable, Invisible Watermarks in LLM-Generated Text

As large language models (LLMs) become deeply embedded in enterprise workflows, content platforms, and research pipelines, the ability to verify…

01/13/2026 | AI-generated Text Detection, Content Provenance Verification, LLM Watermarking

Uni-MoE: Build One Unified Multimodal AI Instead of Five Separate Models

Imagine managing a project that needs to understand speech, analyze images, interpret video frames, and respond to written prompts—all within…

01/13/2026 | Instruction Tuning, Mixture-of-Experts, Multimodal Learning

SocialED: Detect Real-World Events from Social Media with One Unified, Production-Ready Python Library

In today’s fast-paced digital landscape, real-time awareness of emerging events—from natural disasters and political rallies to viral misinformation—is critical for…

01/13/2026 | Graph Neural Networks For NLP, Multilingual Anomaly Detection, Social Event Detection

OpenEMMA: Open-Source End-to-End Autonomous Driving with Multimodal Reasoning and Transparent Planning

Autonomous driving research has long been bottlenecked by the need for massive datasets, expensive compute infrastructure, and proprietary end-to-end frameworks.…

01/13/2026 | End-to-End Autonomous Driving, Multimodal Reasoning, Vision-language Models

IDRNet: Boost Semantic Segmentation Accuracy with Smarter Context Modeling—No Heavy Priors Required

If you’re building computer vision systems that rely on pixel-perfect understanding—like autonomous driving, medical imaging analysis, or retail scene parsing—you’ve…

01/13/2026 | Context Modeling, Dense Prediction, Semantic Segmentation

CCF: Build Secure Multi-Party Applications with Confidentiality, Integrity, and High Availability—Even on Untrusted Cloud Infrastructure

In today’s cloud-first world, organizations increasingly need to collaborate across trust boundaries—whether in finance, healthcare, supply chains, or regulatory compliance.…

01/13/2026 | Confidential Computing, Secure Multi-party Computation, Trusted Execution Environments

Arc2Face: Generate Identity-Consistent Faces with Precise Expression Control for AI Storytelling and Avatars

Creating realistic, diverse human faces that remain visually consistent with a specific identity—while allowing fine-grained control over expressions—is a persistent…

01/13/2026 | Expression Control, Face Generation, Identity-consistent Synthesis

MINS: Robust, Efficient Multisensor Fusion for Reliable Autonomous Navigation

In the world of autonomous systems—whether robots, drones, or self-driving vehicles—accurate and reliable state estimation is non-negotiable. Yet real-world deployments…

01/13/2026 | Autonomous Navigation, Sensor Fusion, State Estimation

LanguageBind: Unify Video, Audio, Depth, Thermal & Text in One Language-Aligned Multimodal Space

Imagine building an AI system that understands not just images and text—but also video, audio, infrared (thermal), and depth data—all…

01/13/2026 | Cross-Modal Retrieval, Multimodal Representation Learning, Zero-shot Transfer Learning

Copyright © 2026 PaperCodex.