PaperCodex

InternLM-XComposer: Generate Rich Text-Image Content and Understand High-Res Visuals with Open, Commercially Free AI

Overview: For technical decision makers evaluating multimodal AI, choosing between closed-source APIs and open alternatives often means trading off control,…

12/22/2025 | Multimodal Understanding, Text-image Composition, Vision-language Modeling
Show-1: High-Quality, Efficient Text-to-Video Generation with Precise Prompt Alignment

Text-to-video generation has rapidly evolved, yet technical teams still face a persistent trade-off: high-quality outputs often come at prohibitive computational…

12/22/2025 | Diffusion Models, Text-to-Video Generation, Video Synthesis
TinyLlama: A Fast, Efficient 1.1B Open Language Model for Edge Deployment and Speculative Decoding

TinyLlama is a compact yet powerful open-source language model with just 1.1 billion parameters—but trained on an impressive 3 trillion…

12/22/2025 | On-device Inference, Speculative Decoding, Text Generation
SLAM3R: Real-Time Dense 3D Reconstruction from Monocular Video—No Camera Calibration Needed

Introducing SLAM3R—a cutting-edge, end-to-end system that reconstructs high-quality, dense 3D scenes in real time using only a monocular RGB video…

12/22/2025 | 3D Reconstruction, Monocular SLAM, Neural Scene Representation
3DGUT: Real-Time 3D Reconstruction That Handles Distorted Cameras and Reflections Without Sacrificing Speed

3D Gaussian Splatting (3DGS) revolutionized real-time 3D scene reconstruction by delivering photorealistic quality at high frame rates on consumer GPUs.…

12/22/2025 | 3D Reconstruction, Neural Rendering, Real-time Rendering
LLaVA-CoT: Step-by-Step Visual Reasoning for Reliable, Explainable Multimodal AI

Most vision-language models (VLMs) today can describe what’s in an image—but they often falter when asked to reason about it.…

12/22/2025 | Explainable AI, Multimodal Reasoning, Visual Question Answering
TEQ: Accurate 3- and 4-Bit LLM Quantization Without Inference Overhead

Deploying large language models (LLMs) in production often runs into a hard trade-off: reduce model size and latency through quantization,…

12/22/2025 | Efficient LLM Inference, Large Language Model Quantization, Weight-only Quantization
YOLOv9: Train-from-Scratch Object Detection That Beats Pretrained Models with Programmable Gradient Information

YOLOv9 marks a significant leap forward in real-time object detection by directly confronting a long-standing but often overlooked problem in…

12/22/2025 | Instance Segmentation, Object Detection, Panoptic Segmentation
Mulberry: Step-by-Step Multimodal Reasoning with o1-Like Reflection for Trustworthy AI Decisions

Traditional multimodal large language models (MLLMs) often produce answers without revealing how they got there—especially when dealing with complex questions…

12/22/2025 | Interpretable AI, Multimodal Reasoning, Visual Question Answering
ELF: Train Real-Time Strategy AI Bots 10x Faster with a Lightweight, Flexible RL Platform

Reinforcement learning (RL) for real-time strategy (RTS) games has long been bottlenecked by slow simulation, rigid environment interfaces, and high…

12/22/2025 | Multi-Agent Training, Real-Time Strategy Game AI, Reinforcement Learning

Copyright © 2026 PaperCodex.