
PaperCodex


Video Understanding

LiteFlowNet: High-Accuracy Optical Flow Estimation with a Lightweight, Fast CNN for Real-World Applications

Optical flow estimation—the task of predicting per-pixel motion between consecutive video frames—is foundational in computer vision applications ranging from autonomous…

01/13/2026 · Motion Analysis, Optical Flow Estimation, Video Understanding

VideoMamba: Efficient Long- and Short-Term Video Understanding Without the Compute Overhead

Video understanding has long been bottlenecked by two competing demands: capturing fine-grained local motion while simultaneously modeling long-range temporal dependencies.…

12/26/2025 · Action Recognition, Video Understanding, Video-text Retrieval

Show-o: One Unified Transformer for Multimodal Understanding and Generation Across Text, Images, and Videos

In today’s AI landscape, developers and researchers often juggle separate models for vision, language, and video—each with its own architecture,…

12/18/2025 · Image Generation, Multimodal Understanding, Video Understanding

Video-ChatGPT: Enable Accurate, Detailed Video Understanding with Multimodal Conversational AI

Video-ChatGPT is a state-of-the-art multimodal AI system that bridges the gap between video content and human-like conversation. Built by researchers…

12/17/2025 · Multimodal Dialogue, Video Question Answering, Video Understanding

VideoRAG: Unlock Long-Form Video Understanding with Retrieval-Augmented Generation for AI-Powered Insights

Imagine being able to ask questions like “What did the professor say about quantum entanglement in Lecture 3?” or “Show…

12/17/2025 · Multimodal Reasoning, Retrieval-Augmented Generation, Video Understanding

Copyright © 2026 PaperCodex.