Optical flow estimation—the task of predicting per-pixel motion between consecutive video frames—is foundational in computer vision applications ranging from autonomous…
Video Understanding
VideoMamba: Efficient Long- and Short-Term Video Understanding Without the Compute Overhead
Video understanding has long been bottlenecked by two competing demands: capturing fine-grained local motion while simultaneously modeling long-range temporal dependencies.…
Show-o: One Unified Transformer for Multimodal Understanding and Generation Across Text, Images, and Videos
In today’s AI landscape, developers and researchers often juggle separate models for vision, language, and video—each with its own architecture,…
Video-ChatGPT: Enable Accurate, Detailed Video Understanding with Multimodal Conversational AI
Video-ChatGPT is a state-of-the-art multimodal AI system that bridges the gap between video content and human-like conversation. Built by researchers…
VideoRAG: Unlock Long-Form Video Understanding with Retrieval-Augmented Generation for AI-Powered Insights
Imagine being able to ask questions like “What did the professor say about quantum entanglement in Lecture 3?” or “Show…