Awesome Video Question Answering Papers and Source Codes

Chat-UniVi: One Unified Model for Image and Video Understanding—No More Separate Systems Needed 939

In today’s AI landscape, multimodal systems that understand both images and videos are increasingly essential—but most solutions force you to…

01/13/2026 | Multimodal Understanding, Video Question Answering, Visual Reasoning

InternVideo: Build Powerful Video-Language AI Without Massive Compute or Data 2131

Building capable video-language AI systems has long been a resource-intensive endeavor—requiring vast video datasets, weeks of training on dozens of…

12/27/2025 | Video Question Answering, Video-text Retrieval, Zero-shot Video Classification

Video-ChatGPT: Enable Accurate, Detailed Video Understanding with Multimodal Conversational AI 1444

Video-ChatGPT is a state-of-the-art multimodal AI system that bridges the gap between video content and human-like conversation. Built by researchers…

12/17/2025 | Multimodal Dialogue, Video Question Answering, Video Understanding