Awesome Temporal Modeling Papers and Source Codes

Video-R1: Boost Video Reasoning in MLLMs with Efficient RL, Outperforming GPT-4o on Spatial Tasks

Video understanding has long been a bottleneck for multimodal large language models (MLLMs). While models can recognize objects or scenes…