PaperCodex

Visual Question Answering

Mulberry: Step-by-Step Multimodal Reasoning with o1-Like Reflection for Trustworthy AI Decisions

Traditional multimodal large language models (MLLMs) often produce answers without revealing how they arrived at them—especially for complex questions…

12/22/2025 · Interpretable AI, Multimodal Reasoning, Visual Question Answering
DeepSeek-VL2: High-Performance Vision-Language Understanding with Efficient Mixture-of-Experts Architecture

DeepSeek-VL2 is an open-source, advanced vision-language model (VLM) built on a Mixture-of-Experts (MoE) architecture, engineered for robust multimodal understanding across…

12/18/2025 · Document Understanding, Visual Grounding, Visual Question Answering
HealthGPT: Unified Medical Vision-Language Understanding and Generation in a Single Model

HealthGPT is a cutting-edge Medical Large Vision-Language Model (Med-LVLM) designed to tackle a long-standing challenge in AI for healthcare: the…

12/17/2025 · Medical Image Generation, Medical Vision-language Modeling, Visual Question Answering

Copyright © 2026 PaperCodex.