PaperCodex

Multimodal Reasoning

MobileAgent: Cross-Platform GUI Automation That Understands and Acts Like a Human

Imagine giving a natural language instruction like “Book a round-trip flight from Beijing to Paris on Skyscanner for September 18–21”…

12/11/2025 | Cross-platform Agent, GUI Automation, Multimodal Reasoning
Step1X-Edit: Open-Source Image Editing That Matches GPT-4o and Gemini2 Flash

Overview: Step1X-Edit is a state-of-the-art open-source framework for general-purpose image editing that delivers performance comparable to leading proprietary models like…

12/11/2025 | Image Editing, Instruction-following Image Generation, Multimodal Reasoning
Agent-S: Automate Any Computer Task Like a Human—With Precision, Planning, and Cross-Platform Generalization

Overview: Imagine an AI agent that can sit at your computer, look at the screen, understand what it sees, and…

12/11/2025 | Computer Use Agent, GUI Automation, Multimodal Reasoning

Copyright © 2026 PaperCodex.