PaperCodex

GUI Automation

ScaleCUA: Cross-Platform GUI Automation Powered by Large-Scale Open Data

Building reliable computer use agents (CUAs)—systems that can autonomously interact with graphical user interfaces (GUIs)—has long been hindered by a…

01/09/2026 · Cross-platform Agent, GUI Automation, vision-language modeling

CogAgent: Automate Any GUI with Vision—No Code or HTML Needed

Imagine giving a natural language instruction like “Mark all unread emails as read” or “Filter Amazon search results to show…

12/18/2025 · GUI Automation, Vision-based Agent, Visual Language Modeling

UFO: Automate Multi-App Windows Workflows with Natural Language and Zero Human Intervention

Imagine telling your computer what you want it to do—like “Summarize this PDF, email the summary to my manager, and…

12/17/2025 · Cross-application Task Execution, GUI Automation, Multimodal Reasoning

ShowUI: Open-Source Vision-Language-Action Model for Human-Like GUI Automation from Screenshots

In today’s digital workflows, automating interactions with graphical user interfaces (GUIs)—whether on websites, mobile apps, or desktop software—is a high-value…

12/17/2025 · GUI Automation, Vision-Language-Action Modeling, Zero-Shot UI Grounding

MobileAgent: Cross-Platform GUI Automation That Understands and Acts Like a Human

Imagine giving a natural language instruction like “Book a round-trip flight from Beijing to Paris on Skyscanner for September 18–21”…

12/11/2025 · Cross-platform Agent, GUI Automation, Multimodal Reasoning

Agent-S: Automate Any Computer Task Like a Human—With Precision, Planning, and Cross-Platform Generalization

Imagine an AI agent that can sit at your computer, look at the screen, understand what it sees, and…

12/11/2025 · Computer Use Agent, GUI Automation, Multimodal Reasoning

Copyright © 2026 PaperCodex.