Building reliable computer use agents (CUAs)—systems that can autonomously interact with graphical user interfaces (GUIs)—has long been hindered by a…
GUI Automation
CogAgent: Automate Any GUI with Vision—No Code or HTML Needed 1104
Imagine giving a natural-language instruction like “Mark all unread emails as read” or “Filter Amazon search results to show…
UFO: Automate Multi-App Windows Workflows with Natural Language and Zero Human Intervention 7659
Imagine telling your computer what you want it to do—like “Summarize this PDF, email the summary to my manager, and…
ShowUI: Open-Source Vision-Language-Action Model for Human-Like GUI Automation from Screenshots 1509
In today’s digital workflows, automating interactions with graphical user interfaces (GUIs)—whether on websites, mobile apps, or desktop software—is a high-value…
MobileAgent: Cross-Platform GUI Automation That Understands and Acts Like a Human 6632
Imagine giving a natural-language instruction like “Book a round-trip flight from Beijing to Paris on Skyscanner for September 18–21”…
Agent-S: Automate Any Computer Task Like a Human—With Precision, Planning, and Cross-Platform Generalization 8663
Imagine an AI agent that can sit at your computer, look at the screen, understand what it sees, and…