UI-TARS Desktop

Created By
bytedance, a year ago
A GUI Agent application based on UI-TARS (Vision-Language Model) that allows you to control your computer using natural language.
Content

IMPORTANT

[2025-03-18] We released a technical preview version of a new desktop app - Agent TARS, a multimodal AI agent that leverages browser operations by visually interpreting web pages and seamlessly integrating with command lines and file systems.

UI-TARS Desktop

UI-TARS Desktop is a GUI Agent application based on UI-TARS (Vision-Language Model) that allows you to control your computer using natural language.

📑 Paper | 🤗 Hugging Face Models | 🫨 Discord | 🤖 ModelScope
🖥️ Desktop Application | 👓 Midscene (use in browser) | Ask DeepWiki.com

Showcases

Instructions demonstrated (each paired with a video in the original table):

  • Please help me open the autosave feature of VS Code and delay AutoSave operations for 500 milliseconds in the VS Code setting.
  • Could you help me check the latest open issue of the UI-TARS-Desktop project on GitHub?

News

  • [2025-04-17] - ๐ŸŽ‰ We're thrilled to announce the release of new UI-TARS Desktop application v0.1.0, featuring a redesigned Agent UI. The application enhances the computer using experience, introduces new browser operation features, and supports the advanced UI-TARS-1.5 model for improved performance and precise control.
  • [2025-02-20] - ๐Ÿ“ฆ Introduced UI TARS SDK, is a powerful cross-platform toolkit for building GUI automation agents.
  • [2025-01-23] - ๐Ÿš€ We updated the Cloud Deployment section in the ไธญๆ–‡็‰ˆ: GUIๆจกๅž‹้ƒจ็ฝฒๆ•™็จ‹ with new information related to the ModelScope platform. You can now use the ModelScope platform for deployment.

Features

  • 🤖 Natural language control powered by a Vision-Language Model
  • 🖥️ Screenshot and visual recognition support
  • 🎯 Precise mouse and keyboard control
  • 💻 Cross-platform support (Windows/macOS/Browser)
  • 🔄 Real-time feedback and status display
  • 🔐 Private and secure - fully local processing
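
Taken together, the features above suggest a loop: capture a screenshot, ask the model for the next action, execute it with the mouse or keyboard, and repeat until the model reports that the task is finished. The following is a minimal sketch of that loop under stated assumptions, not the actual implementation. Every name in it (`Action`, `take_screenshot`, `scripted_model`, `run_agent`) is hypothetical and is not part of UI-TARS Desktop or @ui-tars/sdk; the model is replaced by a fixed script so the example runs without any dependencies.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """Hypothetical action record; not the real UI-TARS action format."""
    kind: str           # "click", "type", or "finished"
    argument: str = ""  # target description or text to type


def take_screenshot() -> bytes:
    # Assumption: a real agent would capture the screen here.
    return b"fake-image-bytes"


def scripted_model(instruction: str, screenshot: bytes, history: List[Action]) -> Action:
    # Stand-in for a Vision-Language Model: replays a fixed plan, one step per call.
    script = [Action("click", "Settings"), Action("type", "500"), Action("finished")]
    return script[min(len(history), len(script) - 1)]


def run_agent(
    instruction: str,
    model: Callable[[str, bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Action]:
    """Screenshot, decide, act; stop on "finished" or after max_steps."""
    history: List[Action] = []
    for _ in range(max_steps):
        action = model(instruction, take_screenshot(), history)
        history.append(action)
        if action.kind == "finished":
            break
        execute(action)  # a real operator would move the mouse or press keys
    return history


executed: List[Action] = []
history = run_agent(
    "Set the VS Code autosave delay to 500 ms", scripted_model, executed.append
)
print([a.kind for a in history])
```

The `max_steps` cap is a design choice for the sketch: it bounds the loop if the model never reports completion.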

Quick Start

See Quick Start.

Deployment

See Deployment.

Contributing

See CONTRIBUTING.md.

SDK (Experimental)

See @ui-tars/sdk

License

UI-TARS Desktop is licensed under the Apache License 2.0.

Citation

If you find our paper and code useful in your research, please consider giving us a star ⭐ and a citation 📝.

@article{qin2025ui,
  title={UI-TARS: Pioneering Automated GUI Interaction with Native Agents},
  author={Qin, Yujia and Ye, Yining and Fang, Junjie and Wang, Haoming and Liang, Shihao and Tian, Shizuo and Zhang, Junda and Li, Jiahao and Li, Yunxin and Huang, Shijue and others},
  journal={arXiv preprint arXiv:2501.12326},
  year={2025}
}