
AWorld: Advancing Agentic AI

Created by inclusionAI, 10 months ago
Build, evaluate and run General Multi-Agent Assistance with ease


News

  • 🐳 [2025/05/22] For quick GAIA evaluation, MCP tools, AWorld, and models are now available in a single Docker image. See ./README-docker.md for instructions and the YouTube video for a demo.
  • 🥳 [2025/05/13] AWorld has updated its state management for browser use and enhanced the video processing MCP server, achieving a score of 77.58 on GAIA validation (Pass@1 = 61.8) and maintaining its position as the top-ranked open-source framework. Learn more: GAIA leaderboard
  • ✨ [2025/04/23] AWorld ranks 3rd on GAIA benchmark (69.7 avg) with impressive Pass@1 = 58.8, 1st among open-source frameworks. Reproduce with python examples/gaia/run.py

Introduction

AWorld (short for Agent World) is an advanced framework where multiple AI agents collaborate to accomplish complex goals, such as those found in the GAIA benchmark. Its core features include:

  • Collaboration: Enables event-driven communication on two hierarchical levels—between agents, and between models and environments (e.g., MCP servers).
  • Autonomy: Features robust runtime state management for handling multi-step, intricate tasks.
  • Evolution: Supports a highly concurrent execution environment, empowering agents to learn and adapt across diverse tasks and environments.

Unlock the power of intelligent teamwork and continuous improvement with AWorld!
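
To make the event-driven collaboration idea concrete, here is a minimal, framework-independent sketch in plain Python. It does not use AWorld's API: `Event`, `EventBus`, and the topic names are hypothetical and chosen only for illustration. It shows the two levels mentioned above, one agent handing work to another, and an agent sending a request to an environment.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass
class Event:
    topic: str      # hypothetical routing key, e.g. "agent.plan"
    sender: str
    payload: str


class EventBus:
    """Tiny in-process publish/subscribe bus (illustrative only)."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[Event], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Event], None]) -> None:
        self._handlers[topic].append(handler)

    def publish(self, event: Event) -> None:
        # Deliver the event to every handler registered for its topic.
        for handler in list(self._handlers[event.topic]):
            handler(event)


bus = EventBus()
log: list[str] = []


def executor(event: Event) -> None:
    # Agent-to-environment level: the executor turns a plan into a tool call.
    log.append(f"executor received plan: {event.payload}")
    bus.publish(Event("env.tool_call", "executor", f"search({event.payload!r})"))


def environment(event: Event) -> None:
    # Stand-in for an MCP server or other tool endpoint.
    log.append(f"environment ran: {event.payload}")


bus.subscribe("agent.plan", executor)
bus.subscribe("env.tool_call", environment)

# Agent-to-agent level: a planner hands a step to the executor.
bus.publish(Event("agent.plan", "planner", "hotels near West Lake"))
print("\n".join(log))
```

Because publishers only know topic names, agents and environments can be added or swapped without changing each other's code, which is the main appeal of an event-driven design.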

What we offer:

For quick evaluation, training, rapid prototyping, and other use cases, we provide Docker images that package the MCP tools, AWorld framework, and models together. This enables users to effortlessly utilize AWorld’s communication protocols and state management features right out of the box. The available Docker images are listed below:

Scenario | Docker | Demo
Evaluation | GAIA Evaluation Docker Image. For instructions on building the image, see ./README-docker.md. | ▶️ Running a GAIA task in our image (AWorld Browser Demo on YouTube)
Training | GAIA Training Docker Images, supporting distributed and high-concurrency deployments. Instructions for training are coming soon. |

Want to build your own multi-agent system? Check out the detailed tutorials below to get started! ⬇️⬇️⬇️

Installation

With Python>=3.11:

pip install aworld

Usage

Quick Start

from aworld.config.conf import AgentConfig
from aworld.core.agent.base import Agent
from aworld.runner import Runners

if __name__ == '__main__':
    agent_config = AgentConfig(
        llm_provider="openai",
        llm_model_name="gpt-4o",

        # Set via environment variable or direct configuration
        # llm_api_key="YOUR_API_KEY", 
        # llm_base_url="https://api.openai.com/v1"
    )

    search = Agent(
        conf=agent_config,
        name="search_agent",
        system_prompt="You are a helpful agent.",
        mcp_servers=["amap-amap-sse"] # MCP server name for agent to use
    )

    # Run agent
    Runners.sync_run(input="Hotels within 1 kilometer of West Lake in Hangzhou",
                     agent=search)

Here is an MCP server config example.
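
As a hedged illustration, the sketch below writes and reads back a config entry for the `amap-amap-sse` server named in the Quick Start. The top-level `mcpServers` mapping follows a convention used by many MCP clients; the file name `mcp.json`, the `url` field, and the placeholder endpoint are assumptions, so check the AWorld repository for the schema it actually expects.

```python
import json
import tempfile
from pathlib import Path

# Assumed layout: a top-level "mcpServers" mapping from server name to
# connection settings. The URL below is a placeholder, not a real endpoint.
config = {
    "mcpServers": {
        "amap-amap-sse": {
            "url": "https://example.com/sse?key=YOUR_API_KEY",
        }
    }
}

# Write to a temporary directory here; in a project you would place the
# file wherever your framework looks for it ("mcp.json" is an assumed name).
config_path = Path(tempfile.mkdtemp()) / "mcp.json"
config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")

# Read it back and confirm the server name used by the agent is defined.
loaded = json.loads(config_path.read_text(encoding="utf-8"))
server_names = sorted(loaded["mcpServers"])
print(server_names)
```

The key point is that the name passed in `mcp_servers=[...]` has to match a key in the config, so a quick check like this can catch typos before an agent run.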

Running Pre-defined Agents (demo code)

Below are demonstration videos showcasing AWorld's capabilities across different agent configurations and environments.

Mode | Type | Demo
Single Agent | Browser use | ▶️ Watch Browser Demo on YouTube
Single Agent | Phone use | ▶️ Watch Mobile Demo on YouTube
Multi Agent | Cooperative Teams | ▶️ Watch Travel Demo on YouTube
Multi Agent | Competitive Teams | ▶️ Watch Debate Arena on YouTube
Multi Agent | Mix of both team types | Coming Soon 🚀

or Creating Your Own Agents (Quick Start Tutorial)

Here is a multi-agent example that runs a Level 2 task from the GAIA benchmark:

from examples.plan_execute.agent import PlanAgent, ExecuteAgent
from examples.tools.common import Agents, Tools
from aworld.core.agent.swarm import Swarm
from aworld.core.task import Task
from aworld.config.conf import AgentConfig, TaskConfig
from aworld.dataset.mock import mock_dataset
from aworld.runner import Runners

import os

# OPENAI_API_KEY is required
os.environ['OPENAI_API_KEY'] = "your key"
# Optional endpoint setting; defaults to `https://api.openai.com/v1`
# os.environ['OPENAI_ENDPOINT'] = "https://api.openai.com/v1"

# Load one sample as an example
test_sample = mock_dataset("gaia")

# Create agents
plan_config = AgentConfig(
    name=Agents.PLAN.value,
    llm_provider="openai",
    llm_model_name="gpt-4o",
)
agent1 = PlanAgent(conf=plan_config)

exec_config = AgentConfig(
    name=Agents.EXECUTE.value,
    llm_provider="openai",
    llm_model_name="gpt-4o",
)
agent2 = ExecuteAgent(conf=exec_config, tool_names=[Tools.DOCUMENT_ANALYSIS.value])

# Create a swarm for the multi-agent run.
# The tuple defines a (head_node, tail_node) edge in the topology graph,
# so the order of the agents matters.
swarm = Swarm((agent1, agent2), sequence=False)

# Define a task
task = Task(input=test_sample, swarm=swarm, conf=TaskConfig())

# Run task
result = Runners.sync_run_task(task=task)

print(f"Time cost: {result['time_cost']}")
print(f"Task Answer: {result['task_0']['answer']}")

Output:

Time cost: 26.431413888931274
Task Answer: Time-Parking 2: Parallel Universe
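
The comment in the example notes that the swarm is described by (head_node, tail_node) edges and that their order matters. The sketch below is not AWorld's `Swarm` implementation and does not model what `sequence=False` does. It uses plain strings and a hypothetical `execution_order` helper to show, for an acyclic topology, how ordered edges decide which agent runs first.

```python
from collections import defaultdict, deque


def execution_order(edges: list[tuple[str, str]]) -> list[str]:
    """Return agents in an order that respects every (head, tail) edge.

    Uses Kahn's topological sort; raises ValueError if the edges form a cycle.
    """
    successors: dict[str, list[str]] = defaultdict(list)
    indegree: dict[str, int] = {}
    for head, tail in edges:
        successors[head].append(tail)
        indegree.setdefault(head, 0)
        indegree[tail] = indegree.get(tail, 0) + 1

    # Agents with no incoming edge can start immediately.
    ready = deque(name for name, degree in indegree.items() if degree == 0)
    order: list[str] = []
    while ready:
        current = ready.popleft()
        order.append(current)
        for nxt in successors[current]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(indegree):
        raise ValueError("edges contain a cycle")
    return order


print(execution_order([("plan", "execute")]))
print(execution_order([("execute", "plan")]))
```

Swapping the two names in the edge swaps the resulting order, which is why passing `(agent1, agent2)` and `(agent2, agent1)` would describe different systems.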

Framework Architecture

AWorld uses a client-server architecture with three main components:

  1. Client-Server Architecture: Similar to Ray, this architecture:

    • Decouples agents and environments for better scalability and flexibility
    • Provides a unified interaction protocol for all agent-environment interactions
  2. Agent/Actor:

    • Encapsulates system prompts, tools, MCP servers, and models, with the capability to hand off execution to other agents

    Field | Type | Description
    id | string | Unique identifier for the agent
    name | string | Name of the agent
    model_name | string | LLM model name of the agent
    _llm | object | LLM model instance based on model_name (e.g., "gpt-4", "claude-3")
    conf | BaseModel | Configuration inheriting from pydantic BaseModel
    trajectory | object | Memory for maintaining context across interactions
    tool_names | list | List of tools the agent can use
    mcp_servers | list | List of MCP servers the agent can use
    handoffs | list | Agent as tool; list of other agents the agent can delegate tasks to
    finished | bool | Flag indicating whether the agent has completed its task
  3. Environment/World Model: Various tools and models in the environment

    • MCP servers
    • Computer interfaces (browser, shell, functions, etc.)
    • World Model

    Tools | Description
    MCP servers | AWorld seamlessly integrates a rich collection of MCP servers as agent tools
    browser | Controls web browsers for navigation, form filling, and interaction with web pages
    android | Manages Android device simulation for mobile app testing and automation
    shell | Executes shell commands for file operations and system interactions
    code | Runs code snippets in various languages for data processing and automation
    search | Performs web searches and returns structured results for information gathering and summary
    document | Handles file operations including reading, writing, and managing directories
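
The agent fields listed under Agent/Actor above can be pictured as a small data structure. The following is a hypothetical sketch, not AWorld's `Agent` class: `SketchAgent`, its `run` and `delegate` methods, and the reply strings are invented for illustration, and the LLM call is replaced by a stub.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class SketchAgent:
    name: str
    model_name: str = "gpt-4o"
    tool_names: list[str] = field(default_factory=list)
    mcp_servers: list[str] = field(default_factory=list)
    handoffs: list["SketchAgent"] = field(default_factory=list)
    trajectory: list[str] = field(default_factory=list)  # context across steps
    finished: bool = False
    id: str = field(default_factory=lambda: uuid.uuid4().hex)

    def run(self, task: str) -> str:
        """Stub for an LLM-backed step: record the task and mark completion."""
        self.trajectory.append(task)
        self.finished = True
        return f"{self.name} handled: {task}"

    def delegate(self, task: str, to: str) -> str:
        """Hand the task to a named agent from `handoffs` (agent as tool)."""
        for other in self.handoffs:
            if other.name == to:
                self.trajectory.append(f"handoff to {to}: {task}")
                return other.run(task)
        raise LookupError(f"{to!r} is not in {self.name}'s handoffs")


searcher = SketchAgent(name="search_agent", mcp_servers=["amap-amap-sse"])
planner = SketchAgent(name="plan_agent", handoffs=[searcher])

answer = planner.delegate("find hotels near West Lake", to="search_agent")
print(answer)
```

Restricting delegation to the agents listed in `handoffs` keeps the set of possible hand-offs explicit, in the same way `tool_names` and `mcp_servers` make the available tools explicit.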

Dual Purpose Framework

AWorld serves two complementary purposes:

Agent Evaluation

  • Unified task definitions to run both customized and public benchmarks
  • Efficient and stable execution environment
  • Detailed test reports measuring efficiency (steps to completion), completion rates, token costs, etc.
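
As a rough illustration of the report metrics listed above (steps to completion, completion rate, token cost), the sketch below aggregates a few made-up run records. `RunRecord` and `summarize` are hypothetical names, and the numbers are illustrative data, not AWorld results.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RunRecord:
    task_id: str
    completed: bool
    steps: int
    tokens: int


def summarize(records: list[RunRecord]) -> dict[str, float]:
    """Aggregate per-task records into report-level metrics."""
    if not records:
        raise ValueError("no records to summarize")
    done = [r for r in records if r.completed]
    return {
        "completion_rate": len(done) / len(records),
        # Efficiency is measured only over tasks that actually finished.
        "avg_steps_to_completion": mean(r.steps for r in done) if done else float("nan"),
        "total_tokens": float(sum(r.tokens for r in records)),
    }


# Illustrative data only.
records = [
    RunRecord("task_0", completed=True, steps=4, tokens=1200),
    RunRecord("task_1", completed=False, steps=10, tokens=3000),
    RunRecord("task_2", completed=True, steps=6, tokens=1800),
    RunRecord("task_3", completed=True, steps=5, tokens=2000),
]
report = summarize(records)
print(report)
```

Averaging steps only over completed tasks is one possible design choice; counting failed runs as well would penalize agents that stop early, so a real report should state which convention it uses.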

Agent Training

  • Agent models improve to overcome challenges posed by the environment
  • World models (environments) evolve to present new, more complex scenarios

🔧 Key Features

  • MCP Servers as Tools - Powerful integration of MCP servers providing robust tooling capabilities

  • 🌐 Environment Multi-Tool Support:

    • Default computer-use tools (browser, shell, code, APIs, file system, etc.)
    • Android device simulation
    • Cloud sandbox for quick and stable deployment
    • Reward model as environment simulation
  • 🤖 AI-Powered Agents:

    • Agent initialization
    • Delegation between multiple agents
    • Asynchronous delegation
    • Human delegation (e.g., for password entry)
    • Pre-deployed open source LLMs powered by state-of-the-art inference frameworks
  • 🎛️ Web Interface:

    • UI for execution visualization
    • Server configuration dashboard
    • Real-time monitoring tools
    • Performance reporting
  • 🧠 Benchmarks and Samples:

    • Support standardized benchmarks by default, e.g., GAIA, WebArena
    • Support customized benchmarks
    • Support generating training samples

Contributing

We warmly welcome developers to join us in building and improving AWorld! Whether you're interested in enhancing the framework, fixing bugs, or adding new features, your contributions are valuable to us.

For academic citations, or if you wish to contact us, please use the following BibTeX entry:

@software{aworld2025,
  author = {Agent Team at Ant Group},
  title = {AWorld: A Unified Agent Playground for Computer and Phone Use Tasks},
  year = {2025},
  url = {https://github.com/inclusionAI/AWorld},
  version = {0.1.0},
  publisher = {GitHub},
  email = {chenyi.zcy at antgroup.com}
}

License

This project is licensed under the MIT License - see the LICENSE file for details.
