rag-browser

Created by aashari (9 months ago)
A Browser Automation Tool for Humans and AI - Built with Playwright, optimized for Bun runtime, supporting CLI and MCP Server modes for webpage analysis and automation
rag-browser

A Browser Automation Tool for Humans and AI

rag-browser is a versatile tool built with Playwright that enables webpage analysis and automation. It operates in two modes: a CLI mode for direct webpage analysis and an MCP Server mode for integration with AI systems via the Model Context Protocol (MCP). Whether you're a developer exploring a webpage's structure or an AI system executing complex browser tasks, rag-browser provides a robust and flexible solution.


Features

  • CLI Mode: Analyze webpages, extract interactive elements (inputs, buttons, links), and execute custom action plans.
  • MCP Server Mode: Run as a server for AI systems to perform browser automation tasks programmatically.
  • Action Support: Wait, click, type, press keys, and capture content (HTML or Markdown).
  • Stability: Ensures reliable execution with built-in page stability checks (network, layout, mutations).
  • Output Options: Pretty-printed console output or JSON for machine-readable results.
  • Runtime: Optimized for Bun (recommended), with fallback support for Node.js/npm.

Installation

Prerequisites

  • Bun (recommended): curl -fsSL https://bun.sh/install | bash
  • Node.js (optional): Version 16+ with npm
  • No local installation required—use bunx or npx to run directly from GitHub.

Running the Tool

Use bunx (preferred) or npx to execute rag-browser without cloning the repository:

# Using Bun (Recommended)
bunx github:aashari/rag-browser --url "https://example.com"

# Using Node.js/npm
npx -y github:aashari/rag-browser --url "https://example.com"

Installing Locally

If you prefer a local installation (the package is installed straight from GitHub rather than from the npm registry), you have two options:

Option 1: Install Directly from GitHub

This is the easiest way to install the tool globally on your machine:

# Using Bun (Recommended)
bun install -g github:aashari/rag-browser

# Using npm
npm install -g github:aashari/rag-browser

After installation, you can run it directly:

rag-browser --url "https://example.com"

Option 2: Clone and Install Locally

For development or customization:

# Clone the repository
git clone https://github.com/aashari/rag-browser.git
cd rag-browser

# Install dependencies
bun install  # or npm install

# Build the project
npm run build

# Link the package globally
bun link     # or npm link

After linking, you can run it directly:

rag-browser --url "https://example.com"

To contribute or modify, clone the repository:

git clone https://github.com/aashari/rag-browser.git
cd rag-browser
bun install
bun run src/index.ts

Usage

CLI Mode

Analyze a webpage or execute a sequence of actions.

Simple Page Analysis

# Using Bun
bunx github:aashari/rag-browser --url "https://example.com"

# Using Node.js/npm
npx -y github:aashari/rag-browser --url "https://example.com"

Output: Displays page title, description, and top 5 inputs, buttons, and links.

Headless Mode with JSON Output

bunx github:aashari/rag-browser --url "https://example.com" --headless --json

Output: JSON object with full page analysis.

Show All Interactive Elements

bunx github:aashari/rag-browser --url "https://example.com"

Output: Lists top 5 inputs, buttons, and links with selectors.

Execute an Action Plan

Search Wikipedia and capture results:

bunx github:aashari/rag-browser --url "https://wikipedia.org" --plan '{
  "actions": [
    {"type": "wait", "elements": ["#searchInput"]},
    {"type": "typing", "element": "#searchInput", "value": "AI Tools"},
    {"type": "keyPress", "key": "Enter"},
    {"type": "wait", "elements": [".mw-search-results-container"]},
    {"type": "print", "elements": [".mw-search-result"], "format": "markdown"}
  ]
}'

Output: Executes the plan and prints search results in Markdown.
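
When the text to type comes from a variable, it can be easier to build the plan in code than to hand-edit JSON. The sketch below is an illustration, not part of rag-browser: `buildSearchPlan` and `shellSingleQuote` are hypothetical helper names, and the selectors are the ones from the example above. It serializes the plan and quotes it so that the single-quoted shell argument stays intact even if the query contains an apostrophe.

```typescript
// Hypothetical helpers; rag-browser itself only sees the resulting JSON string.
type PlanAction = Record<string, unknown> & { type: string };

interface Plan {
  actions: PlanAction[];
}

// Build the Wikipedia search plan from the example above for an arbitrary query.
function buildSearchPlan(query: string): Plan {
  return {
    actions: [
      { type: "wait", elements: ["#searchInput"] },
      { type: "typing", element: "#searchInput", value: query },
      { type: "keyPress", key: "Enter" },
      { type: "wait", elements: [".mw-search-results-container"] },
      { type: "print", elements: [".mw-search-result"], format: "markdown" },
    ],
  };
}

// Wrap a string in single quotes for a POSIX shell, escaping embedded single quotes.
function shellSingleQuote(text: string): string {
  return "'" + text.replace(/'/g, "'\\''") + "'";
}

const planJson = JSON.stringify(buildSearchPlan("Turing's test"));
console.log(
  `bunx github:aashari/rag-browser --url "https://wikipedia.org" --plan ${shellSingleQuote(planJson)}`
);
```

The printed line can be pasted into a POSIX shell as-is. Other shells (for example PowerShell) use different quoting rules, so the escaping helper would need to change there.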

CLI Options

| Option | Description | Example Value |
| --- | --- | --- |
| --url | Target URL (required) | "https://example.com" |
| --headless | Run without UI | (flag) |
| --json | Output in JSON format | (flag) |
| --simple-selectors | Use simpler CSS selectors | (flag) |
| --plan | JSON string of actions | See above example |
| --timeout | Timeout in ms (-1 for infinite) | 5000 |
| --debug | Enable debug logging | false |
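
When rag-browser is launched from a script, assembling the argument list as an array avoids shell quoting problems. The following is a minimal sketch under stated assumptions: the `CliOptions` shape and the `buildArgs` name are invented for illustration, and `--debug` is treated as a bare flag even though the options list gives `false` as its example value.

```typescript
// Field names mirror the CLI options listed above; the interface itself is an assumption.
interface CliOptions {
  url: string;               // --url (required)
  headless?: boolean;        // --headless
  json?: boolean;            // --json
  simpleSelectors?: boolean; // --simple-selectors
  plan?: object;             // --plan, serialized to a JSON string
  timeout?: number;          // --timeout in ms, -1 for infinite
  debug?: boolean;           // --debug (assumed here to be a bare flag)
}

// Translate an options object into the argument list for the CLI.
function buildArgs(options: CliOptions): string[] {
  const args = ["--url", options.url];
  if (options.headless) args.push("--headless");
  if (options.json) args.push("--json");
  if (options.simpleSelectors) args.push("--simple-selectors");
  if (options.plan !== undefined) args.push("--plan", JSON.stringify(options.plan));
  if (options.timeout !== undefined) args.push("--timeout", String(options.timeout));
  if (options.debug) args.push("--debug");
  return args;
}

console.log(buildArgs({ url: "https://example.com", headless: true, json: true, timeout: 5000 }));
```

The resulting array can be appended to `["github:aashari/rag-browser"]` and handed to a process spawner such as Node's `child_process.spawn("bunx", [...])`, which passes each element through without shell interpretation.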

MCP Server Mode

Run as a server for AI integration.

Start the Server

# Using Bun
bunx github:aashari/rag-browser

# Using Node.js/npm
npx -y github:aashari/rag-browser

AI Configuration

Add to your AI system's MCP configuration:

For Bun:

{
  "mcpServers": {
    "rag-browser": {
      "command": "bunx",
      "args": ["github:aashari/rag-browser"]
    }
  }
}

For Node.js/npm:

{
  "mcpServers": {
    "rag-browser": {
      "command": "npx",
      "args": ["-y", "github:aashari/rag-browser"]
    }
  }
}
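
If the MCP configuration is generated by a setup script, the entry can be merged in without disturbing other servers. This is a sketch only: `ragBrowserEntry` and `withRagBrowser` are hypothetical names, and the config shape is limited to the `mcpServers` key shown above.

```typescript
type Runtime = "bun" | "node";

interface McpServerEntry {
  command: string;
  args: string[];
}

// Return the launch entry for the chosen runtime, matching the two snippets above.
function ragBrowserEntry(runtime: Runtime): McpServerEntry {
  return runtime === "bun"
    ? { command: "bunx", args: ["github:aashari/rag-browser"] }
    : { command: "npx", args: ["-y", "github:aashari/rag-browser"] };
}

// Merge the entry into an existing configuration, keeping any other servers.
function withRagBrowser(
  config: { mcpServers?: Record<string, McpServerEntry> },
  runtime: Runtime
): { mcpServers: Record<string, McpServerEntry> } {
  return {
    ...config,
    mcpServers: { ...config.mcpServers, "rag-browser": ragBrowserEntry(runtime) },
  };
}

console.log(JSON.stringify(withRagBrowser({}, "bun"), null, 2));
```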

Supported Actions

| Action | Description | Required Fields | Optional Fields |
| --- | --- | --- | --- |
| wait | Wait for elements | elements: string[] | timeout: number |
| click | Click an element | element: string | - |
| typing | Type text | element: string, value: string | delay: number |
| keyPress | Press a key | key: string | element: string |
| print | Capture content | elements: string[] | format: "html" or "markdown" |
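
The action fields above map naturally onto a TypeScript discriminated union, which lets a plan be checked before it is sent to the tool. The sketch below is an assumption about how one might model this, not the project's own `types/` definitions. The `Action` type and `checkAction` function are hypothetical names, and only the required fields are validated.

```typescript
// Field names follow the Supported Actions list; type and function names are assumptions.
type Action =
  | { type: "wait"; elements: string[]; timeout?: number }
  | { type: "click"; element: string }
  | { type: "typing"; element: string; value: string; delay?: number }
  | { type: "keyPress"; key: string; element?: string }
  | { type: "print"; elements: string[]; format?: "html" | "markdown" };

const isString = (v: unknown): v is string => typeof v === "string";
const isStringArray = (v: unknown): v is string[] => Array.isArray(v) && v.every(isString);

// Return a list of problems with the required fields; an empty list means none were found.
function checkAction(input: unknown): string[] {
  if (typeof input !== "object" || input === null) return ["action must be an object"];
  const a = input as Record<string, unknown>;
  switch (a.type) {
    case "wait":
    case "print":
      return isStringArray(a.elements) ? [] : ["elements must be a string array"];
    case "click":
      return isString(a.element) ? [] : ["element must be a string"];
    case "typing":
      return [
        ...(isString(a.element) ? [] : ["element must be a string"]),
        ...(isString(a.value) ? [] : ["value must be a string"]),
      ];
    case "keyPress":
      return isString(a.key) ? [] : ["key must be a string"];
    default:
      return [`unknown action type: ${String(a.type)}`];
  }
}

const example: Action = { type: "typing", element: "#searchInput", value: "AI Tools" };
console.log(checkAction(example));           // no problems reported
console.log(checkAction({ type: "click" })); // reports the missing element field
```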

For Humans

Why Use rag-browser?

  • Explore Webpages: Quickly analyze a page's structure and interactive elements.
  • Automate Tasks: Define and execute browser actions without coding.
  • Debugging: Use detailed output to understand page behavior.

Example Workflow

  1. Analyze a login page:
    bunx github:aashari/rag-browser --url "https://example.com/login"
    
  2. Create a plan to log in:
    {
      "actions": [
        {"type": "typing", "element": "input[name='username']", "value": "user"},
        {"type": "typing", "element": "input[name='password']", "value": "pass"},
        {"type": "click", "element": "button[type='submit']"}
      ]
    }
    
  3. Execute:
    bunx github:aashari/rag-browser --url "https://example.com/login" --plan '<your_json_here>'
    

For AI

Integration with AI Systems

rag-browser exposes browser automation via MCP, allowing AI to:

  • Navigate webpages
  • Extract content
  • Perform actions

Example AI Request

{
  "tool": "rag-browser",
  "action": {
    "url": "https://wikipedia.org",
    "debug": false,
    "plan": {
      "actions": [
        {"type": "wait", "elements": ["#searchInput"]},
        {"type": "typing", "element": "#searchInput", "value": "Machine Learning"},
        {"type": "keyPress", "key": "Enter"},
        {"type": "print", "elements": [".mw-search-result"], "format": "markdown"}
      ]
    }
  }
}

Response: Markdown content of search results.

Capabilities

  • Dynamic Interaction: Responds to page changes (e.g., navigation).
  • Content Extraction: Returns structured data (HTML/Markdown).
  • Error Handling: Provides detailed feedback on failures.

Development

Project Structure

src/
├── cli/         # CLI entry point
├── config/      # Constants and versioning
├── core/        # Browser automation logic
├── mcp/         # MCP server implementation
├── types/       # TypeScript types
├── utils/       # Helper functions
└── index.ts     # Main entry point
tests/
├── mcp-server.test.ts  # Core MCP server tests
├── simple-mcp.test.ts  # Simple MCP tests
├── wikipedia-search.test.ts  # Wikipedia search tests
├── resource.test.ts  # Resource management tests
├── test-utils.ts  # Common test utilities
└── README.md         # Testing documentation

Build and Run Locally

bun install
bun run src/index.ts --url "https://example.com"

Testing

The project includes a test suite for the MCP server. The tests run on Bun's built-in test runner and use the MCP SDK client to exercise the server.

Running Tests

# Run all tests
bun test

# Run specific test suites
bun test tests/mcp-server.test.ts
bun test tests/simple-mcp.test.ts
bun test tests/wikipedia-search.test.ts
bun test tests/resource.test.ts

# Or use the npm scripts
npm test
npm run test:mcp
npm run test:simple
npm run test:wikipedia
npm run test:resource

# Run tests with coverage
npm run test:coverage

The test suite includes:

  • Core MCP Server Tests: Tests the basic functionality of the MCP server
  • Simple MCP Tests: Tests basic operations like listing tools and analyzing a webpage
  • Wikipedia Search Tests: Tests executing complex action plans like searching Wikipedia
  • Resource Management Tests: Tests resource listing and reading capabilities

For more information about testing, see the tests/README.md file.

Advanced Usage

Interactive Sessions with User Authentication

For scenarios where you need to manually authenticate or interact with a page before capturing content, the tool will automatically keep the browser open when it detects user interactions:

bun run src/index.ts --url "https://example.com" --plan '{ "actions": [ {"type": "wait", "elements": [".authenticated-content"], "timeout": -1} ] }'

This will:

  1. Open the browser to the specified URL
  2. Wait indefinitely for the specified elements
  3. Automatically keep the browser open if you interact with it
  4. Automatically close the browser after 1 minute of inactivity

This is particularly useful for:

  • Manual authentication flows
  • CAPTCHAs that require human interaction
  • Complex interactions that are difficult to automate

The browser will remain open as long as you continue to interact with it (mouse movements, clicks, typing, scrolling). After 1 minute of inactivity, it will automatically close.
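
The inactivity rule described above can be pictured as a timestamp that is reset on every interaction. The following sketch is illustrative only and is not rag-browser's actual implementation: `InactivityTracker` is a hypothetical class, and the clock is injected so the logic can be exercised without waiting a real minute.

```typescript
// Illustrative only: not taken from the rag-browser source.
const INACTIVITY_LIMIT_MS = 60_000; // the 1-minute window described above

class InactivityTracker {
  private readonly now: () => number;
  private lastActivity: number;

  // The clock is injectable so the logic can be tested with simulated time.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
    this.lastActivity = now();
  }

  // Call on every mouse movement, click, keystroke, or scroll.
  recordInteraction(): void {
    this.lastActivity = this.now();
  }

  // True once the limit has elapsed since the last recorded interaction.
  shouldClose(): boolean {
    return this.now() - this.lastActivity >= INACTIVITY_LIMIT_MS;
  }
}

let fakeTime = 0;
const tracker = new InactivityTracker(() => fakeTime);
fakeTime = 59_000;
console.log(tracker.shouldClose()); // false: under a minute since start
tracker.recordInteraction();
fakeTime = 118_000;
console.log(tracker.shouldClose()); // false: 59 s since the last interaction
fakeTime = 119_000;
console.log(tracker.shouldClose()); // true: 60 s of inactivity
```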
