rag-browser

Created by: aashari (a year ago)

A Browser Automation Tool for Humans and AI - Built with Playwright, optimized for Bun runtime, supporting CLI and MCP Server modes for webpage analysis and automation

rag-browser is a versatile tool built with Playwright that enables webpage analysis and automation. It operates in two modes: a CLI mode for direct webpage analysis and an MCP Server mode for integration with AI systems via the Model Context Protocol (MCP). Whether you're a developer exploring a webpage's structure or an AI system executing complex browser tasks, rag-browser provides a robust and flexible solution.

Version: 2.4.0
License: Open-source (MIT, see LICENSE)
Repository: github.com/aashari/rag-browser
Author: Andi Ashari

Features

CLI Mode: Analyze webpages, extract interactive elements (inputs, buttons, links), and execute custom action plans.
MCP Server Mode: Run as a server for AI systems to perform browser automation tasks programmatically.
Action Support: Wait, click, type, press keys, and capture content (HTML or Markdown).
Stability: Ensures reliable execution with built-in page stability checks (network, layout, mutations).
Output Options: Pretty-printed console output or JSON for machine-readable results.
Runtime: Optimized for Bun (recommended), with fallback support for Node.js/npm.

Installation

Prerequisites

Bun (recommended): curl -fsSL https://bun.sh/install | bash
Node.js (optional): Version 16+ with npm
No local installation is required: use bunx or npx to run directly from GitHub.

Running the Tool

Use bunx (preferred) or npx to execute rag-browser without cloning the repository:

# Using Bun (Recommended)
bunx github:aashari/rag-browser --url "https://example.com"

# Using Node.js/npm
npx -y github:aashari/rag-browser --url "https://example.com"

Installing Locally

To install the tool on your machine directly from GitHub rather than from the npm registry, you have two options:

Option 1: Install Directly from GitHub

This is the easiest way to install the tool globally on your machine:

# Using Bun (Recommended)
bun install -g github:aashari/rag-browser

# Using npm
npm install -g github:aashari/rag-browser

After installation, you can run it directly:

rag-browser --url "https://example.com"

Option 2: Clone and Install Locally

For development or customization:

# Clone the repository
git clone https://github.com/aashari/rag-browser.git
cd rag-browser

# Install dependencies
bun install  # or npm install

# Build the project
npm run build

# Link the package globally
bun link     # or npm link

After linking, you can run it directly:

rag-browser --url "https://example.com"

To contribute or modify the tool, clone the repository and run it from source:

git clone https://github.com/aashari/rag-browser.git
cd rag-browser
bun install
bun run src/index.ts

Usage

CLI Mode

Analyze a webpage or execute a sequence of actions.

Simple Page Analysis

# Using Bun
bunx github:aashari/rag-browser --url "https://example.com"

# Using Node.js/npm
npx -y github:aashari/rag-browser --url "https://example.com"

Output: Displays page title, description, and top 5 inputs, buttons, and links.

Headless Mode with JSON Output

bunx github:aashari/rag-browser --url "https://example.com" --headless --json

Output: JSON object with full page analysis.

Show All Interactive Elements

bunx github:aashari/rag-browser --url "https://example.com"

Output: Lists top 5 inputs, buttons, and links with selectors.

Execute an Action Plan

Search Wikipedia and capture results:

bunx github:aashari/rag-browser --url "https://wikipedia.org" --plan '{
  "actions": [
    {"type": "wait", "elements": ["#searchInput"]},
    {"type": "typing", "element": "#searchInput", "value": "AI Tools"},
    {"type": "keyPress", "key": "Enter"},
    {"type": "wait", "elements": [".mw-search-results-container"]},
    {"type": "print", "elements": [".mw-search-result"], "format": "markdown"}
  ]
}'

Output: Executes the plan and prints search results in Markdown.
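The plan above can also be generated rather than written by hand. The following is a minimal sketch, assuming you only want to vary the search query; `searchPlan` and the `Action` and `Plan` types are hypothetical names for illustration and are not part of rag-browser. The selectors are the ones from the example above.

```typescript
// Minimal action shapes, limited to the four action types used in the example above.
type Action =
  | { type: "wait"; elements: string[] }
  | { type: "typing"; element: string; value: string }
  | { type: "keyPress"; key: string }
  | { type: "print"; elements: string[]; format: "html" | "markdown" };

interface Plan {
  actions: Action[];
}

// Hypothetical helper: builds the Wikipedia search plan for any query.
function searchPlan(query: string): Plan {
  return {
    actions: [
      { type: "wait", elements: ["#searchInput"] },
      { type: "typing", element: "#searchInput", value: query },
      { type: "keyPress", key: "Enter" },
      { type: "wait", elements: [".mw-search-results-container"] },
      { type: "print", elements: [".mw-search-result"], format: "markdown" },
    ],
  };
}

// JSON.stringify escapes quotes and backslashes in the query,
// so the serialized plan stays valid JSON whatever the query contains.
const planJson: string = JSON.stringify(searchPlan('AI "Tools"'));
console.log(planJson);
```

The resulting string is what you would pass as the value of `--plan`.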

CLI Options

Option	Description	Example Value
`--url`	Target URL (required)	`"https://example.com"`
`--headless`	Run without UI	(flag)
`--json`	Output in JSON format	(flag)
`--simple-selectors`	Use simpler CSS selectors	(flag)
`--plan`	JSON string of actions	See above example
`--timeout`	Timeout in ms (-1 for infinite)	`5000`
`--debug`	Enable debug logging	`false`
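When driving the CLI from a script, the options in the table can be assembled into an argument array. This is a sketch under stated assumptions: `CliOptions` and `buildCliArgs` are hypothetical names, `--timeout` is assumed to take its value as the next argument, and the range check on the timeout is an assumption based on the "-1 for infinite" note. `--debug` is left out because the table does not make clear whether it takes a value.

```typescript
// Options mirroring the CLI table above; names are illustrative only.
interface CliOptions {
  url: string;
  headless?: boolean;
  json?: boolean;
  simpleSelectors?: boolean;
  plan?: object;
  timeout?: number; // milliseconds; -1 means wait indefinitely
}

function buildCliArgs(opts: CliOptions): string[] {
  const args: string[] = ["--url", opts.url];
  if (opts.headless) args.push("--headless");
  if (opts.json) args.push("--json");
  if (opts.simpleSelectors) args.push("--simple-selectors");
  if (opts.plan !== undefined) args.push("--plan", JSON.stringify(opts.plan));
  if (opts.timeout !== undefined) {
    // Assumption: only -1 (infinite) and non-negative integers are meaningful.
    if (!Number.isInteger(opts.timeout) || opts.timeout < -1) {
      throw new RangeError("timeout must be -1 or a non-negative integer");
    }
    args.push("--timeout", String(opts.timeout));
  }
  return args;
}

console.log(buildCliArgs({ url: "https://example.com", headless: true, json: true }));
```

Passing the array to a process-spawning API instead of building one command string avoids shell quoting problems with the plan JSON.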

MCP Server Mode

Run as a server for AI integration.

Start the Server

# Using Bun
bunx github:aashari/rag-browser

# Using Node.js/npm
npx -y github:aashari/rag-browser

AI Configuration

Add to your AI system's MCP configuration:

For Bun:
{
  "mcpServers": {
    "rag-browser": {
      "command": "bunx",
      "args": ["github:aashari/rag-browser"]
    }
  }
}

For Node.js/npm:
{
  "mcpServers": {
    "rag-browser": {
      "command": "npx",
      "args": ["-y", "github:aashari/rag-browser"]
    }
  }
}
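The two configurations differ only in the launcher command and its arguments. If you generate MCP configuration files from code, the choice can be expressed as a small function. This is a sketch: `ragBrowserConfig` and the type names are hypothetical, and the output reproduces the two JSON objects above.

```typescript
type Runtime = "bun" | "node";

interface McpServerEntry {
  command: string;
  args: string[];
}

interface McpConfig {
  mcpServers: Record<string, McpServerEntry>;
}

// Hypothetical helper: returns the configuration shown above for the chosen runtime.
function ragBrowserConfig(runtime: Runtime): McpConfig {
  const entry: McpServerEntry =
    runtime === "bun"
      ? { command: "bunx", args: ["github:aashari/rag-browser"] }
      : { command: "npx", args: ["-y", "github:aashari/rag-browser"] };
  return { mcpServers: { "rag-browser": entry } };
}

console.log(JSON.stringify(ragBrowserConfig("bun"), null, 2));
```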

Supported Actions

Action	Description	Required Fields	Optional Fields
`wait`	Wait for elements	`elements: string[]`	`timeout: number`
`click`	Click an element	`element: string`	-
`typing`	Type text	`element: string`, `value: string`	`delay: number`
`keyPress`	Press a key	`key: string`	`element: string`
`print`	Capture content	`elements: string[]`	`format: "html" | "markdown"`
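The table maps naturally onto a TypeScript discriminated union, which lets you check a plan before sending it to the tool. The sketch below is not rag-browser's own type definition or validation logic; `Action`, `checkAction`, and `isAction` are hypothetical names, and only the required fields from the table are checked.

```typescript
// One variant per row of the table above; optional fields are marked with "?".
type Action =
  | { type: "wait"; elements: string[]; timeout?: number }
  | { type: "click"; element: string }
  | { type: "typing"; element: string; value: string; delay?: number }
  | { type: "keyPress"; key: string; element?: string }
  | { type: "print"; elements: string[]; format?: "html" | "markdown" };

const isString = (v: unknown): v is string => typeof v === "string";
const isStringArray = (v: unknown): v is string[] =>
  Array.isArray(v) && v.every(isString);

// Returns a list of problems; an empty list means all required fields are present.
function checkAction(input: unknown): string[] {
  if (typeof input !== "object" || input === null) return ["action must be an object"];
  const a = input as Record<string, unknown>;
  const kind = a["type"];
  switch (kind) {
    case "wait":
    case "print":
      return isStringArray(a["elements"]) ? [] : [`${kind}: "elements" must be a string array`];
    case "click":
      return isString(a["element"]) ? [] : ['click: "element" must be a string'];
    case "typing": {
      const errors: string[] = [];
      if (!isString(a["element"])) errors.push('typing: "element" must be a string');
      if (!isString(a["value"])) errors.push('typing: "value" must be a string');
      return errors;
    }
    case "keyPress":
      return isString(a["key"]) ? [] : ['keyPress: "key" must be a string'];
    default:
      return [`unknown action type: ${String(kind)}`];
  }
}

// Type guard built on the check above.
function isAction(input: unknown): input is Action {
  return checkAction(input).length === 0;
}

console.log(checkAction({ type: "typing", element: "#searchInput" }));
```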

For Humans

Why Use rag-browser?

Explore Webpages: Quickly analyze a page's structure and interactive elements.
Automate Tasks: Define and execute browser actions without coding.
Debugging: Use detailed output to understand page behavior.

Example Workflow

Analyze a login page:

bunx github:aashari/rag-browser --url "https://example.com/login"

Create a plan to log in:

{
  "actions": [
    {"type": "typing", "element": "input[name='username']", "value": "user"},
    {"type": "typing", "element": "input[name='password']", "value": "pass"},
    {"type": "click", "element": "button[type='submit']"}
  ]
}

Execute:

bunx github:aashari/rag-browser --url "https://example.com/login" --plan '<your_json_here>'
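One caveat when substituting the JSON: the login plan above uses single quotes inside its selectors (`input[name='username']`), and in a POSIX shell such as bash or zsh a single quote ends a single-quoted string early. The sketch below shows one way to quote the plan safely; `shellQuote` is a hypothetical helper, and the approach assumes a POSIX shell.

```typescript
// POSIX single-quote escaping: close the quoted string, emit an escaped
// quote (\'), then reopen the quoted string.
function shellQuote(arg: string): string {
  return "'" + arg.replace(/'/g, "'\\''") + "'";
}

const plan = {
  actions: [
    { type: "typing", element: "input[name='username']", value: "user" },
    { type: "typing", element: "input[name='password']", value: "pass" },
    { type: "click", element: "button[type='submit']" },
  ],
};

const command: string =
  'bunx github:aashari/rag-browser --url "https://example.com/login" --plan ' +
  shellQuote(JSON.stringify(plan));

console.log(command);
```

An alternative that needs no escaping is to write the selectors without single quotes, for example with escaped double quotes inside the JSON strings.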

For AI

Integration with AI Systems

rag-browser exposes browser automation via MCP, allowing AI to:

Navigate webpages
Extract content
Perform actions

Example AI Request

{
  "tool": "rag-browser",
  "action": {
    "url": "https://wikipedia.org",
    "debug": false,
    "plan": {
      "actions": [
        {"type": "wait", "elements": ["#searchInput"]},
        {"type": "typing", "element": "#searchInput", "value": "Machine Learning"},
        {"type": "keyPress", "key": "Enter"},
        {"type": "print", "elements": [".mw-search-result"], "format": "markdown"}
      ]
    }
  }
}

Response: Markdown content of search results.
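The request above is shown in simplified form. On the wire, MCP clients invoke tools with a JSON-RPC 2.0 `tools/call` message. The sketch below wraps the same arguments in that envelope. The tool name is an assumption: this document does not state which names the server registers, so `TOOL_NAME` is a placeholder that you would replace with a name returned by the server's `tools/list` response.

```typescript
interface ToolsCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextId = 1;

// Wraps tool arguments in the JSON-RPC envelope used by MCP tool calls.
function toolsCall(toolName: string, args: Record<string, unknown>): ToolsCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

// Placeholder only: query the server with "tools/list" to find the real tool name.
const TOOL_NAME = "action";

const req = toolsCall(TOOL_NAME, {
  url: "https://wikipedia.org",
  debug: false,
  plan: {
    actions: [
      { type: "wait", elements: ["#searchInput"] },
      { type: "typing", element: "#searchInput", value: "Machine Learning" },
      { type: "keyPress", key: "Enter" },
      { type: "print", elements: [".mw-search-result"], format: "markdown" },
    ],
  },
});

console.log(JSON.stringify(req, null, 2));
```

In practice an MCP client library builds this envelope for you; the sketch only shows what is sent.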

Capabilities

Dynamic Interaction: Responds to page changes (e.g., navigation).
Content Extraction: Returns structured data (HTML/Markdown).
Error Handling: Provides detailed feedback on failures.

Development

Project Structure

src/
├── cli/         # CLI entry point
├── config/      # Constants and versioning
├── core/        # Browser automation logic
├── mcp/         # MCP server implementation
├── types/       # TypeScript types
├── utils/       # Helper functions
└── index.ts     # Main entry point
tests/
├── mcp-server.test.ts  # Core MCP server tests
├── simple-mcp.test.ts  # Simple MCP tests
├── wikipedia-search.test.ts  # Wikipedia search tests
├── resource.test.ts  # Resource management tests
├── test-utils.ts  # Common test utilities
└── README.md         # Testing documentation

Build and Run Locally

bun install
bun run src/index.ts --url "https://example.com"

Testing

The project includes a test suite for the MCP server. The tests run on Bun's built-in test runner and exercise the server through the MCP SDK client.

Running Tests

# Run all tests
bun test

# Run specific test suites
bun test tests/mcp-server.test.ts
bun test tests/simple-mcp.test.ts
bun test tests/wikipedia-search.test.ts
bun test tests/resource.test.ts

# Or use the npm scripts
npm test
npm run test:mcp
npm run test:simple
npm run test:wikipedia
npm run test:resource

# Run tests with coverage
npm run test:coverage

The test suite includes:

Core MCP Server Tests: Tests the basic functionality of the MCP server
Simple MCP Tests: Tests basic operations like listing tools and analyzing a webpage
Wikipedia Search Tests: Tests executing complex action plans like searching Wikipedia
Resource Management Tests: Tests resource listing and reading capabilities

For more information about testing, see the tests/README.md file.

Advanced Usage

Interactive Sessions with User Authentication

For scenarios where you need to manually authenticate or interact with a page before capturing content, the tool will automatically keep the browser open when it detects user interactions:

bun run src/index.ts --url "https://example.com" --plan '{ "actions": [ {"type": "wait", "elements": [".authenticated-content"], "timeout": -1} ] }'

This will:

Open the browser to the specified URL
Wait indefinitely for the specified elements
Automatically keep the browser open if you interact with it
Automatically close the browser after 1 minute of inactivity

This is particularly useful for:

Manual authentication flows
CAPTCHAs that require human interaction
Complex interactions that are difficult to automate

Interaction here means mouse movements, clicks, typing, or scrolling. The browser stays open as long as these continue and closes automatically after 1 minute without any of them.
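The keep-open behavior described above amounts to an inactivity timer that every interaction resets. The following is a minimal sketch of that logic, not rag-browser's actual implementation; `InactivityTracker` and its methods are hypothetical, and timestamps are passed in explicitly so the logic can be checked without a browser.

```typescript
const INACTIVITY_LIMIT_MS = 60_000; // 1 minute, as stated above

class InactivityTracker {
  private lastActivity: number;
  private readonly limitMs: number;

  constructor(limitMs: number, now: number) {
    this.limitMs = limitMs;
    this.lastActivity = now;
  }

  // Call on every mouse movement, click, key press, or scroll.
  recordActivity(now: number): void {
    this.lastActivity = now;
  }

  // True once the limit has elapsed since the last recorded interaction.
  shouldClose(now: number): boolean {
    return now - this.lastActivity >= this.limitMs;
  }
}

const tracker = new InactivityTracker(INACTIVITY_LIMIT_MS, 0);
tracker.recordActivity(50_000); // user clicks 50 s after the page opened
console.log(tracker.shouldClose(100_000)); // 50 s since the click
console.log(tracker.shouldClose(110_000)); // 60 s since the click
```

In a real session the timestamps would come from `Date.now()` and the check would run on a periodic timer.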
