Selenium MCP

Introduction
This server is implemented in Python to bridge the gap between an AI assistant (or a custom MCP client) and Selenium WebDriver. It exposes Selenium WebDriver functionality as MCP tools, allowing AI assistants and MCP clients to use them for web automation, web testing, or web scraping.
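As a rough illustration of what exposing a WebDriver call as an MCP tool can look like, the sketch below wraps page navigation in a plain function and registers it with FastMCP. The names `navigate`, `open_url`, and `build_server` are hypothetical and are not taken from this project; the sketch assumes the `fastmcp` and `selenium` packages and a local Chrome install.

```python
def navigate(driver, url: str) -> str:
    """Open `url` with a Selenium-style driver and report the result as text."""
    if not url.startswith(("http://", "https://")):
        return f"Error: '{url}' is not an http(s) URL. Pass a full URL such as https://example.com."
    driver.get(url)  # WebDriver.get loads the page
    return f"Opened {url} (title: {driver.title})"


def build_server():
    """Hypothetical wiring: register `navigate` as an MCP tool."""
    # Imported lazily so the helper above can be used without these packages.
    from fastmcp import FastMCP
    from selenium import webdriver

    mcp = FastMCP("selenium-sketch")  # server name is an assumption
    driver = webdriver.Chrome()

    @mcp.tool()
    def open_url(url: str) -> str:
        """Navigate the browser to the given URL."""
        return navigate(driver, url)

    return mcp
```

Keeping the Selenium logic in a plain function makes it easy to exercise with a stand-in driver, while the decorated wrapper stays a thin layer that the MCP client sees.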
Release Notes
Version 2.0.1 - Release 11 April 2026
In this version, minor enhancements were made to error codes, return types, and the README file.
- Return type changed from Union[None, dict], int, bool to str in:
  - cookies.py
  - find.py
  - get.py
  - input.py
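To illustrate why a uniform `str` return type is convenient for an LLM client, here is a minimal sketch of a converter that turns the older kinds of results (`None`, `dict`, `int`, `bool`) into text. The helper name `to_tool_result` and the "Done." message are assumptions for illustration; the project's actual conversion is not shown in this README.

```python
import json
from typing import Any


def to_tool_result(value: Any) -> str:
    """Convert a tool's raw result into a string an LLM can read."""
    if value is None:
        return "Done."
    if isinstance(value, str):
        return value
    # bool, int, dict and list all serialise to JSON text;
    # default=str covers values json cannot encode natively.
    return json.dumps(value, default=str)
```

With this shape, every tool response can be read the same way by the client, whether the underlying call produced a cookie dictionary, a count, or nothing at all.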
Version 2.0.0 - Release 4 April 2026
In this version, we made structural changes such as separating functions into save and get. Save now focuses only on saving files to disk, while get is used when the LLM wants to retrieve data from the browser.
The enhancements are:
- Get: To get webpage as markdown, html, screenshot, element's screenshot, list of urls
- JS Executor: To execute javascript code for interacting with the webpage
- Files: To upload and download files
- Alerts: To handle alerts
- Click: Added drag and drop of elements by xpath
- File: To save webpage as pdf on disk
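The split between get and save introduced in version 2.0.0 could be sketched as follows. Both helper names are hypothetical; the only Selenium API relied on is the `page_source` attribute of a WebDriver. The get-style function hands the data back to the caller, while the save-style function writes to disk and returns only a short confirmation.

```python
from pathlib import Path


def get_html(driver) -> str:
    """'get' style: return the page HTML to the caller (the LLM)."""
    return driver.page_source  # real WebDriver attribute


def save_html(driver, path: str) -> str:
    """'save' style: write the page HTML to disk and return a confirmation."""
    target = Path(path)
    target.write_text(driver.page_source, encoding="utf-8")
    return f"Saved page HTML to {target}"
```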
Version 1.0.0 - Release 31 March 2026
- Web Driver: Create new or quit existing WebDriver sessions
- Cookies: To manage cookies (add, delete, get, clear)
- Clicks: To perform clicks on elements (left click, right click, double click)
- Browser: To navigate urls and manage browser capabilities like resize, maximize, minimize, fullscreen, etc.
- Scroll: To scroll the entire webpage
- Input: Input text into elements and select/unselect checkboxes, radio buttons, dropdown options, etc.
- Find: To find elements by XPath
Key Features
- Humanised error handling that enables the LLM to interpret errors and adjust its tool usage accordingly
- Comprehensive element interaction: Clicks, input, and select are performed after checking that the element is visible, enabled, clickable, etc.
- Full navigation control: Open a new URL, go forward, go back, refresh, etc.
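The first two features can be pictured with a small sketch: check the element's state before acting, and report any problem as a plain sentence the LLM can act on. The function name `safe_click` and the message wording are assumptions; `is_displayed()`, `is_enabled()`, and `click()` are the standard Selenium WebElement methods.

```python
def safe_click(element, description: str = "element") -> str:
    """Click only when the element is visible and enabled; report in plain language."""
    try:
        if not element.is_displayed():
            return f"Could not click {description}: it is not visible. Try scrolling to it first."
        if not element.is_enabled():
            return f"Could not click {description}: it is disabled. Check whether required fields are filled."
        element.click()
        return f"Clicked {description}."
    except Exception as exc:  # e.g. a Selenium WebDriverException
        # Return the failure as text instead of raising, so the client can retry differently.
        return f"Could not click {description}: {type(exc).__name__}: {exc}"
```

Returning a message instead of raising keeps the tool call successful at the protocol level, which gives the model a chance to read the hint and choose a different tool or locator.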
The tools leverage the following technologies:
- FastMCP: For MCP server implementation
- Selenium: For web automation
- Google GenAI: For AI assistant
Upcoming
The following features are planned:
- Tools to support Chrome Dev Tools & BiDi
- Enhance save functionality to save files in different formats
- Enhanced error handling
- Multi browser support (Firefox, Edge, Safari, etc)
Example
Prompt: "Open https://rfpnotification.com and join the waiting list by entering the email address: [test_user@example.com]"
[Figure: browser before and after running the prompt]
After running the script, the browser took a screenshot to check whether the email was entered successfully.

Test In Action

Dev Setup
Clone the repository
git clone {url}
Create virtual environment
python3 -m venv venv
source venv/bin/activate
Install dependencies
pip install -r requirements.txt
Run the server
python server.py
The package comes with a lightweight MCP client, built on the Google GenAI SDK, for testing the server. It is implemented in the server.py file. To use it, you need a Google GenAI API key. Set it in the .env file as GEMINI_API_KEY={your_api_key}.
Run the client
python server.py
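For reference, reading `GEMINI_API_KEY` from the `.env` file can be sketched with the standard library alone. This is an assumption about how the key might be loaded, not a description of the bundled client (which may well use a package such as python-dotenv); the helper names are hypothetical.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines; ignore blank lines and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_gemini_api_key(path: str = ".env") -> str:
    """Prefer the process environment, then fall back to the .env file."""
    key = os.environ.get("GEMINI_API_KEY") or load_env_file(path).get("GEMINI_API_KEY", "")
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set; add it to the .env file.")
    return key
```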
Architecture
The MCP server has a very simple architecture, as shown below. When it is hosted locally it communicates via stdio; when it is hosted in the cloud it communicates via HTTP.
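One way to switch between the two transports is to read the choice from the environment. The sketch below is only an assumption about how this could be done: the variable names `MCP_TRANSPORT`, `MCP_HOST`, and `MCP_PORT` and the function `transport_settings` are hypothetical and do not come from this project.

```python
import os


def transport_settings(env=None) -> dict:
    """Pick stdio for local use, or HTTP when MCP_TRANSPORT=http is set."""
    env = os.environ if env is None else env
    if env.get("MCP_TRANSPORT", "stdio").lower() == "http":
        return {
            "transport": "http",
            "host": env.get("MCP_HOST", "0.0.0.0"),
            "port": int(env.get("MCP_PORT", "8000")),
        }
    # Default: a locally hosted server talks to its client over stdio.
    return {"transport": "stdio"}
```

With a recent FastMCP release, a dictionary like this could plausibly be passed on as `mcp.run(**transport_settings())`; check the FastMCP version in use, since the accepted transport names have changed between releases.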


