Youtube Vision

Created by minbang930, 8 months ago

YouTube Vision MCP Server (youtube-vision)

An MCP (Model Context Protocol) server that uses the Google Gemini Vision API to analyze YouTube videos. It can describe and summarize a video, answer questions about it, and extract its key moments.

Features

  • Analyzes YouTube videos using the Gemini Vision API.
  • Provides multiple tools for different interactions:
    • General description or Q&A (ask_about_youtube_video)
    • Summarization (summarize_youtube_video)
    • Key moment extraction (extract_key_moments)
  • Lists available Gemini models supporting generateContent.
  • Configurable Gemini model via environment variable.
  • Communicates via stdio (standard input/output).

Prerequisites

Before using this server, ensure you have the following:

  • Node.js: Version 18 or higher recommended. You can download it from nodejs.org.
  • Google Gemini API Key: Obtain your API key from Google AI Studio or Google Cloud Console.

Installation & Usage

There are two main ways to use this server:

Installing via Smithery

To install youtube-vision-mcp for Claude Desktop automatically via Smithery:

npx -y @smithery/cli install @minbang930/youtube-vision-mcp --client claude

Option 1: Using npx

The easiest way to run this server is with npx, which downloads and runs the package without a permanent installation.

Configure it in your MCP client's settings file (Claude, VS Code, etc.):

{
  "mcpServers": {
    "youtube-vision": {
      "command": "npx",
      "args": [
        "-y",
        "youtube-vision"
      ],
      "env": {
        "GEMINI_API_KEY": "YOUR_GEMINI_API_KEY",
        "GEMINI_MODEL_NAME": "gemini-2.0-flash"
      }
    }
  }
}

Replace "YOUR_GEMINI_API_KEY" with your actual Google Gemini API key.

Option 2: Manual Installation (from Source)

If you want to modify the code or run it directly from the source:

  1. Clone the repository:

    git clone https://github.com/minbang930/Youtube-Vision-MCP.git
    cd Youtube-Vision-MCP
    
  2. Install dependencies:

    npm install
    
  3. Build the project:

    npm run build
    
  4. Configure and run: Either run the compiled code directly with node dist/index.js (with GEMINI_API_KEY set as an environment variable), or configure your MCP client to launch it with the node command and the absolute path to dist/index.js, passing the API key through the env setting as in the npx example.
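As a rough illustration of the second approach in step 4, the sketch below builds a client settings entry that launches the local build with node instead of npx. The helper name buildSourceConfig and the example path are hypothetical; substitute the real absolute path to dist/index.js on your machine.

```typescript
// Shape of one server entry in an MCP client's settings file.
interface McpServerEntry {
  command: string;
  args: string[];
  env: Record<string, string>;
}

// Hypothetical helper (not part of the package): builds the settings
// object for running the server from a local source build.
function buildSourceConfig(
  indexJsPath: string,
  apiKey: string
): { mcpServers: Record<string, McpServerEntry> } {
  return {
    mcpServers: {
      "youtube-vision": {
        command: "node",
        args: [indexJsPath], // absolute path to the compiled entry point
        env: { GEMINI_API_KEY: apiKey },
      },
    },
  };
}

// The path below is a placeholder, not a real location.
const sourceConfig = buildSourceConfig(
  "/absolute/path/to/Youtube-Vision-MCP/dist/index.js",
  "YOUR_GEMINI_API_KEY"
);
console.log(JSON.stringify(sourceConfig, null, 2));
```

The printed JSON can be pasted into the client's settings file in place of the npx entry.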

Configuration

The server uses the following environment variables:

  • GEMINI_API_KEY (Required): Your Google Gemini API key.
  • GEMINI_MODEL_NAME (Optional): The specific Gemini model to use (e.g., gemini-1.5-flash). Defaults to gemini-2.0-flash. Important: For production or commercial use, ensure you select a model version that is not marked as "Experimental" or "Preview".

Environment variables should be set in the env section of your MCP client's settings file (e.g., mcp_settings.json).
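The rules above (one required key, one optional model name with a default) can be sketched as a small resolver. This is an assumption about how such a lookup might be written, not the server's actual source; resolveConfig and ServerConfig are illustrative names.

```typescript
interface ServerConfig {
  apiKey: string;
  modelName: string;
}

// Default model name as documented above.
const DEFAULT_MODEL = "gemini-2.0-flash";

// Illustrative resolver: takes an environment-like map so it can be
// exercised without touching the real process environment.
function resolveConfig(env: Record<string, string | undefined>): ServerConfig {
  const rawKey = env["GEMINI_API_KEY"];
  const apiKey = rawKey === undefined ? "" : rawKey.trim();
  if (apiKey === "") {
    throw new Error("GEMINI_API_KEY is required");
  }
  const rawModel = env["GEMINI_MODEL_NAME"];
  const modelName =
    rawModel === undefined || rawModel.trim() === "" ? DEFAULT_MODEL : rawModel.trim();
  return { apiKey, modelName };
}

// Example with an explicit map; in Node.js one would pass process.env.
const exampleConfig = resolveConfig({ GEMINI_API_KEY: "YOUR_GEMINI_API_KEY" });
console.log(exampleConfig.modelName);
```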

Available Tools

1. ask_about_youtube_video

Answers a question about the video or provides a general description if no question is asked.

  • Input:
    • youtube_url (string, required): The URL of the YouTube video.
    • question (string, optional): The specific question to ask about the video. If omitted, a general description is generated.
  • Output: Text containing the answer or description.
  • Example Usage (MCP Client):
    <use_mcp_tool>
      <server_name>youtube-vision</server_name>
      <tool_name>ask_about_youtube_video</tool_name>
      <arguments>
      {
        "youtube_url": "https://www.youtube.com/watch?v=VIDEO_ID",
        "question": "What is the main topic discussed around 1:30?" 
      }
      </arguments>
    </use_mcp_tool>
    
    <use_mcp_tool>
      <server_name>youtube-vision</server_name>
      <tool_name>ask_about_youtube_video</tool_name>
      <arguments>
      {
        "youtube_url": "https://www.youtube.com/watch?v=VIDEO_ID"
      }
      </arguments>
    </use_mcp_tool>
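The use_mcp_tool blocks above are one client's notation. On the wire, MCP clients send tool invocations as JSON-RPC 2.0 tools/call requests, and a minimal sketch of building one for this tool follows. The function name buildAskRequest and the request id are illustrative choices, not part of the server.

```typescript
// JSON-RPC 2.0 envelope for an MCP tools/call request.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Illustrative builder: omits "question" when none is given, which the
// tool treats as a request for a general description.
function buildAskRequest(id: number, youtubeUrl: string, question?: string): ToolCallRequest {
  const args: Record<string, unknown> = { youtube_url: youtubeUrl };
  if (question !== undefined && question.trim() !== "") {
    args["question"] = question;
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "ask_about_youtube_video", arguments: args },
  };
}

const askRequest = buildAskRequest(
  1,
  "https://www.youtube.com/watch?v=VIDEO_ID",
  "What is the main topic discussed around 1:30?"
);
console.log(JSON.stringify(askRequest));
```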
    

2. summarize_youtube_video

Generates a summary of a given YouTube video.

  • Input:
    • youtube_url (string, required): The URL of the YouTube video.
    • summary_length (string, optional): Desired summary length ('short', 'medium', 'long'). Defaults to 'medium'.
  • Output: Text containing the video summary.
  • Example Usage (MCP Client):
    <use_mcp_tool>
      <server_name>youtube-vision</server_name>
      <tool_name>summarize_youtube_video</tool_name>
      <arguments>
      {
        "youtube_url": "https://www.youtube.com/watch?v=VIDEO_ID",
        "summary_length": "short"
      }
      </arguments>
    </use_mcp_tool>
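A caller may want to check summary_length before sending the request. The sketch below applies the documented default and rejects unknown values. How the server itself handles an invalid value is not stated here, so throwing an error is an assumption, and normalizeSummaryLength is a hypothetical helper.

```typescript
type SummaryLength = "short" | "medium" | "long";

const SUMMARY_LENGTHS: readonly SummaryLength[] = ["short", "medium", "long"];

// Hypothetical client-side check: returns the documented default
// ("medium") when the argument is omitted.
function normalizeSummaryLength(value?: string): SummaryLength {
  if (value === undefined) {
    return "medium";
  }
  for (const candidate of SUMMARY_LENGTHS) {
    if (candidate === value) {
      return candidate;
    }
  }
  throw new Error("summary_length must be one of: " + SUMMARY_LENGTHS.join(", "));
}

console.log(normalizeSummaryLength("short"));
console.log(normalizeSummaryLength());
```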
    

3. extract_key_moments

Extracts key moments (timestamps and descriptions) from a given YouTube video.

  • Input:
    • youtube_url (string, required): The URL of the YouTube video.
    • number_of_moments (integer, optional): Number of key moments to extract. Defaults to 3.
  • Output: Text describing the key moments with timestamps.
  • Example Usage (MCP Client):
    <use_mcp_tool>
      <server_name>youtube-vision</server_name>
      <tool_name>extract_key_moments</tool_name>
      <arguments>
      {
        "youtube_url": "https://www.youtube.com/watch?v=VIDEO_ID",
        "number_of_moments": 5 
      }
      </arguments>
    </use_mcp_tool>
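Since number_of_moments is documented as an optional integer, a client can guard against fractional or non-positive values before calling the tool. This is a sketch under that assumption. The helper name is hypothetical, and the lower bound of 1 is a guess rather than a documented limit.

```typescript
// Hypothetical client-side check: applies the documented default of 3
// and rejects values that are not positive integers.
function normalizeNumberOfMoments(value?: number): number {
  if (value === undefined) {
    return 3;
  }
  if (!Number.isInteger(value) || value < 1) {
    throw new Error("number_of_moments must be a positive integer");
  }
  return value;
}

console.log(normalizeNumberOfMoments(5));
console.log(normalizeNumberOfMoments());
```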
    

4. list_supported_models

Lists available Gemini models that support the generateContent method (fetched via REST API).

  • Input: None
  • Output: Text listing the supported model names.
  • Example Usage (MCP Client):
    <use_mcp_tool>
      <server_name>youtube-vision</server_name>
      <tool_name>list_supported_models</tool_name>
      <arguments>{}</arguments>
    </use_mcp_tool>
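The Gemini REST endpoint for listing models returns entries with a name and a supportedGenerationMethods array. A filter along the following lines would produce the kind of list this tool reports. It is a sketch, not the server's code, and the sample response is made up for illustration.

```typescript
// Subset of the fields returned by the Gemini models list endpoint.
interface GeminiModel {
  name: string;
  supportedGenerationMethods?: string[];
}

interface ListModelsResponse {
  models?: GeminiModel[];
}

// Keeps only models that advertise generateContent and strips the
// "models/" prefix so the names match GEMINI_MODEL_NAME values.
function modelsSupportingGenerateContent(response: ListModelsResponse): string[] {
  const models = response.models === undefined ? [] : response.models;
  return models
    .filter((m) => (m.supportedGenerationMethods || []).indexOf("generateContent") !== -1)
    .map((m) => m.name.replace(/^models\//, ""));
}

// Illustrative response body, not real API output.
const sampleResponse: ListModelsResponse = {
  models: [
    { name: "models/gemini-2.0-flash", supportedGenerationMethods: ["generateContent", "countTokens"] },
    { name: "models/text-embedding-004", supportedGenerationMethods: ["embedContent"] },
  ],
};
console.log(modelsSupportingGenerateContent(sampleResponse));
```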
    

Important Notes

  • Model Selection for Production: When using this server for production or commercial purposes, please ensure the selected GEMINI_MODEL_NAME is a stable version suitable for production use. According to the Gemini API Terms of Service, models marked as "Experimental" or "Preview" are not permitted for production deployment.
  • API Terms of Service: Usage of this server relies on the Google Gemini API. Users are responsible for reviewing and complying with the Google APIs Terms of Service and the Gemini API Additional Terms of Service. Note that data usage policies may differ between free and paid tiers of the Gemini API. Do not submit sensitive or confidential information when using free tiers.
  • Content Responsibility: The accuracy and appropriateness of content generated via the Gemini API are not guaranteed. Use discretion before relying on or publishing generated content.

License

This project is licensed under the MIT License. See the LICENSE file for details.
