Local Voice MCP

Created by CodeCraftersLLC, 7 months ago

Give your MCP clients the ability to speak by running local voice models using Chatterbox TTS.

Quickstart

The package includes a high-quality female reference voice that's used by default. All environment variables are optional.

{
  "mcpServers": {
    "local-voice-mcp": {
      "command": "npx",
      "args": ["-y", "@codecraftersllc/local-voice-mcp"],
      "env": {
        "USE_MALE_VOICE": "false",
        "CHATTERBOX_EXAGGERATION": "0.5",
        "CHATTERBOX_CFG_WEIGHT": "1.2",
        "CHATTERBOX_MAX_CHARACTERS": "2000",
        "CHATTERBOX_PLAYBACK_VOLUME": "75"
      }
    }
  }
}

Features

  • MCP Server Implementation: Full Model Context Protocol server using @modelcontextprotocol/sdk
  • HTTP API: ElevenLabs-compatible REST API for direct integration
  • Text-to-Speech Synthesis: High-quality voice synthesis using Chatterbox TTS
  • Voice Cloning: Support for reference audio for voice cloning
  • Prosody Controls: Adjustable exaggeration and configuration weights
  • Volume Control: Configurable audio playback volume with cross-platform support
  • Robust File Management: Automatic cleanup of temporary audio files
  • Security: Path validation and sanitization to prevent directory traversal
  • Dual Mode Operation: Run as MCP server or HTTP server
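
The path validation mentioned under Security can be sketched as follows. This is a minimal illustration of one common way to block directory traversal, not the package's actual implementation; the helper name resolveInside and the directory used in the usage note are assumptions.

```typescript
import * as path from "node:path";

// Hypothetical helper: resolve a requested file name inside baseDir and
// reject anything that would land outside it (for example "../../etc/passwd").
function resolveInside(baseDir: string, requested: string): string {
  const base = path.resolve(baseDir);
  const target = path.resolve(base, requested);
  const rel = path.relative(base, target);
  // An empty rel means the target is the directory itself; a ".." prefix or
  // an absolute rel means the target escaped baseDir.
  if (
    rel === "" ||
    rel === ".." ||
    rel.startsWith(".." + path.sep) ||
    path.isAbsolute(rel)
  ) {
    throw new Error(`Path escapes output directory: ${requested}`);
  }
  return target;
}
```

Calling resolveInside("/tmp/local-voice-mcp", "audio_123.wav") returns a path inside the directory, while "../secret.wav" throws.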

Installation

npm install -g @codecraftersllc/local-voice-mcp

From Source

git clone <repository-url>
cd local-voice-mcp
npm install
npm run build

Usage

MCP Server Mode (Default)

Run as an MCP server with stdio transport:

local-voice-mcp-server

Or using npx, with the same package name as in the Quickstart:

npx -y @codecraftersllc/local-voice-mcp

HTTP Server Mode

Run as an HTTP server:

MCP_MODE=http local-voice-mcp-server

Or set the port (default: 59125):

PORT=3000 MCP_MODE=http local-voice-mcp-server

Development

# Run MCP server in development
npm run dev:mcp

# Run HTTP server in development
npm run dev:http

# Run tests
npm test

# Build project
npm run build

MCP Tools

When running in MCP mode, the following tools are available:

synthesize_text

Converts text to speech, saves the audio to a file, and returns the file path.

Parameters:

  • text (string, required): Text to synthesize
  • referenceAudio (string, optional): Path to reference audio for voice cloning
  • exaggeration (number, optional): Voice style exaggeration (0-2, default: 0.2)
  • cfg_weight (number, optional): Configuration weight (0-5, default: 1.0)
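
A client may want to check these ranges before calling the tool. The sketch below is illustrative only: validateSynthesisInput is a hypothetical helper, the 2000-character limit mirrors the documented CHATTERBOX_MAX_CHARACTERS default, and whether the server rejects or clamps out-of-range values is not stated in this README.

```typescript
// Ranges come from the parameter list above; the helper itself is hypothetical.
interface SynthesisOptions {
  referenceAudio?: string;
  exaggeration?: number;
  cfg_weight?: number;
}

function validateSynthesisInput(
  text: string,
  options: SynthesisOptions = {},
  maxCharacters = 2000
): string[] {
  const problems: string[] = [];
  if (text.trim().length === 0) problems.push("text is required");
  if (text.length > maxCharacters) {
    problems.push(`text exceeds ${maxCharacters} characters`);
  }
  const { exaggeration, cfg_weight } = options;
  // The negated range checks also catch NaN.
  if (exaggeration !== undefined && !(exaggeration >= 0 && exaggeration <= 2)) {
    problems.push("exaggeration must be between 0 and 2");
  }
  if (cfg_weight !== undefined && !(cfg_weight >= 0 && cfg_weight <= 5)) {
    problems.push("cfg_weight must be between 0 and 5");
  }
  return problems;
}
```

An empty array means the input passed every check.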

Returns:

  • JSON response with synthesis status and file path

Example Response:

{
  "success": true,
  "message": "Speech synthesis completed successfully",
  "audioFile": "/tmp/local-voice-mcp/audio_20240115_103000_abc123.wav",
  "textLength": 25,
  "audioFormat": "wav",
  "options": {
    "exaggeration": 0.2,
    "cfg_weight": 1.0
  },
  "generatedAt": "2024-01-15T10:30:00.000Z"
}

The audio file is saved to the temporary directory and can be played using any audio player or accessed programmatically.
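
For programmatic access, a client can parse the JSON and pull out the file path. The interface below mirrors the example response and should be read as an assumption rather than a published schema; parseSynthesisResult is a hypothetical helper.

```typescript
// Field names follow the example response above.
interface SynthesisResult {
  success: boolean;
  message: string;
  audioFile: string;
  textLength: number;
  audioFormat: string;
  options: { exaggeration: number; cfg_weight: number };
  generatedAt: string;
}

function parseSynthesisResult(json: string): SynthesisResult {
  const data = JSON.parse(json) as Partial<SynthesisResult>;
  // Fail early when the call did not succeed or the path is missing.
  if (data.success !== true || typeof data.audioFile !== "string") {
    throw new Error(data.message ?? "synthesis failed or response malformed");
  }
  return data as SynthesisResult;
}
```

The returned audioFile value can then be passed to the play_audio tool.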

play_audio

Plays an audio file using the system's default audio player with optional volume control.

Parameters:

  • audioFile (string, required): Path to the audio file to play
  • volume (number, optional): Playback volume as percentage (0-100). If not specified, uses CHATTERBOX_PLAYBACK_VOLUME environment variable or default of 50.

Supported Formats:

  • WAV files (.wav)
  • MP3 files (.mp3)

Returns:

  • JSON response with playback status and system information

Example Response:

{
  "success": true,
  "message": "Successfully played audio file: /tmp/local-voice-mcp/audio_123.wav",
  "audioFile": "/tmp/local-voice-mcp/audio_123.wav",
  "volume": 50,
  "platform": "darwin",
  "command": "afplay -v 0.5 /tmp/local-voice-mcp/audio_123.wav",
  "timestamp": "2024-01-15T10:30:00.000Z"
}

Platform Support:

  • Cross-platform: Prefers ffplay (from ffmpeg) for consistent volume control across all platforms
  • macOS: Falls back to afplay command with -v volume flag
  • Windows: Falls back to PowerShell with MediaPlayer and volume control
  • Linux: Falls back to mpg123 (MP3) with gain control or aplay (WAV, no volume control)
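
The fallback order above can be sketched as a small selection function. This is an illustration under stated assumptions, not the server's code: pickPlayer and afplayArgs are hypothetical names, platform strings follow Node's process.platform values, and the volume conversion is inferred from the example response, where 50% becomes afplay -v 0.5.

```typescript
type Player = "ffplay" | "afplay" | "powershell" | "mpg123" | "aplay";

// Prefer ffplay when present; otherwise fall back by platform and file type.
// Any platform other than "darwin" or "win32" is treated like Linux here.
function pickPlayer(platform: string, audioFile: string, hasFfplay: boolean): Player {
  if (hasFfplay) return "ffplay";
  if (platform === "darwin") return "afplay";
  if (platform === "win32") return "powershell";
  return audioFile.toLowerCase().endsWith(".mp3") ? "mpg123" : "aplay";
}

// afplay takes volume as a multiplier, so a percentage is divided by 100.
function afplayArgs(audioFile: string, volumePercent: number): string[] {
  const clamped = Math.min(100, Math.max(0, volumePercent));
  return ["-v", String(clamped / 100), audioFile];
}
```

Note that the Linux WAV fallback (aplay) has no volume control, so a requested volume would have no effect on that path.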

tts_status

Returns the current status of the TTS service.

Parameters: None

Returns:

  • JSON response with service status and capabilities

Example Response:

{
  "success": true,
  "status": "operational",
  "message": "TTS service is ready and operational",
  "timestamp": "2024-01-15T10:30:00.000Z",
  "service": {
    "name": "Chatterbox TTS",
    "version": "0.1.0",
    "capabilities": [
      "text-to-speech synthesis",
      "voice cloning with reference audio",
      "prosody controls"
    ]
  }
}

MCP Resources

service-info

Provides information about the Local Voice MCP service.

URI: local-voice://service-info

HTTP API

When running in HTTP mode, the server exposes:

POST /tts

ElevenLabs-compatible text-to-speech endpoint.

Headers:

  • X-API-Key: API key (placeholder for authentication)
  • Content-Type: application/json

Request Body:

{
  "text": "Hello, world!",
  "options": {
    "referenceAudio": "path/to/reference.wav",
    "exaggeration": 0.5,
    "cfg_weight": 1.2
  }
}

Response:

  • Content-Type: audio/wav
  • Binary audio data
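
A request to this endpoint could be assembled as in the sketch below. The host localhost, the helper name buildTtsRequest, and the placeholder API key are assumptions; the port default, headers, and body shape come from this README.

```typescript
interface TtsOptions {
  referenceAudio?: string;
  exaggeration?: number;
  cfg_weight?: number;
}

interface TtsRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Builds the URL and fetch-style init object without sending anything.
function buildTtsRequest(
  text: string,
  options: TtsOptions = {},
  apiKey = "placeholder",
  port = 59125
): TtsRequest {
  return {
    url: `http://localhost:${port}/tts`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json", "X-API-Key": apiKey },
      body: JSON.stringify({ text, options }),
    },
  };
}
```

With Node 18 or later, the result can be passed to the built-in fetch as fetch(req.url, req.init), and the audio/wav response body read with arrayBuffer() before writing it to disk.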

Configuration

Environment Variables

Server Configuration

  • PORT: HTTP server port (default: 59125)
  • MCP_MODE: Operation mode - "mcp" or "http" (default: "mcp")
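
As a rough sketch of how these two variables and their defaults combine, the hypothetical helper below takes the environment as a plain object. Treating unknown MCP_MODE values and unparsable ports as "use the default" is an assumption; the README does not say how the server handles them.

```typescript
type Mode = "mcp" | "http";

function resolveServerConfig(
  env: Record<string, string | undefined>
): { mode: Mode; port: number } {
  // Anything other than "http" falls back to the documented default, "mcp".
  const mode: Mode = env.MCP_MODE === "http" ? "http" : "mcp";
  const parsed = parseInt(env.PORT ?? "", 10);
  // Accept only valid TCP port numbers; otherwise use the default 59125.
  const port = !isNaN(parsed) && parsed > 0 && parsed < 65536 ? parsed : 59125;
  return { mode, port };
}
```

In a Node process this would typically be called as resolveServerConfig(process.env).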

TTS Configuration

These environment variables set default values for TTS synthesis and playback. They apply unless overridden by options passed to the synthesize_text or play_audio tools:

  • CHATTERBOX_REFERENCE_AUDIO: Path to reference audio file for voice cloning (can be anywhere on your system, supports .wav, .mp3, .flac, .ogg, .m4a, .aac). If not specified, uses the bundled high-quality female reference voice.
  • USE_MALE_VOICE: Use the default Chatterbox male voice instead of the bundled female reference voice (true/false, default: false). This only applies when no custom reference audio is specified.
  • CHATTERBOX_EXAGGERATION: Voice style exaggeration level (float, default: 0.2)
  • CHATTERBOX_CFG_WEIGHT: Configuration weight for TTS model (float, default: 1.0)
  • CHATTERBOX_MAX_CHARACTERS: Maximum number of characters allowed for text input (integer, default: 2000)
  • CHATTERBOX_OUTPUT_DIR: Output directory for generated audio files (default: system temp + "local-voice-mcp")
  • CHATTERBOX_PLAYBACK_VOLUME: Default audio playback volume as percentage (integer, 0-100, default: 50)
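
The interaction between CHATTERBOX_REFERENCE_AUDIO and USE_MALE_VOICE described above can be sketched as a three-way choice. The Voice type and resolveVoice helper are hypothetical, and the case-insensitive comparison with "true" is an assumption.

```typescript
type Voice =
  | { kind: "custom"; path: string }
  | { kind: "default-male" }
  | { kind: "bundled-female" };

function resolveVoice(env: Record<string, string | undefined>): Voice {
  const custom = env.CHATTERBOX_REFERENCE_AUDIO;
  // A custom reference file wins over the USE_MALE_VOICE switch.
  if (custom !== undefined && custom.trim() !== "") {
    return { kind: "custom", path: custom };
  }
  if ((env.USE_MALE_VOICE ?? "").toLowerCase() === "true") {
    return { kind: "default-male" };
  }
  // With neither variable set, the bundled female reference voice is used.
  return { kind: "bundled-female" };
}
```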

Example:

# Set default TTS parameters via environment variables
# Reference audio can be anywhere on your system
export CHATTERBOX_REFERENCE_AUDIO="/Users/john/Music/my-voice.wav"
export CHATTERBOX_EXAGGERATION="0.5"
export CHATTERBOX_CFG_WEIGHT="1.2"
export CHATTERBOX_MAX_CHARACTERS="3000"
export CHATTERBOX_PLAYBACK_VOLUME="75"

# Run the MCP server with these defaults
local-voice-mcp-server

Using with npx:

{
  "mcpServers": {
    "local-voice-mcp": {
      "command": "npx",
      "args": ["-y", "@codecraftersllc/local-voice-mcp"],
      "env": {
        "CHATTERBOX_REFERENCE_AUDIO": "/Users/john/Music/my-voice.wav",
        "CHATTERBOX_EXAGGERATION": "0.5",
        "CHATTERBOX_CFG_WEIGHT": "1.2",
        "CHATTERBOX_MAX_CHARACTERS": "3000",
        "CHATTERBOX_PLAYBACK_VOLUME": "75"
      }
    }
  }
}

Using male voice instead of bundled female voice:

{
  "mcpServers": {
    "local-voice-mcp": {
      "command": "npx",
      "args": ["-y", "@codecraftersllc/local-voice-mcp"],
      "env": {
        "USE_MALE_VOICE": "true",
        "CHATTERBOX_EXAGGERATION": "0.3",
        "CHATTERBOX_CFG_WEIGHT": "1.0"
      }
    }
  }
}

Priority Order:

  1. Options passed to the synthesize_text or play_audio tools (highest priority)
  2. Environment variables
  3. Built-in defaults (lowest priority)
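
The priority order can be expressed as a small resolution function for numeric settings. This is a sketch: resolveNumber is a hypothetical helper, and falling through to the built-in default when an environment value does not parse as a number is an assumption.

```typescript
function resolveNumber(
  option: number | undefined,
  envValue: string | undefined,
  fallback: number
): number {
  // 1. An explicit tool option has the highest priority.
  if (option !== undefined) return option;
  // 2. Next comes the environment variable, if it parses as a number.
  if (envValue !== undefined && envValue.trim() !== "") {
    const parsed = Number(envValue);
    if (!isNaN(parsed)) return parsed;
  }
  // 3. Otherwise use the built-in default.
  return fallback;
}
```

For example, resolveNumber(undefined, "0.5", 0.2) yields 0.5 for exaggeration when CHATTERBOX_EXAGGERATION is "0.5" and the tool call passes no value.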

MCP Client Configuration

Add this entry under the mcpServers key of your MCP client configuration:

{
  "local-voice-mcp": {
    "command": "npx",
    "args": ["-y", "@codecraftersllc/local-voice-mcp"],
    "env": {}
  }
}

Testing with Cursor

Cursor is a popular AI-powered code editor that supports MCP. Here's how to test the Local Voice MCP server with Cursor:

1. Install the Package

First, install the package globally, or locally in your project:

npm install -g @codecraftersllc/local-voice-mcp
# or
npm install @codecraftersllc/local-voice-mcp

2. Configure Cursor

Add the MCP server to Cursor's MCP configuration file. Recent Cursor releases read MCP servers from one of two locations:

  • Global: ~/.cursor/mcp.json (on Windows, %USERPROFILE%\.cursor\mcp.json)
  • Per project: .cursor/mcp.json in the project root

Add this configuration:

{
  "mcpServers": {
    "local-voice-mcp": {
      "command": "local-voice-mcp-server",
      "args": [],
      "env": {}
    }
  }
}

Or if using npx:

{
  "mcpServers": {
    "local-voice-mcp": {
      "command": "npx",
      "args": ["-y", "@codecraftersllc/local-voice-mcp"],
      "env": {}
    }
  }
}

3. Restart Cursor

After adding the configuration, restart Cursor to load the MCP server.

4. Test the Integration

Once Cursor is restarted, you can test the TTS functionality:

  1. Open Cursor's AI chat

  2. Ask Cursor to use the TTS tools:

    Can you synthesize speech for "Hello, this is a test of the local voice MCP server"?
    
  3. Check TTS status:

    What's the status of the TTS service?
    
  4. Test with options:

    Synthesize "Welcome to the future of AI coding" with exaggeration set to 0.5
    
  5. Test audio playback:

    Play the audio file that was just generated
    
  6. Test volume control:

    Play the audio file at 25% volume
    

5. Verify the Tools Are Available

You should see the following tools available in Cursor:

  • synthesize_text - For text-to-speech conversion
  • play_audio - For playing audio files through system audio
  • tts_status - For checking service status

6. Troubleshooting

If the MCP server doesn't appear in Cursor:

  1. Check the logs: Look for error messages in Cursor's developer console
  2. Verify installation: Run local-voice-mcp-server directly in a terminal to ensure it starts
  3. Check paths: Ensure the command path is correct in your configuration
  4. Restart Cursor: Sometimes a full restart is needed after configuration changes
  5. JSON parsing errors: If you see "Unexpected token" errors, ensure you're using the latest version with proper stdio logging
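
On the last point: with the stdio transport, stdout carries the JSON-RPC messages, so a stray log line written to stdout can show up in the client as an "Unexpected token" error. The sketch below shows the usual remedy of sending diagnostics to stderr instead. It is illustrative only; formatLogLine and log are hypothetical names, not part of this package.

```typescript
type Level = "info" | "warn" | "error";

function formatLogLine(level: Level, message: string, now: Date = new Date()): string {
  return `[${now.toISOString()}] ${level.toUpperCase()} ${message}`;
}

// console.error writes to stderr, leaving stdout free for protocol messages.
function log(level: Level, message: string): void {
  console.error(formatLogLine(level, message));
}
```

Any wrapper script that launches the server should follow the same rule and avoid printing banners or progress output to stdout.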

7. Expected Behavior

When working correctly:

  • Cursor will be able to call the TTS tools
  • You'll receive structured JSON responses with file paths
  • Audio files will be saved to the temporary directory
  • The TTS service will use the Chatterbox TTS engine
  • Files can be played using system audio players

All responses are in structured JSON format with clear file paths, making it easy for MCP clients and AI agents to understand and work with the results.

Requirements

  • Node.js 16+
  • Python 3.8+
  • PyTorch
  • Chatterbox TTS

The service automatically sets up the Python environment and installs required dependencies on first run.

Architecture

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   MCP Client    │    │  HTTP Client     │    │   CLI Tool      │
│ (Cursor, etc.)  │    │                  │    │                 │
└─────────┬───────┘    └─────────┬────────┘    └─────────┬───────┘
          │                      │                       │
          │ stdio                │ HTTP                  │ stdio
          │                      │                       │
          ▼                      ▼                       ▼
    ┌─────────────────────────────────────────────────────────────┐
    │              Local Voice MCP Server                         │
    │  ┌─────────────────┐    ┌─────────────────────────────────┐ │
    │  │   MCP Server    │    │         HTTP Server             │ │
    │  │   (stdio)       │    │      (Express.js)               │ │
    │  └─────────────────┘    └─────────────────────────────────┘ │
    │                                   │                         │
    │  ┌─────────────────────────────────────────────────────────┐ │
    │  │              TTS Tools & Services                       │ │
    │  │  ┌─────────────────┐    ┌─────────────────────────────┐ │ │
    │  │  │ ChatterboxService│    │    File Management         │ │ │
    │  │  │                 │    │   (Cleanup & Security)     │ │ │
    │  │  └─────────────────┘    └─────────────────────────────┘ │ │
    │  └─────────────────────────────────────────────────────────┘ │
    └─────────────────────────────────────────────────────────────┘
                        ┌─────────────────────┐
                        │   Python TTS        │
                        │  (Chatterbox)       │
                        └─────────────────────┘

License

MIT

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests for new functionality
  5. Ensure all tests pass
  6. Submit a pull request
