
Supertone TTS MCP

Created by supertone-inc, 25 days ago
MCP server for the Supertone TTS API. Generate natural speech, browse and preview the voice catalog, predict synthesis cost, and create cloned voices — directly from Claude Desktop, Cursor, or any MCP-compatible client. Supports Korean, English, Japanese, and 20+ other languages, with speed, pitch, and emotion-style control.
Overview

supertone-mcp

A composable MCP toolkit for the Supertone TTS API. Rather than a single "speak this text" command, it exposes Supertone's SDK as a set of building-block tools — synthesis, voice discovery, preview, duration/credit prediction, usage tracking, and full voice-cloning CRUD — that an LLM assembles to fulfill a request. Works in Claude Desktop, Cursor, or any MCP-compatible client.


Covers 31 languages in total, including Korean, English, and Japanese. Per-call controls include speed (0.5x–2.0x), pitch shift (-24 to +24 semitones), emotion styles, output mode, streaming, and model selection.

Features

Synthesis

  • text_to_speech — Convert text to audio. Per-call control of output_mode (files / resources / both), autoplay, streaming, model, plus include_phonemes / normalized_text. Long text is auto-chunked by the SDK.
  • predict_duration — Estimate audio length (and credit cost) without synthesizing.

Voice discovery (preset)

  • search_voice — Filter the catalog by language, gender, age, use_case, style, model, name, or description.
  • get_voice — Full detail for one voice.
  • preview_voice — Sample audio URLs for a voice (filterable by language/style/model).

Custom voice cloning

  • clone_voice — Create a cloned voice from a local WAV/MP3 (≤3MB).
  • search_custom_voice — List/filter cloned voices.
  • get_custom_voice — Full detail for one cloned voice.
  • edit_custom_voice — Update name and/or description.
  • delete_custom_voice — Permanently delete (irreversible).

Usage & credits

  • get_credit_balance — Remaining credits.
  • get_usage_history — Usage over a time window.
  • get_voice_usage — Usage for a specific voice.

Breaking changes & migration (0.2.0)

0.2.0 moves behavior control out of environment variables and into per-call tool parameters — so the LLM decides per request, not the server config.

| Before (env var) | After (per-call parameter) | Note |
| --- | --- | --- |
| SUPERTONE_MCP_OUTPUT_MODE (files, resources, or both) | text_to_speech(output_mode=...) | Default still files |
| SUPERTONE_MCP_AUTOPLAY=true | text_to_speech(autoplay=...) | Default changed true → false (playback is now explicit) |
| (always streamed) | text_to_speech(streaming=...) | New, default false (one-shot). streaming=true requires model="sona_speech_1" |

Other changes:

  • Default model changed from sona_speech_1 to sona_speech_2_flash.
  • list_voices was removed (since the discovery release) and replaced by search_voice — call it with no arguments to reproduce the old "list everything" behavior.
  • No more hard 300-character limit — longer text is auto-chunked by the SDK (credit/latency scale with length).

If you previously set SUPERTONE_MCP_OUTPUT_MODE or SUPERTONE_MCP_AUTOPLAY, remove them from your client config and pass output_mode / autoplay per call instead. (The server prints a one-time stderr notice if it sees the removed vars.)
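The cleanup described above can be scripted. The following is a minimal sketch, not part of supertone-mcp: it assumes the client config has the `mcpServers` / `env` shape shown under Configuration, and the helper name `strip_removed_vars` is hypothetical.

```python
import json

# Environment variables removed in 0.2.0, per the migration notes.
REMOVED_VARS = ("SUPERTONE_MCP_OUTPUT_MODE", "SUPERTONE_MCP_AUTOPLAY")


def strip_removed_vars(config: dict) -> list:
    """Delete the removed variables from every server's "env" block.

    Mutates `config` in place and returns a "server:VAR" entry per removal.
    """
    removed = []
    for server_name, server in config.get("mcpServers", {}).items():
        env = server.get("env", {})
        for var in REMOVED_VARS:
            if var in env:
                del env[var]
                removed.append(f"{server_name}:{var}")
    return removed


# Example: a pre-0.2.0 config that still carries both removed variables.
old_config = {
    "mcpServers": {
        "supertone-tts": {
            "command": "uvx",
            "args": ["supertone-mcp"],
            "env": {
                "SUPERTONE_API_KEY": "your-api-key-here",
                "SUPERTONE_MCP_OUTPUT_MODE": "both",
                "SUPERTONE_MCP_AUTOPLAY": "true",
            },
        }
    }
}

print(strip_removed_vars(old_config))
print(json.dumps(old_config, indent=2))
```

To apply it to a real file, load the JSON with `json.load`, call the helper, and write the result back. The values you removed (for example `both`) then need to be passed as `output_mode` / `autoplay` on each call.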

Installation

# Using uvx (recommended)
uvx supertone-mcp

# Using pip
pip install supertone-mcp

Configuration

Claude Desktop

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "supertone-tts": {
      "command": "uvx",
      "args": ["supertone-mcp"],
      "env": {
        "SUPERTONE_API_KEY": "your-api-key-here"
      }
    }
  }
}

Cursor

Add to your Cursor MCP settings (same JSON shape as above).

Environment Variables

Only authentication and stable defaults are configured via the environment — all behavior is controlled per call.

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| SUPERTONE_API_KEY | Yes | | Your Supertone API key |
| SUPERTONE_MCP_VOICE_ID | No | preset voice (Aiden, multilingual) | Default voice_id for text_to_speech / predict_duration (override per call) |
| SUPERTONE_OUTPUT_DIR | No | ~/supertone-tts-output/ | Directory where audio files are saved (used by output_mode=files/both) |

Removed in 0.2.0: SUPERTONE_MCP_OUTPUT_MODE and SUPERTONE_MCP_AUTOPLAY — see Migration.
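As an illustration of how these three variables fit together, here is a small sketch that resolves them the way the table describes. It is not the server's actual code: the function name `load_settings` and the returned dict keys are assumptions, and a missing SUPERTONE_MCP_VOICE_ID is represented as `None` (meaning "fall back to the preset voice") because the preset's ID is not given here.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def load_settings(environ: Optional[Mapping[str, str]] = None) -> dict:
    """Resolve the documented environment variables into a settings dict."""
    env = os.environ if environ is None else environ

    api_key = env.get("SUPERTONE_API_KEY")
    if not api_key:
        # The only required variable.
        raise RuntimeError("SUPERTONE_API_KEY is required")

    return {
        "api_key": api_key,
        # None means: let the server use its preset voice.
        "voice_id": env.get("SUPERTONE_MCP_VOICE_ID"),
        # Documented default directory, with ~ expanded.
        "output_dir": Path(
            env.get("SUPERTONE_OUTPUT_DIR", "~/supertone-tts-output/")
        ).expanduser(),
    }


# Example with an explicit mapping instead of the real process environment.
settings = load_settings({"SUPERTONE_API_KEY": "your-api-key-here"})
print(settings["voice_id"], settings["output_dir"])
```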

Output modes (text_to_speech output_mode)

| Mode | Returns | Use when |
| --- | --- | --- |
| files (default) | Plain text with the saved file path + metadata | You want the file on disk |
| resources | MCP AudioContent + TextContent (no file written) | The client renders audio inline (e.g., Claude.ai chat) |
| both | File on disk and AudioContent/TextContent | You want both: preview inline, keep the file |

Usage Examples

The MCP client routes natural-language requests across these tools — the value of the toolkit is composition: the LLM chains several tools to satisfy one request.

Example 1 — Discover → preview → estimate cost → synthesize

"Find a calm Korean female voice, let me hear a sample, check the cost, then make this announcement as an mp3."

The LLM assembles:

search_voice(language="ko", gender="female", style="neutral")   # find candidates
  → preview_voice(voice_id)                                       # sample URLs to confirm the voice
  → predict_duration(text, voice_id) + get_credit_balance()       # gauge cost before spending
  → text_to_speech(text, voice_id, output_format="mp3",
                   output_mode="files")                           # synthesize
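
The same chain can be written as ordinary code when you drive the tools from a script rather than an LLM. This is a sketch under stated assumptions: `call_tool(name, arguments)` stands in for whatever tool-call method your MCP client exposes, `choose_voice` is a hypothetical callback that picks a voice_id from the search results (their exact format is not specified here), and the fake caller below only records calls so the example runs without a server or API key.

```python
from typing import Any, Callable, Dict


def run_announcement_flow(
    call_tool: Callable[[str, Dict[str, Any]], Any],
    choose_voice: Callable[[Any], str],
    text: str,
) -> Dict[str, Any]:
    """Chain the tool calls from Example 1 and return each raw result."""
    candidates = call_tool(
        "search_voice", {"language": "ko", "gender": "female", "style": "neutral"}
    )
    voice_id = choose_voice(candidates)  # the LLM or the user picks a candidate
    preview = call_tool("preview_voice", {"voice_id": voice_id})
    estimate = call_tool("predict_duration", {"text": text, "voice_id": voice_id})
    balance = call_tool("get_credit_balance", {})
    audio = call_tool(
        "text_to_speech",
        {
            "text": text,
            "voice_id": voice_id,
            "output_format": "mp3",
            "output_mode": "files",
        },
    )
    return {
        "preview": preview,
        "estimate": estimate,
        "balance": balance,
        "audio": audio,
    }


# Stand-in for a real MCP session: records each call and echoes the tool name.
calls = []


def fake_call_tool(name: str, arguments: Dict[str, Any]) -> str:
    calls.append((name, arguments))
    return f"<{name} result>"


results = run_announcement_flow(
    fake_call_tool, lambda _candidates: "voice-123", "Doors open at nine."
)
print([name for name, _ in calls])
```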

Example 2 — Clone my voice → use it right away

"Make a cloned voice from ~/recordings/sample.wav named MyVoice, then read this greeting with it and play it for me."

The LLM assembles:

clone_voice(name="MyVoice", audio_path="~/recordings/sample.wav")   # create the cloned voice
  → get_custom_voice(voice_id)                                       # confirm it was created
  → text_to_speech(text, voice_id=<cloned>, autoplay=true)           # synthesize, then play immediately

autoplay is a per-call parameter (default false), so playback happens only when explicitly requested.

Tool Parameters

text_to_speech

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| text | string | Yes | | Text to convert (long text is auto-chunked by the SDK) |
| voice_id | string | No | env or preset | Voice identifier (browse via search_voice) |
| language | string | No | ko | Language code, one of 31 (ko, en, ja, …) |
| output_format | string | No | mp3 | mp3 or wav |
| model | string | No | sona_speech_2_flash | sona_speech_1, sona_speech_2, sona_speech_2_flash, sona_speech_2t, sona_speech_3t, supertonic_api_1, supertonic_api_3 |
| speed | float | No | 1.0 | 0.5–2.0 |
| pitch_shift | int | No | 0 | -24 to +24 semitones |
| style | string | No | | Emotion style (varies by voice) |
| output_mode | string | No | files | files, resources, or both (see Output modes) |
| autoplay | bool | No | false | Play the audio locally after synthesis (macOS afplay) |
| streaming | bool | No | false | Stream synthesis. Only supported by model="sona_speech_1" |
| include_phonemes | bool | No | false | Return phoneme timing data alongside the audio |
| normalized_text | string | No | | Pre-normalized text (only used by sona_speech_2 / sona_speech_2_flash) |
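
The constraints in this table can be checked on the client side before a call is sent. The sketch below is an illustration only: `check_tts_args` is a hypothetical helper, not something supertone-mcp exports, and the server may validate differently or report errors in another form.

```python
# Allowed values copied from the text_to_speech parameter table.
MODELS = {
    "sona_speech_1", "sona_speech_2", "sona_speech_2_flash",
    "sona_speech_2t", "sona_speech_3t", "supertonic_api_1", "supertonic_api_3",
}
OUTPUT_FORMATS = {"mp3", "wav"}
OUTPUT_MODES = {"files", "resources", "both"}


def check_tts_args(args: dict) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not args.get("text"):
        problems.append("text is required")

    model = args.get("model", "sona_speech_2_flash")
    if model not in MODELS:
        problems.append(f"unknown model: {model}")
    if not 0.5 <= args.get("speed", 1.0) <= 2.0:
        problems.append("speed must be between 0.5 and 2.0")
    if not -24 <= args.get("pitch_shift", 0) <= 24:
        problems.append("pitch_shift must be between -24 and +24")
    if args.get("output_format", "mp3") not in OUTPUT_FORMATS:
        problems.append("output_format must be mp3 or wav")
    if args.get("output_mode", "files") not in OUTPUT_MODES:
        problems.append("output_mode must be files, resources, or both")
    # Streaming is documented as supported only by sona_speech_1.
    if args.get("streaming", False) and model != "sona_speech_1":
        problems.append('streaming=true requires model="sona_speech_1"')
    return problems


print(check_tts_args({"text": "Hello", "speed": 1.2}))
print(check_tts_args({"text": "Hello", "streaming": True}))
```

The second call reports a problem because streaming is requested while the model is left at its default, sona_speech_2_flash.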

predict_duration

Takes the same core parameters as text_to_speech (long text is auto-chunked). Returns a line such as "Predicted duration: 2.34s (credit usage is proportional to duration)."
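
If a script needs the number rather than the sentence, the duration can be pulled out of that response text. This is a minimal sketch that assumes the response keeps the "Predicted duration: <seconds>s" wording quoted above; the helper name `parse_predicted_duration` is hypothetical.

```python
import re
from typing import Optional

# Matches e.g. "Predicted duration: 2.34s" and captures the number of seconds.
_DURATION_RE = re.compile(r"Predicted duration:\s*([0-9]+(?:\.[0-9]+)?)s")


def parse_predicted_duration(response_text: str) -> Optional[float]:
    """Return the predicted duration in seconds, or None if not found."""
    match = _DURATION_RE.search(response_text)
    return float(match.group(1)) if match else None


sample = "Predicted duration: 2.34s (credit usage is proportional to duration)."
print(parse_predicted_duration(sample))  # 2.34
```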

search_voice

All parameters are optional. With no filters, the full catalog is returned. With any filter, the first response line is "Filters applied: ...".

| Parameter | Type | Description |
| --- | --- | --- |
| language | string | e.g., ko, en, ja |
| gender | string | e.g., male, female |
| age | string | e.g., young_adult, child |
| use_case | string | e.g., narration, advertisement |
| style | string | e.g., neutral, happy |
| model | string | e.g., sona_speech_2_flash |
| name | string | partial match |
| description | string | partial match |

get_voice / preview_voice

| Tool | Required | Optional |
| --- | --- | --- |
| get_voice | voice_id | |
| preview_voice | voice_id | language, style, model (filter samples) |

clone_voice

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | Yes | Display name (non-empty) |
| audio_path | string | Yes | Local WAV or MP3 path (≤3MB). Supports ~ expansion |
| description | string | No | Optional note |
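
Because cloning works from a local file with a size cap, a quick pre-flight check can catch bad inputs before the tool is called. The sketch below is illustrative and not part of supertone-mcp: `check_clone_audio` is a hypothetical helper, and it reads "3MB" as 3 × 1024 × 1024 bytes, which is an assumption (the API may count megabytes differently).

```python
import tempfile
from pathlib import Path

MAX_BYTES = 3 * 1024 * 1024  # assumption: "3MB" interpreted as 3 MiB
ALLOWED_SUFFIXES = {".wav", ".mp3"}


def check_clone_audio(audio_path: str) -> Path:
    """Expand ~ and verify type, existence, and size; return the resolved path."""
    path = Path(audio_path).expanduser()
    if path.suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError(f"expected a .wav or .mp3 file, got {path.suffix!r}")
    if not path.is_file():
        raise FileNotFoundError(f"no such file: {path}")
    size = path.stat().st_size
    if size > MAX_BYTES:
        raise ValueError(f"file is {size} bytes, over the {MAX_BYTES}-byte limit")
    return path


# Example with a throwaway 1 KiB file standing in for a real recording.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.wav"
    sample.write_bytes(b"\0" * 1024)
    print(check_clone_audio(str(sample)).name)
```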

Custom voice CRUD

| Tool | Required | Optional |
| --- | --- | --- |
| search_custom_voice | | name, description (partial match) |
| get_custom_voice | voice_id | |
| edit_custom_voice | voice_id | name, description (at least one required) |
| delete_custom_voice | voice_id | (IRREVERSIBLE) |

Usage & credits

| Tool | Required | Optional |
| --- | --- | --- |
| get_credit_balance | | |
| get_usage_history | | none (reports a recent default window) |
| get_voice_usage | voice_id | |

Development

# Clone and install
git clone https://github.com/supertone-inc/supertone-mcp.git
cd supertone-mcp
uv sync

# Run tests
uv run pytest -q

# Run with coverage
uv run pytest --cov=src --cov-report=term-missing

License

MIT
