Cantonese.ai MCP Server

Created By

hhy-joseph, 8 months ago

An MCP (Model Context Protocol) server that provides tools for text-to-speech and speech-to-text conversion using the cantonese.ai API. This server is designed to be run with mcp dev.

✨ Features

Text-to-Speech Tool: Convert Cantonese or English text into high-quality audio.
Speech-to-Text Tool: Transcribe an audio file into text.
Modern Tooling: Set up with uv for fast package management.
Easy Integration: Connects with any MCP-compatible client (e.g., an LLM agent).
Secure: Your cantonese.ai API key is handled securely as an environment variable.

🚀 Getting Started

Prerequisites

Python 3.8+ (the official `mcp` Python SDK requires Python 3.10 or later, so 3.10+ is the safer choice)
uv: We recommend using uv for Python package management.

Installation

Clone the repository:
```
git clone 
cd cantonese-ai-mcp-server
```
Create and activate a virtual environment:
```
uv venv
source .venv/bin/activate
```
Install the dependencies: This project uses uv to sync dependencies from pyproject.toml.
```
uv sync
```
Set up your API Key: You'll need an API key from cantonese.ai. Export your API key as an environment variable. You can add this to your .bashrc or .zshrc file for persistence.
```
export CANTONESE_AI_API_KEY="your-api-key-here"
```
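To illustrate how a server can pick up the key at startup, here is a minimal sketch that reads `CANTONESE_AI_API_KEY` from the environment and fails early with a readable message. The helper name `load_api_key` and the check for the placeholder value are assumptions for illustration, not code taken from server.py.

```python
import os

API_KEY_ENV_VAR = "CANTONESE_AI_API_KEY"


def load_api_key(environ=None):
    """Return the API key from the environment, or raise a clear error.

    `environ` is an optional mapping (defaults to os.environ) so the
    function can be exercised without touching the real environment.
    """
    env = os.environ if environ is None else environ
    key = env.get(API_KEY_ENV_VAR, "").strip()
    # Treat the README placeholder as "not configured" (an assumption).
    if not key or key == "your-api-key-here":
        raise RuntimeError(
            f"{API_KEY_ENV_VAR} is not set; export it before starting the server."
        )
    return key
```

Failing at startup rather than on the first tool call makes a missing key easier to diagnose.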

Running the Server

Start the MCP development server using the following command. It will watch for changes in server.py and automatically reload.

```
uv run mcp dev server.py
```

You should see output indicating that the server has started and is available, typically at http://127.0.0.1:6274.

Running the Server for Use in Claude Desktop

```
uv run server.py
```

See the "For Server Developers" page of the MCP documentation for how to set up the connection with Claude Desktop.
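Claude Desktop reads its MCP servers from the `mcpServers` key of its `claude_desktop_config.json` file. The sketch below builds such an entry for this server and prints it as JSON to paste into that file. The server label `cantonese-ai`, the helper name, and the project path are placeholders chosen for illustration.

```python
import json


def claude_desktop_entry(project_dir, api_key):
    """Build an `mcpServers` entry that launches server.py through uv."""
    return {
        "mcpServers": {
            "cantonese-ai": {  # hypothetical label; any name works
                "command": "uv",
                "args": ["--directory", str(project_dir), "run", "server.py"],
                "env": {"CANTONESE_AI_API_KEY": api_key},
            }
        }
    }


if __name__ == "__main__":
    # Replace both placeholders with your absolute project path and real key.
    config = claude_desktop_entry(
        "/ABSOLUTE/PATH/TO/cantonese-ai-mcp-server", "your-api-key-here"
    )
    print(json.dumps(config, indent=2))
```

Passing the key through `env` keeps it out of your shell profile, at the cost of storing it in the config file; choose whichever fits your setup.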

🛠️ Using the Tools

Once the server is running, it will expose two tools.

Tool: `text_to_speech`

Converts a string of text into an audio file.

Arguments:

- text (string, required): The text to be converted to speech.
- voice (string, optional, default: "default"): The voice to use for the speech synthesis.
- language (string, optional, default: "cantonese"): The language of the text. Can be "cantonese" or "english".
- output_filename (string, required): The name of the file to save the audio to (e.g., output.mp3).

Example Invocation:

```
{
  "tool": "text_to_speech",
  "arguments": {
    "text": "你好世界",
    "output_filename": "hello_world.mp3"
  }
}
```
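The invocation above uses a simplified shape. On the wire, MCP clients send tool calls as JSON-RPC 2.0 requests with the method `tools/call`. The sketch below shows one way to map the simplified shape onto such a request; the helper name `to_jsonrpc_tool_call` is hypothetical, and most clients build this message for you.

```python
import json


def to_jsonrpc_tool_call(invocation, request_id=1):
    """Convert {"tool": ..., "arguments": ...} into an MCP tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": invocation["tool"],
            # Optional arguments (voice, language) may simply be omitted.
            "arguments": invocation.get("arguments", {}),
        },
    }


invocation = {
    "tool": "text_to_speech",
    "arguments": {"text": "你好世界", "output_filename": "hello_world.mp3"},
}
print(json.dumps(to_jsonrpc_tool_call(invocation), ensure_ascii=False, indent=2))
```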

Successful Response:

```
{
  "success": true,
  "message": "Audio file saved as hello_world.mp3"
}
```
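To make the argument defaults and the response shape concrete, here is a rough sketch of how such a tool could validate its inputs and save the audio. It is not the project's implementation: the `synthesize` callable is a hypothetical stand-in for the real cantonese.ai request, and the failure shape (`"success": false` with a message) is an assumption.

```python
from pathlib import Path

ALLOWED_LANGUAGES = {"cantonese", "english"}


def text_to_speech(text, output_filename, voice="default",
                   language="cantonese", synthesize=None):
    """Validate arguments, synthesize audio, and save it to output_filename.

    `synthesize` is a hypothetical callable (text, voice, language) -> bytes
    that stands in for the real API request.
    """
    if not text.strip():
        return {"success": False, "message": "text must not be empty"}
    if language not in ALLOWED_LANGUAGES:
        return {"success": False, "message": f"unsupported language: {language}"}
    if synthesize is None:
        return {"success": False, "message": "no synthesizer configured"}

    audio = synthesize(text, voice, language)
    Path(output_filename).write_bytes(audio)
    return {"success": True, "message": f"Audio file saved as {output_filename}"}
```

Injecting the synthesizer keeps the validation logic testable without network access or an API key.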

Tool: `speech_to_text`

Transcribes an audio file into text.

Arguments:

input_filename (string, required): The path to the local audio file to be transcribed (e.g., audio.wav).
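Because `input_filename` refers to a local path, it can help to check the file before sending anything to the API. The following is a small sketch of such a check; the function name `check_audio_path` and the error messages are illustrative assumptions, not part of the server.

```python
from pathlib import Path


def check_audio_path(input_filename):
    """Return an error dict if the file is unusable, or None if it looks fine."""
    path = Path(input_filename).expanduser()
    if not path.is_file():
        return {"success": False, "message": f"Audio file not found: {input_filename}"}
    if path.stat().st_size == 0:
        return {"success": False, "message": f"Audio file is empty: {input_filename}"}
    return None
```

A relative path such as audio.wav is resolved against the server's working directory, so an absolute path is the less surprising choice when the server is launched by another application.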

Example Invocation:

```
{
  "tool": "speech_to_text",
  "arguments": {
    "input_filename": "audio.wav"
  }
}
```

Successful Response:

The tool will return a JSON object with the transcription details from the API.

```
{
  "success": true,
  "result": {
    "text": "你好世界",
    "confidence": 0.95,
    "language": "cantonese",
    "duration": 2.3,
    "timestamp": "2025-06-02T11:22:00Z"
  }
}
```
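On the client side, a response like the one above can be reduced to the transcript with a confidence check. This is a minimal sketch that assumes the field names shown in the example response; the 0.8 threshold and the helper name `extract_transcript` are arbitrary choices for illustration.

```python
import json


def extract_transcript(response, min_confidence=0.8):
    """Return the transcribed text, or None if the call failed or confidence is low."""
    if not response.get("success"):
        return None
    result = response.get("result", {})
    # A missing confidence value is treated as 0.0, i.e. rejected.
    if result.get("confidence", 0.0) < min_confidence:
        return None
    return result.get("text")


raw = (
    '{"success": true, "result": {"text": "你好世界", "confidence": 0.95, '
    '"language": "cantonese", "duration": 2.3, '
    '"timestamp": "2025-06-02T11:22:00Z"}}'
)
print(extract_transcript(json.loads(raw)))  # prints 你好世界
```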