Claude Desktop Real-time Audio MCP

Created by joelfuller2016, 7 months ago
Real-time microphone input MCP server for Claude Desktop on Windows - enabling live voice conversations with Claude through WASAPI audio capture and real-time speech recognition

A Model Context Protocol (MCP) server that enables real-time microphone input for Claude Desktop on Windows. This project bridges the gap between Claude's conversational AI and live voice input through Windows Audio Session API (WASAPI) integration and real-time speech recognition.

🚀 Features

  • Real-time Audio Capture: Low-latency microphone input using Windows WASAPI
  • Multiple Speech-to-Text Engines: Support for OpenAI Whisper, Azure Speech, and Google Speech
  • MCP Integration: Seamless integration with Claude Desktop through the Model Context Protocol
  • Voice Activity Detection: Intelligent silence detection and audio chunking
  • Device Management: Automatic audio device enumeration and selection
  • Cross-format Support: Handles multiple audio formats and sample rates
  • Performance Optimized: Minimal latency for natural conversation flow

🏗️ Project Status

🚧 Under Active Development

This project is currently in the research and development phase. See the Project Roadmap below for detailed milestones and progress tracking.

🎯 Vision

Enable natural, voice-driven conversations with Claude Desktop by providing:

  • Sub-500ms latency from speech to text
  • Robust error handling and graceful degradation
  • Easy installation and configuration
  • Support for multiple audio input sources
  • Extensible architecture for future enhancements
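
To make the sub-500ms goal concrete: a chunk of audio has to be fully captured before it can be transcribed, so the chunk duration is spent from the latency budget before recognition even starts. The sketch below works through that arithmetic. The 16 kHz mono 16-bit format and the 200 ms chunk size are illustrative assumptions, not values taken from this project, and the function names are hypothetical.

```typescript
/** Number of bytes in `ms` milliseconds of uncompressed PCM audio. */
function pcmBytes(sampleRateHz: number, channels: number, bytesPerSample: number, ms: number): number {
  const samplesPerChannel = Math.round((sampleRateHz * ms) / 1000);
  return samplesPerChannel * channels * bytesPerSample;
}

/** Time left for transcription and transport once a full chunk has been buffered. */
function remainingBudgetMs(targetMs: number, chunkMs: number): number {
  return targetMs - chunkMs;
}

// Illustrative values only: 16 kHz mono 16-bit audio, 200 ms chunks, 500 ms target.
console.log(pcmBytes(16000, 1, 2, 200)); // 6400
console.log(remainingBudgetMs(500, 200)); // 300
```

Under these assumptions, a 200 ms chunk leaves about 300 ms for speech recognition and for handing the text to Claude Desktop, which is why smaller chunks help even though they cost more calls to the recognizer.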

🗺️ Project Roadmap

Phase 1: Research & Architecture (Target: June 15, 2025)

  • Research Windows WASAPI APIs and real-time audio capture methods
  • Design MCP server architecture for audio streaming
  • Create proof-of-concept WASAPI audio capture in C++
  • Evaluate speech-to-text integration options
  • Set up development environment and toolchain

Phase 2: Core Audio Implementation (Target: July 1, 2025)

  • Implement WASAPI audio capture module in C++
  • Create Node.js FFI bindings for audio module
  • Develop real-time audio buffering and streaming system
  • Implement audio format conversion and processing pipeline
  • Create device enumeration and selection functionality
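
The format conversion step in Phase 2 could look roughly like the following. This is a minimal sketch, assuming the native module hands over interleaved 32-bit float samples (a format WASAPI shared mode commonly uses) and the speech engine expects 16-bit mono PCM. The helper names `downmixToMono` and `floatToInt16` are hypothetical and not part of this project's API; resampling is left out.

```typescript
// Hypothetical helpers; names and signatures are illustrative only.

/** Average interleaved channels into a single mono channel. */
function downmixToMono(interleaved: Float32Array, channels: number): Float32Array {
  if (channels < 1 || interleaved.length % channels !== 0) {
    throw new Error("sample count must be a multiple of the channel count");
  }
  const frames = interleaved.length / channels;
  const mono = new Float32Array(frames);
  for (let f = 0; f < frames; f++) {
    let sum = 0;
    for (let c = 0; c < channels; c++) {
      sum += interleaved[f * channels + c];
    }
    mono[f] = sum / channels;
  }
  return mono;
}

/** Convert float samples in [-1, 1] to signed 16-bit PCM, clamping out-of-range values. */
function floatToInt16(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Scale negatives by 32768 and positives by 32767 so both ends fit in int16.
    out[i] = s < 0 ? Math.round(s * 32768) : Math.round(s * 32767);
  }
  return out;
}
```

A capture callback would typically call `floatToInt16(downmixToMono(buffer, channelCount))` on each buffer before passing it on to the streaming stage.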

Phase 3: MCP Server Development (Target: July 20, 2025)

  • Implement MCP server using TypeScript SDK
  • Create audio capture tools for MCP interface
  • Implement speech-to-text integration tools
  • Develop configuration and device management resources
  • Add error handling and graceful shutdown mechanisms
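
The audio capture tools in Phase 3 might be exposed to Claude Desktop roughly as follows. This sketch deliberately does not import @modelcontextprotocol/sdk so that it stays self-contained; it only mirrors the shape MCP uses for tool descriptors (`name`, `description`, `inputSchema`) and tool results (a `content` array of text items, with `isError` on failure). The tool names `list_audio_devices` and `start_capture`, the `AudioBackend` interface, and the `callTool` dispatcher are all hypothetical placeholders, not this project's actual interface.

```typescript
// Shapes modelled on MCP tool descriptors and results; not imported from the SDK.
interface ToolDescriptor {
  name: string;
  description: string;
  inputSchema: { type: "object"; properties: Record<string, unknown>; required?: string[] };
}

interface ToolResult {
  content: { type: "text"; text: string }[];
  isError?: boolean;
}

// Hypothetical stand-in for the native WASAPI module.
interface AudioBackend {
  listDevices(): string[];
  startCapture(deviceId: string): void;
}

const tools: ToolDescriptor[] = [
  {
    name: "list_audio_devices",
    description: "List available microphone devices",
    inputSchema: { type: "object", properties: {} },
  },
  {
    name: "start_capture",
    description: "Start capturing audio from the given device",
    inputSchema: {
      type: "object",
      properties: { deviceId: { type: "string" } },
      required: ["deviceId"],
    },
  },
];

function callTool(backend: AudioBackend, name: string, args: Record<string, unknown>): ToolResult {
  try {
    switch (name) {
      case "list_audio_devices":
        return { content: [{ type: "text", text: JSON.stringify(backend.listDevices()) }] };
      case "start_capture": {
        const deviceId = args["deviceId"];
        if (typeof deviceId !== "string") throw new Error("deviceId must be a string");
        backend.startCapture(deviceId);
        return { content: [{ type: "text", text: `capturing from ${deviceId}` }] };
      }
      default:
        throw new Error(`unknown tool: ${name}`);
    }
  } catch (err) {
    // Report failures as a tool-level error instead of crashing the server.
    const message = err instanceof Error ? err.message : String(err);
    return { content: [{ type: "text", text: message }], isError: true };
  }
}
```

Returning errors as results rather than throwing is one way to approach the graceful error handling listed above: a failed device open becomes a message Claude can relay, and the server process keeps running.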

Phase 4: Speech Recognition Integration (Target: August 10, 2025)

  • Integrate OpenAI Whisper for local processing
  • Add Azure Speech Services integration
  • Implement Google Speech-to-Text support
  • Develop real-time transcription with chunking strategies
  • Create voice activity detection and silence handling
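
One simple way to approach the voice activity detection and chunking items is an energy threshold with a short "hangover" period. The sketch below is an assumption about how this could be done, not the project's chosen algorithm: it treats a frame as speech when its RMS energy exceeds a threshold and closes an utterance after a run of quiet frames. The class name `EnergyVadChunker` and its parameters are hypothetical.

```typescript
/** Root-mean-square energy of one frame of float samples. */
function rms(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < frame.length; i++) sumSquares += frame[i] * frame[i];
  return Math.sqrt(sumSquares / frame.length);
}

/**
 * Groups frames into utterances. An utterance starts at the first frame whose
 * RMS exceeds `threshold` and ends after `hangoverFrames` consecutive quiet frames.
 */
class EnergyVadChunker {
  private readonly threshold: number;
  private readonly hangoverFrames: number;
  private current: Float32Array[] = [];
  private quietRun = 0;

  constructor(threshold: number, hangoverFrames: number) {
    if (hangoverFrames < 1) throw new Error("hangoverFrames must be at least 1");
    this.threshold = threshold;
    this.hangoverFrames = hangoverFrames;
  }

  /** Feed one frame; returns a finished utterance, or null if none has ended yet. */
  push(frame: Float32Array): Float32Array[] | null {
    const loud = rms(frame) > this.threshold;
    if (this.current.length === 0) {
      if (loud) this.current.push(frame); // speech onset
      return null;
    }
    this.current.push(frame);
    this.quietRun = loud ? 0 : this.quietRun + 1;
    if (this.quietRun < this.hangoverFrames) return null;
    // Drop the trailing quiet frames and emit the utterance.
    const utterance = this.current.slice(0, this.current.length - this.quietRun);
    this.current = [];
    this.quietRun = 0;
    return utterance;
  }
}
```

Each returned utterance can then be sent to the speech-to-text engine as one chunk. The hangover keeps short pauses inside a sentence from splitting it, at the cost of adding that much delay before the chunk is released.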

Phase 5: Claude Desktop Integration (Target: August 25, 2025)

  • Test integration with Claude Desktop configuration
  • Optimize latency and performance for real-time use
  • Implement user preferences and configuration UI
  • Create installation and setup automation
  • Develop usage examples and demo scenarios

Phase 6: Testing & Documentation (Target: September 15, 2025)

  • Create comprehensive test suite for all components
  • Write detailed installation and usage documentation
  • Develop troubleshooting guides and FAQ
  • Perform security and performance audits
  • Prepare release packages and distribution

🏛️ Architecture Overview

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Claude        │    │  MCP Server      │    │  Audio Module   │
│   Desktop       │◄──►│  (TypeScript)    │◄──►│  (C++ WASAPI)   │
│                 │    │                  │    │                 │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                │                       │
                                ▼                       ▼
                       ┌──────────────────┐    ┌─────────────────┐
                       │  Speech-to-Text  │    │  Windows Audio  │
                       │  Services        │    │  System         │
                       │  (Whisper/Azure/ │    │  (Microphone)   │
                       │   Google)        │    │                 │
                       └──────────────────┘    └─────────────────┘

🛠️ Technology Stack

  • Core MCP Server: TypeScript with @modelcontextprotocol/sdk
  • Audio Capture: C++ with Windows WASAPI
  • Node.js Integration: node-gyp for native module compilation
  • Speech Recognition:
    • OpenAI Whisper (local processing)
    • Azure Speech Services (cloud)
    • Google Speech-to-Text (cloud)
  • Build System: node-gyp, TypeScript compiler
  • Documentation: Markdown with GitHub Pages

📋 Prerequisites

  • Windows 10/11 (Windows 7+ with WASAPI support)
  • Node.js 16+ with npm
  • Visual Studio Build Tools (for native compilation)
  • Python 3.8+ (for node-gyp)
  • Git for version control

🚦 Quick Start

Note: This project is under development. Installation instructions will be available with the first release.

# Clone the repository
git clone https://github.com/joelfuller2016/claude-desktop-realtime-audio-mcp.git
cd claude-desktop-realtime-audio-mcp

# Install dependencies
npm install

# Build the project
npm run build

# Configure Claude Desktop
# (Instructions will be provided in setup documentation)
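
For orientation, Claude Desktop generally registers MCP servers through an `mcpServers` section in its `claude_desktop_config.json` file, where each entry names a command and its arguments. The sketch below builds such an entry and prints it as JSON. The server name `realtime-audio` and the `dist\index.js` path are assumptions made for illustration; the project has not published its actual entry point or setup steps.

```typescript
// Shape of one entry under "mcpServers" in claude_desktop_config.json.
interface McpServerEntry {
  command: string;
  args: string[];
  env?: Record<string, string>;
}

/** Build the config fragment for a Node-based MCP server. */
function buildConfig(
  serverName: string,
  entryPoint: string,
): { mcpServers: Record<string, McpServerEntry> } {
  return {
    mcpServers: {
      [serverName]: { command: "node", args: [entryPoint] },
    },
  };
}

// Hypothetical server name and build output path; adjust to the real project layout.
const config = buildConfig(
  "realtime-audio",
  "C:\\path\\to\\claude-desktop-realtime-audio-mcp\\dist\\index.js",
);
console.log(JSON.stringify(config, null, 2));
```

The printed JSON would be merged into the existing config file, after which Claude Desktop needs a restart to pick up the new server.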

🤝 Contributing

We welcome contributions of all kinds. You can:

  • 🐛 Report bugs or issues
  • 💡 Suggest new features or improvements
  • 🔧 Submit code contributions
  • 📚 Improve documentation
  • 🧪 Help with testing

Please see our Contributing Guide for detailed information on how to get started.

📖 Research & References

This project builds upon extensive research in:

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Anthropic for Claude and the Model Context Protocol
  • OpenAI for Whisper speech recognition
  • The Node.js and TypeScript communities for excellent tooling
  • Microsoft for comprehensive WASAPI documentation and examples

📞 Support & Community

⭐ Star this repository if you find it interesting or useful!

This project aims to make voice-driven AI conversations more natural and accessible. Join us in building the future of human-AI interaction.
