FS-MCP: Universal File Reader & Intelligent Search MCP Server

Created by boleyn, 8 months ago

File and document lookup MCP service

An MCP (Model Context Protocol) server that provides intelligent file reading and semantic search capabilities.

English

🚀 Features

🧠 Intelligent Text Detection: Automatically identifies text files without relying on file extensions
📄 Multi-Format Support: Handles text files and document formats (Word, Excel, PDF, etc.)
🔒 Security First: Restricted access to configured safe directories only
📏 Range Reading: Supports reading specific line ranges for large files
🔄 Document Conversion: Automatic conversion of documents to Markdown with caching
🔍 Vector Search: Semantic search powered by AI embeddings
⚡ High Performance: Batch processing and intelligent caching support
🌐 Multi-language: Supports both English and Chinese content

🚀 Quick Start

1. Clone and Install

git clone https://github.com/yourusername/fs-mcp.git
cd fs-mcp

Using uv (Recommended):

uv sync

Using pip:

pip install -r requirements.txt  # If the repository provides a requirements.txt
# OR install the core dependencies directly (quoted so the shell does not treat ">" as a redirect)
pip install "fastmcp>=2.0.0" "langchain>=0.3.0" "python-dotenv>=1.1.0"

2. Environment Configuration

Create a .env file in the project root:

# Security Settings
SAFE_DIRECTORY=.                    # Directory restriction (required)
MAX_FILE_SIZE_MB=100                # File size limit in MB

# Encoding Settings
DEFAULT_ENCODING=utf-8

# AI Embeddings Configuration (for vector search)
OPENAI_EMBEDDINGS_API_KEY=your-api-key
OPENAI_EMBEDDINGS_BASE_URL=http://your-embedding-service/v1
EMBEDDING_MODEL_NAME=BAAI/bge-m3    # Or your preferred model
EMBEDDING_CHUNK_SIZE=1000

3. Start the Server

python main.py

The server will start on http://localhost:3002 and automatically build the vector index.

🛠️ Installation

System Requirements

Python: 3.12 or higher
OS: Windows, macOS, Linux
Memory: 4GB+ recommended for vector search
Storage: 1GB+ for caching and indexes

Dependencies

Core dependencies are managed in pyproject.toml:

fastmcp>=2.0.0 - MCP server framework
langchain>=0.3.0 - AI and vector search
python-dotenv>=1.1.0 - Environment management
Document processing libraries (pandas, openpyxl, python-docx, etc.)

⚙️ Configuration

Environment Variables

| Variable | Default | Description |
|---|---|---|
| `SAFE_DIRECTORY` | `.` | Root directory for file access |
| `MAX_FILE_SIZE_MB` | `100` | Maximum file size, in MB |
| `DEFAULT_ENCODING` | `utf-8` | Default file encoding |
| `OPENAI_EMBEDDINGS_API_KEY` | - | API key for the embedding service |
| `OPENAI_EMBEDDINGS_BASE_URL` | - | Embedding service URL |
| `EMBEDDING_MODEL_NAME` | `BAAI/bge-m3` | AI model for embeddings |
| `EMBEDDING_CHUNK_SIZE` | `1000` | Text chunk size for processing |
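
As an illustration of how these variables and their defaults might be consumed, here is a minimal sketch that reads them from a mapping such as `os.environ`. The `Settings` class and `load_settings` function are hypothetical names for this example and are not taken from the project's `config_manager.py`.

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the documented environment variables."""
    safe_directory: Path
    max_file_size_bytes: int
    default_encoding: str
    embedding_model_name: str
    embedding_chunk_size: int


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping of variables (defaults to os.environ)."""
    env = os.environ if env is None else env
    return Settings(
        # Resolve once so later path checks compare absolute paths.
        safe_directory=Path(env.get("SAFE_DIRECTORY", ".")).resolve(),
        # The variable is documented in MB; convert to bytes for size checks.
        max_file_size_bytes=int(env.get("MAX_FILE_SIZE_MB", "100")) * 1024 * 1024,
        default_encoding=env.get("DEFAULT_ENCODING", "utf-8"),
        embedding_model_name=env.get("EMBEDDING_MODEL_NAME", "BAAI/bge-m3"),
        embedding_chunk_size=int(env.get("EMBEDDING_CHUNK_SIZE", "1000")),
    )
```

Passing a plain dictionary instead of `os.environ` keeps the loader easy to test; values from a `.env` file would need to be loaded into the environment first.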

Advanced Configuration

For production deployments, consider:

Setting up rate limiting
Configuring log rotation
Using external vector databases
Setting up monitoring

🔧 MCP Tools

1. `view_directory_tree`

Purpose: Display directory structure in tree format

view_directory_tree(
    directory_path=".",     # Target directory
    max_depth=3,           # Maximum depth
    max_entries=300        # Maximum entries to show
)
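
To make the `max_depth` and `max_entries` parameters concrete, the following is a rough sketch of a depth- and count-limited tree listing. It is not the project's `dir_tree.py`; the function name `render_tree`, the indentation format, and the truncation marker are assumptions made for this example.

```python
from pathlib import Path


def render_tree(root, max_depth=3, max_entries=300):
    """Return an indented listing of `root`, limited by depth and entry count."""
    root = Path(root)
    lines = [root.name or str(root)]
    count = 0
    truncated = False

    def walk(directory, depth):
        nonlocal count, truncated
        if depth > max_depth:
            return
        # Directories first, then files, each group sorted by name.
        children = sorted(directory.iterdir(), key=lambda p: (p.is_file(), p.name.lower()))
        for child in children:
            if count >= max_entries:
                truncated = True  # At least one entry was left out.
                return
            count += 1
            lines.append("    " * depth + child.name + ("/" if child.is_dir() else ""))
            if child.is_dir():
                walk(child, depth + 1)

    walk(root, 1)
    if truncated:
        lines.append("... (entry limit reached)")
    return "\n".join(lines)
```

Capping both depth and entry count keeps the output bounded even for very large or deeply nested directories.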

2. `read_file_content`

Purpose: Read file content with line range support

read_file_content(
    file_path="example.py",  # File path
    start_line=1,           # Start line (optional)
    end_line=50             # End line (optional)
)
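
Range reading can be sketched with `itertools.islice`, which streams the file instead of loading it whole. This example assumes that `start_line` and `end_line` are 1-based and inclusive, which the tool description does not state explicitly; `read_line_range` is a hypothetical helper, not the server's implementation.

```python
from itertools import islice


def read_line_range(path, start_line=1, end_line=None, encoding="utf-8"):
    """Return lines start_line..end_line (assumed 1-based, inclusive)."""
    if start_line < 1:
        raise ValueError("start_line must be >= 1")
    if end_line is not None and end_line < start_line:
        raise ValueError("end_line must be >= start_line")
    with open(path, encoding=encoding, errors="replace") as handle:
        # islice takes 0-based, end-exclusive bounds, so an inclusive
        # 1-based range maps to (start_line - 1, end_line).
        return list(islice(handle, start_line - 1, end_line))
```

An `end_line` beyond the end of the file simply returns the lines that exist.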

3. `search_documents`

Purpose: Intelligent semantic search across documents

search_documents(
    query="authentication logic",     # Search query
    search_type="semantic",          # semantic/filename/hybrid/extension
    file_extensions=".py,.js",       # File type filter (optional)
    max_results=10                   # Maximum results
)

4. `rebuild_document_index`

Purpose: Rebuild vector index for search

rebuild_document_index()  # No parameters needed

5. `get_document_stats`

Purpose: Get index statistics and system status

get_document_stats()  # Returns comprehensive stats

6. `list_files`

Purpose: List files in directory with pattern matching

list_files(
    directory_path="./src",  # Directory to list
    pattern="*.py",         # File pattern
    include_size=True       # Include file sizes
)
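
Pattern matching of this kind can be approximated with the standard library. The sketch below assumes that `pattern` is a shell-style glob applied to file names in a single directory (no recursion); the helper name and the returned dictionary keys are invented for illustration.

```python
from fnmatch import fnmatch
from pathlib import Path


def list_matching_files(directory_path, pattern="*", include_size=False):
    """List regular files directly inside directory_path whose names match a glob."""
    entries = []
    for path in sorted(Path(directory_path).iterdir()):
        if path.is_file() and fnmatch(path.name, pattern):
            entry = {"name": path.name}
            if include_size:
                entry["size_bytes"] = path.stat().st_size
            entries.append(entry)
    return entries
```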

7. `preview_file`

Purpose: Quick preview of file content

preview_file(
    file_path="example.py",  # File to preview
    lines=20                # Number of lines
)

🔍 Vector Search

Capabilities

Semantic Understanding: Searching for "user authentication" also finds "login verification" code
Synonym Recognition: Searching for "database" also finds content that uses "数据库" (Chinese for "database")
Multi-language Support: Handles English, Chinese, and mixed content
Context Awareness: Understands code semantics and relationships

Search Types

Semantic Search (semantic): AI-powered understanding
Filename Search (filename): Fast filename matching
Extension Search (extension): Filter by file type
Hybrid Search (hybrid): Combines semantic + filename
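
One way a hybrid mode could combine the two signals is a weighted sum of an embedding similarity and a filename match score, with the extension filter applied first. The sketch below uses toy vectors in place of real embeddings; the 0.7 weight, the scoring formula, and all function names are assumptions, since the README does not describe how the server actually ranks hybrid results.

```python
import math
from pathlib import PurePath


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filename_score(query, path):
    """Fraction of query terms that occur in the file name (case-insensitive)."""
    name = PurePath(path).name.lower()
    terms = query.lower().split()
    return sum(term in name for term in terms) / len(terms) if terms else 0.0


def hybrid_rank(query, query_vec, docs, file_extensions="", weight=0.7, max_results=10):
    """Rank (path, embedding) pairs; returns [(path, score)] sorted best-first."""
    # Parse a filter such as ".py,.js" into a set; an empty filter allows everything.
    allowed = {ext.strip().lower() for ext in file_extensions.split(",") if ext.strip()}
    scored = []
    for path, vec in docs:
        if allowed and PurePath(path).suffix.lower() not in allowed:
            continue
        # Assumed blend: `weight` for semantic similarity, the rest for the name match.
        score = weight * cosine(query_vec, vec) + (1 - weight) * filename_score(query, path)
        scored.append((path, score))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:max_results]
```

In a real deployment the vectors would come from the configured embedding model and the nearest-neighbour lookup from the vector database rather than a linear scan.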

Technical Stack

Embedding Model: BAAI/bge-m3 (1024-dimensional vectors)
Vector Database: ChromaDB
Text Splitting: Intelligent semantic chunking
Incremental Updates: Hash-based change detection
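
Hash-based change detection generally means storing a content digest per file and re-indexing only files whose digest differs from the stored one. The following is a minimal sketch of that idea, assuming SHA-256 and an in-memory `{relative_path: digest}` map; the project's actual hash function and storage format are not documented here, and both function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=65536):
    """SHA-256 of a file's bytes, read in chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def diff_against_index(root, known_hashes):
    """Compare files under root with the digests recorded on the previous run.

    Returns (to_index, to_remove, current_hashes), where to_index holds new or
    changed files and to_remove holds files that no longer exist.
    """
    root = Path(root)
    current = {
        path.relative_to(root).as_posix(): file_digest(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }
    to_index = sorted(p for p, h in current.items() if known_hashes.get(p) != h)
    to_remove = sorted(p for p in known_hashes if p not in current)
    return to_index, to_remove, current
```

Persisting `current_hashes` between runs lets the next run skip unchanged files, which matters because embedding calls are the expensive step.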

📁 Supported Formats

Auto-detected Text Files

Programming languages: .py, .js, .ts, .java, .cpp, .c, .go, .rs, etc.
Config files: .json, .yaml, .toml, .ini, .xml, .env
Documentation: .md, .txt, .rst
Web files: .html, .css, .scss
Data files: .csv, .tsv
Files without extensions (auto-detected)
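
Detecting text without relying on the extension usually comes down to inspecting the first bytes of the file. Here is one common heuristic, offered as an assumption about the general approach rather than a description of `text_detector.py`: reject samples containing NUL bytes, then check whether the sample decodes in a candidate encoding. Note that this sketch would classify UTF-16 text as binary.

```python
import codecs


def looks_like_text(path, sample_size=8192, encodings=("utf-8", "gb18030")):
    """Guess whether a file is text by inspecting its first bytes, not its name."""
    with open(path, "rb") as handle:
        sample = handle.read(sample_size)
    if b"\x00" in sample:
        return False  # NUL bytes are rare in UTF-8 or GB18030 text.
    for encoding in encodings:
        # An incremental decoder with final=False tolerates a multi-byte
        # character that was cut off at the end of the sample.
        decoder = codecs.getincrementaldecoder(encoding)()
        try:
            decoder.decode(sample, final=False)
            return True
        except UnicodeDecodeError:
            continue
    return False
```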

Document Formats (Auto-converted to Markdown)

Microsoft Office: .docx, .xlsx, .pptx
OpenDocument: .odt, .ods, .odp
PDF: .pdf (text extraction)
Legacy formats: .doc, .xls (limited support)

🔒 Security Features

Access Control

Directory Restriction: Access limited to SAFE_DIRECTORY and subdirectories
Path Traversal Protection: Automatic prevention of ../ attacks
Symlink Control: Configurable symbolic link access
File Size Limits: Prevents reading oversized files
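
The access-control rules above can be illustrated with `pathlib`: resolve the requested path and verify that the result still lies under the safe directory. This is a sketch under stated assumptions, not the code in `security_validator.py`; `UnsafePathError`, `resolve_safe_path`, and the `allow_symlinks` flag are hypothetical names, and the symlink check here only covers the final path component.

```python
from pathlib import Path


class UnsafePathError(ValueError):
    """Raised when a requested path violates the access rules."""


def resolve_safe_path(safe_directory, requested, max_file_size_mb=100, allow_symlinks=False):
    """Return the resolved path if it is inside safe_directory, else raise."""
    root = Path(safe_directory).resolve()
    candidate = root / requested  # An absolute `requested` replaces root here.
    if not allow_symlinks and candidate.is_symlink():
        raise UnsafePathError(f"{requested!r} is a symbolic link")
    # resolve() collapses ".." segments and follows links, so the containment
    # check below sees the real target location.
    resolved = candidate.resolve()
    if not resolved.is_relative_to(root):
        raise UnsafePathError(f"{requested!r} escapes the safe directory")
    if resolved.is_file() and resolved.stat().st_size > max_file_size_mb * 1024 * 1024:
        raise UnsafePathError(f"{requested!r} exceeds the size limit")
    return resolved
```

Checking containment after resolution, rather than scanning the string for `../`, also catches absolute paths and links that point outside the safe directory.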

Validation

Path Sanitization: Automatic path cleaning and validation
Permission Checks: Verify read permissions before access
Error Handling: Graceful failure with informative messages

🔗 Integration

Claude Desktop

Add to your Claude Desktop MCP configuration:

{
  "mcpServers": {
    "fs-mcp": {
      "command": "python",
      "args": ["main.py"],
      "cwd": "/path/to/fs-mcp",
      "env": {
        "SAFE_DIRECTORY": "/your/project/directory"
      }
    }
  }
}

Other MCP Clients

Connect to http://localhost:3002 using the Server-Sent Events (SSE) transport.

API Integration

The server exposes standard MCP endpoints that can be integrated with any MCP-compatible client.

🏗️ Project Structure

fs-mcp/
├── main.py                    # Main MCP server
├── src/                       # Core modules
│   ├── __init__.py           # Package initialization
│   ├── file_reader.py        # Core file reading logic
│   ├── security_validator.py # Security and validation
│   ├── text_detector.py      # Intelligent file detection
│   ├── config_manager.py     # Configuration management
│   ├── document_cache.py     # Document caching system
│   ├── file_converters.py    # Document format converters
│   ├── dir_tree.py          # Directory tree generation
│   ├── embedding_config.py   # AI embedding configuration
│   ├── codebase_indexer.py   # Vector indexing system
│   ├── codebase_search.py    # Search engine
│   ├── index_scheduler.py    # Index scheduling
│   └── progress_bar.py       # Progress display utilities
├── tests/                    # Test suite
├── cache/                    # Document cache (auto-created)
├── logs/                     # Log files (auto-created)
├── pyproject.toml           # Project configuration
├── .env.example             # Environment template
├── .gitignore              # Git ignore rules
└── README.md               # This file

💻 Development

Setting Up Development Environment

# Clone repository
git clone https://github.com/yourusername/fs-mcp.git
cd fs-mcp

# Install with development dependencies
uv sync --group dev

# OR with pip
pip install -e ".[dev]"

Running Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=src

# Run specific test
pytest tests/test_file_reader.py

Code Quality

# Format code
black src/ tests/

# Lint code
flake8 src/ tests/

# Type checking
mypy src/

Debugging

Monitor logs in real-time:

tail -f logs/mcp_server_$(date +%Y%m%d).log

🤝 Contributing

We welcome contributions! Here's how to get started:

1. Fork and Clone

git clone https://github.com/yourusername/fs-mcp.git
cd fs-mcp

2. Create Feature Branch

git checkout -b feature/your-feature-name

3. Make Changes

Follow the existing code style
Add tests for new functionality
Update documentation as needed

4. Test Your Changes

pytest
black src/ tests/
flake8 src/ tests/

5. Submit Pull Request

Describe your changes clearly
Reference any related issues
Ensure all tests pass

Development Guidelines

Code Style: Follow PEP 8, use Black for formatting
Testing: Maintain test coverage above 80%
Documentation: Update README and docstrings
Commits: Use conventional commit messages
Security: Follow security best practices

📋 Roadmap

Enhanced PDF Processing: Better table and image extraction
More Embedding Models: Support for local models
Real-time Indexing: File system watchers
Advanced Search: Regex, proximity, faceted search
Performance Optimization: Async processing, caching improvements
Web Interface: Optional web UI for management
Plugin System: Custom file type handlers
Enterprise Features: Authentication, rate limiting, monitoring

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

FastMCP - MCP server framework
LangChain - AI integration
ChromaDB - Vector database
BGE-M3 - Embedding model

📞 Support

Issues: GitHub Issues
Discussions: GitHub Discussions
Documentation: Check the docs/ folder (when available)

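Chinese (中文)

🚀 Features

🧠 Intelligent Text Detection: Automatically identifies text files without relying on file extensions
📄 Multi-Format Support: Supports text files and document formats (Word, Excel, PDF, etc.)
🔒 Security Validation: Only files inside the configured safe directory can be read
📏 Line-Range Reading: Reads a specified range of lines, which helps with large files
🔄 Document Conversion: Automatically converts document formats to Markdown and caches the result
🔍 Vector Search: Semantic search based on AI embeddings
⚡ High Performance: Supports batch file processing and intelligent caching
🌐 Multi-language: Handles both Chinese and English content

🚀 Quick Start

1. Clone and Install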

git clone https://github.com/yourusername/fs-mcp.git
cd fs-mcp

# uv is recommended
uv sync

# Or use pip
pip install -r requirements.txt

2. Environment Configuration

Create a .env file:

# Security settings
SAFE_DIRECTORY=.                    # Directory access restriction (required)
MAX_FILE_SIZE_MB=100                # File size limit (MB)

# Encoding settings
DEFAULT_ENCODING=utf-8

# AI embedding configuration (for vector search)
OPENAI_EMBEDDINGS_API_KEY=your-api-key
OPENAI_EMBEDDINGS_BASE_URL=http://your-embedding-service/v1
EMBEDDING_MODEL_NAME=BAAI/bge-m3    # Or your preferred model
EMBEDDING_CHUNK_SIZE=1000

3. Start the Server

python main.py

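The server starts at http://localhost:3002 and automatically builds the vector index.

🛠️ MCP Tools

For detailed tool usage, see the MCP Tools section in the English part.

🔍 Vector Search

Concept Matching: Searching for "用户认证" (user authentication) finds code related to "登录验证" (login verification)
Synonym Understanding: Searching for "database" finds content related to "数据库" (Chinese for "database")
Multi-language Support: Understands code and comments in both Chinese and English
Context Understanding: Understands the semantics and context of code

📁 Supported File Formats

For details on supported formats, see the Supported Formats section in the English part.

🔒 Security Features

Path Validation: Access is allowed only to the configured safe directory and its subdirectories
File Size Limits: Prevents reading oversized files
Path Traversal Protection: Automatically blocks path traversal attacks such as ../
Symlink Control: Access to symbolic links can be enabled or disabled by configuration

🔗 Integration

Claude Desktop Integration

Add the following to the Claude Desktop MCP configuration: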

{
  "mcpServers": {
    "fs-mcp": {
      "command": "python",
      "args": ["main.py"],
      "cwd": "/path/to/fs-mcp",
      "env": {
        "SAFE_DIRECTORY": "/your/project/directory"
      }
    }
  }
}

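💻 Development

Setting Up the Development Environment

# Clone the repository
git clone https://github.com/yourusername/fs-mcp.git
cd fs-mcp

# Install development dependencies
uv sync --group dev

Running Tests

# Run all tests
pytest

# Run tests with coverage
pytest --cov=src

🤝 Contributing

Contributions are welcome! See the Contributing section in the English part for details.

📄 License

This project is licensed under the MIT License. See the LICENSE file for details.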

Made with ❤️ for the AI community
