Oboyu (覚ゆ)

Created by sonesuke, 7 months ago
Self-hosted MCP server for Japanese text indexing and search: document chunking and embeddings, with BM25 and vector results combined and reranked.

License: MIT

Lightning-fast semantic search for your local documents with best-in-class Japanese support.

What is Oboyu?

Oboyu (覚ゆ, "to remember" in ancient Japanese) is a local semantic search engine that finds information in your documents from natural-language queries. Unlike keyword-only search, Oboyu matches on the meaning of your question. It can surface relevant content even when you don't know the exact terms the documents use.

Why Oboyu?

  • 🚀 Fast: Indexes thousands of documents in seconds, searches in milliseconds
  • 🎯 Accurate: Semantic search finds what you mean, not just what you type
  • 🇯🇵 Japanese Excellence: First-class support with automatic encoding detection
  • 🔒 Private: Everything runs locally, so your documents never leave your machine
  • 🤖 AI-Ready: Built-in MCP server for Claude, Cursor, and other AI assistants

Quick Start

Get up and running in under 5 minutes:

# Install Oboyu
pip install oboyu

# Index your documents
oboyu index ~/Documents

# Search interactively
oboyu query --interactive

That's it! See our Quick Start Guide for more examples.

Key Features

🔍 Advanced Search Capabilities

  • Hybrid Search: Combines semantic understanding with keyword matching for best results
  • Multiple Modes: Switch between semantic, keyword, or hybrid search modes
  • Smart Reranking: Built-in AI reranker improves result accuracy
  • Interactive Mode: Real-time search with command history and auto-suggestions
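
The README does not say how hybrid mode merges keyword and semantic results. Reciprocal rank fusion (RRF) is one common way to combine two ranked lists without comparing their raw scores, and the sketch below assumes it purely for illustration. Each document earns 1 / (k + rank) from every list it appears in. The function name, the document IDs, and k = 60 are assumptions, not Oboyu's API.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Merge ranked lists of document IDs with reciprocal rank fusion.

    A document scores sum(1 / (k + rank)) over every list it appears in,
    with 1-based ranks. k = 60 is the value commonly used since the original
    RRF paper; a larger k flattens the advantage of top positions.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical top hits from a keyword (BM25) pass and a vector pass.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]

# doc_a and doc_c, found by both passes, rank above doc_b and doc_d.
for doc_id, score in reciprocal_rank_fusion([bm25_hits, vector_hits]):
    print(f"{doc_id}: {score:.4f}")
```

RRF looks only at ranks. That avoids normalizing BM25 scores and cosine similarities, which live on different scales.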

📚 Document Support

  • Wide Format Support: Plain text, Markdown, code files, PDFs, Jupyter notebooks, and more
  • Incremental Indexing: Processes only new or changed files for lightning-fast updates
  • Smart Chunking: Intelligent document splitting for optimal search results
  • Automatic Encoding: Handles various text encodings seamlessly
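
The README does not document how Oboyu decides which files are "new or changed." Here is a minimal sketch that assumes change detection by SHA-256 content hash. It keeps a {relative path: digest} snapshot from the previous run and compares it with the current directory tree. `plan_incremental_update` is a hypothetical helper; Oboyu may use modification times or another scheme.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 hex digest of a file's contents, read in 64 KiB blocks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest()


def plan_incremental_update(
    root: Path, previous: dict[str, str]
) -> tuple[list[str], list[str], dict[str, str]]:
    """Hypothetical helper: diff the files under root against a prior snapshot.

    Returns (paths to index or re-index, paths to drop from the index,
    the new snapshot to persist for the next run).
    """
    current = {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in root.rglob("*")
        if p.is_file()
    }
    to_index = sorted(path for path, digest in current.items() if previous.get(path) != digest)
    to_remove = sorted(path for path in previous if path not in current)
    return to_index, to_remove, current


# Demo on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.md").write_text("first note", encoding="utf-8")
    (root / "b.md").write_text("second note", encoding="utf-8")
    changed, removed, snapshot = plan_incremental_update(root, {})
    print(changed, removed)  # ['a.md', 'b.md'] []

    (root / "a.md").write_text("first note, edited", encoding="utf-8")
    (root / "b.md").unlink()
    changed, removed, snapshot = plan_incremental_update(root, snapshot)
    print(changed, removed)  # ['a.md'] ['b.md']
```

Hashing reads every file on each run. Comparing modification times first would be cheaper, but hashes avoid re-indexing files whose timestamps changed while their contents did not.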

🇯🇵 Japanese Language Excellence

  • Native Support: Purpose-built for Japanese text processing
  • Automatic Detection: Detects and handles Shift-JIS, EUC-JP, and UTF-8
  • Specialized Models: Optimized embedding models for Japanese content
  • Mixed Language: Seamlessly handles Japanese and English in the same document
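
One simple way to handle Shift-JIS, EUC-JP, and UTF-8 without a detection library is to try strict decodes in a fixed order. The sketch below assumes that approach. `decode_japanese` is a hypothetical name, and Oboyu's actual detector may work differently.

UTF-8 goes first because its validation is strict. EUC-JP goes before Shift-JIS for two reasons:

- Shift-JIS hiragana, katakana, and the most common kanji use lead bytes in 0x81-0x9F, which EUC-JP mostly rejects.
- EUC-JP kana bytes can decode under Shift-JIS as half-width katakana without raising an error.

```python
def decode_japanese(data: bytes) -> tuple[str, str]:
    """Hypothetical helper: decode bytes by trying strict codecs in order.

    Returns (text, codec name). Order: UTF-8, then EUC-JP, then cp932
    (Microsoft's Shift-JIS variant, which also covers Windows-only characters).
    """
    for codec in ("utf-8", "euc_jp", "cp932"):
        try:
            return data.decode(codec), codec
        except UnicodeDecodeError:
            continue
    # Nothing matched cleanly: decode as UTF-8 with U+FFFD replacements rather than fail.
    return data.decode("utf-8", errors="replace"), "utf-8 (lossy)"


sample = "日本語のテキスト"
for original in ("utf-8", "euc_jp", "shift_jis"):
    text, codec = decode_japanese(sample.encode(original))
    print(original, "->", codec, text == sample)
```

Neither ordering is foolproof. Very short or unusual inputs can still decode under the wrong codec without raising an error, which is why dedicated detectors score candidate decodings statistically instead.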

🚀 Performance & Integration

  • ONNX Acceleration: 2-4x faster with automatic model optimization
  • MCP Server: Direct integration with Claude Desktop and AI coding assistants
  • Rich CLI: Beautiful terminal interface with progress tracking
  • Low Memory: Efficient processing even on modest hardware

Installation

Using uv

uv tool install oboyu

Using pip

pip install oboyu

From Source

git clone https://github.com/sonesuke/oboyu.git
cd oboyu
pip install -e .

System Requirements

  • Python: 3.10 or higher
  • OS: macOS, Linux (Windows via WSL)
  • Memory: 2GB RAM minimum
  • Storage: 1GB for models and index

Note: Models are automatically downloaded on first use (~90MB).

Usage Examples

Basic Usage

# Index a directory
oboyu index ~/Documents/notes

# Search your documents
oboyu query "machine learning optimization techniques"

# Interactive mode (recommended!)
oboyu query --interactive

Advanced Examples

# Index only specific file types
oboyu index ~/projects --include "*.md,*.txt"

# Search with filters
oboyu query "API design" --filter "docs/"

# Use semantic search mode
oboyu query "concepts similar to dependency injection" --mode semantic

# Enable reranking for better accuracy
oboyu query "complex technical topic" --rerank
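
Semantic mode ranks chunks by how close their embedding vectors are to the query's embedding. The README does not name the similarity measure. The sketch below assumes cosine similarity, a common choice, and uses hand-written 3-dimensional vectors in place of real model output. The chunk IDs and function names are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """dot(a, b) / (|a| * |b|); returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_top_k(
    query_vec: list[float], chunk_vecs: dict[str, list[float]], k: int = 3
) -> list[tuple[str, float]]:
    """Return the k chunks most similar to the query, best first."""
    scored = [(chunk_id, cosine_similarity(query_vec, vec)) for chunk_id, vec in chunk_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional vectors; real embedding models output hundreds of dimensions.
chunks = {
    "di_intro": [0.9, 0.1, 0.0],
    "service_locator": [0.7, 0.3, 0.1],
    "pasta_recipe": [0.0, 0.1, 0.9],
}
query = [0.8, 0.2, 0.0]

for chunk_id, score in semantic_top_k(query, chunks, k=2):
    print(chunk_id, round(score, 3))  # di_intro first, then service_locator
```

A brute-force scan like this is fine for small collections. Larger indexes typically store normalized vectors, so the dot product alone equals cosine similarity, or use an approximate nearest-neighbor index.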

MCP Server for AI Assistants

# Start MCP server
oboyu mcp

# Or configure in Claude Desktop's settings

See our MCP Integration Guide for detailed setup instructions.
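
Claude Desktop reads MCP servers from the `mcpServers` object in its `claude_desktop_config.json` file. On macOS, that file lives under `~/Library/Application Support/Claude/`. A hypothetical entry for Oboyu, built and printed with Python's `json` module, might look like the sketch below. It assumes two things:

- the `oboyu` executable is on the PATH that Claude Desktop sees;
- `oboyu mcp` (the command this README gives for starting the server) communicates over stdio.

Check the MCP Integration Guide for the exact command and arguments.

```python
import json

# Hypothetical Claude Desktop entry for Oboyu. Assumes `oboyu` is on the PATH
# that Claude Desktop sees and that `oboyu mcp` talks MCP over stdio.
config = {
    "mcpServers": {
        "oboyu": {
            "command": "oboyu",  # if not on PATH, use the absolute path to the executable
            "args": ["mcp"],
        }
    }
}

# Merge this into claude_desktop_config.json alongside any existing "mcpServers" entries.
print(json.dumps(config, indent=2))
```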

Common Use Cases

📚 Academic Research

Index and search through research papers, notes, and references:

oboyu index ~/research --include "*.pdf,*.md,*.txt"
oboyu query "transformer architecture improvements"

💻 Code Documentation

Search through project documentation and code comments:

oboyu index ~/projects/myapp --include "*.md,*.py"
oboyu query "authentication implementation"

📝 Personal Knowledge Base

Organize and search your notes and documents:

oboyu index ~/Documents/notes
oboyu query "meeting notes from last week"

🌏 Multilingual Documents

Perfect for mixed Japanese and English content:

oboyu index ~/Documents/bilingual
oboyu query "プロジェクト管理 best practices"
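
Mixed Japanese and English text has two sets of sentence endings. This matters when documents are split into chunks before embedding. The README does not describe Oboyu's chunking rules, so here is a minimal sketch under stated assumptions:

- split after 。！？, or after .!? when followed by whitespace;
- pack whole sentences into chunks of at most `max_chars` characters;
- repeat the last sentence of each chunk at the start of the next.

All names and defaults are hypothetical.

```python
import re

# Split points: right after Japanese sentence enders (。！？), or at the
# whitespace that follows English ones (.!?). Requiring whitespace after "."
# keeps numbers such as 3.14 in one piece.
_SENTENCE_BREAK = re.compile(r"(?<=[。！？])|(?<=[.!?])\s+")
_JA_ENDERS = ("。", "！", "？")


def split_sentences(text: str) -> list[str]:
    return [s.strip() for s in _SENTENCE_BREAK.split(text) if s.strip()]


def _join(sentences: list[str]) -> str:
    """Concatenate sentences: no space after Japanese punctuation, one space otherwise."""
    out = ""
    for s in sentences:
        if out and not out.endswith(_JA_ENDERS):
            out += " "
        out += s
    return out


def chunk_text(text: str, max_chars: int = 200, overlap: int = 1) -> list[str]:
    """Hypothetical chunker: pack whole sentences into chunks of <= max_chars.

    The last `overlap` sentences of a chunk are repeated at the start of the
    next one, unless that would overflow the limit. A single sentence longer
    than max_chars still becomes its own (oversized) chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    for sentence in split_sentences(text):
        if current and len(_join(current + [sentence])) > max_chars:
            chunks.append(_join(current))
            carried = current[-overlap:] if overlap > 0 else []
            current = carried + [sentence]
            if len(_join(current)) > max_chars:
                current = [sentence]
        else:
            current.append(sentence)
    if current:
        chunks.append(_join(current))
    return chunks


text = "東京は日本の首都です。人口が多い。Tokyo is large. It has many trains."
for chunk in chunk_text(text, max_chars=40):
    print(chunk)
# 東京は日本の首都です。人口が多い。Tokyo is large.
# Tokyo is large. It has many trains.
```

Abbreviations such as "e.g." followed by a space will still trigger a split. A production chunker would need an exception list or a proper sentence segmenter.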

Testing

Unit and Integration Tests

# Run fast tests (recommended for development)
uv run pytest -m "not slow"

# Run all tests with coverage
uv run pytest --cov=src

E2E Display Testing

Oboyu includes comprehensive E2E display testing using the Claude Code SDK:

# Run all E2E display tests
python tests/e2e/run_display_tests.py

# Run specific test category
python tests/e2e/run_display_tests.py --test search

See our E2E Display Testing Guide for details.

Contributing

We welcome contributions! See our Contributing Guidelines for details.

# Quick start for contributors
git clone https://github.com/YOUR_USERNAME/oboyu.git
cd oboyu
uv sync
uv run pytest -m "not slow"

License

This project is licensed under the MIT License; see the LICENSE.md file for details.

Acknowledgments

  • The name "Oboyu" (覚ゆ) comes from ancient Japanese, meaning "to remember"
  • Built with ❤️ for the Japanese NLP community
  • Inspired by the goal of making knowledge accessible across languages

Made with 🇯🇵 by sonesuke
