Obsidian Index Service

Created by pmmvr, 9 months ago
A service that monitors an Obsidian vault for file changes (new, modified, or deleted Markdown files) and indexes each note's metadata and content into an SQLite database, making it available to other tools such as an MCP server.

This service monitors an Obsidian vault directory and indexes Markdown files (metadata and full content) into an SQLite database. I built it to work with my mcp-server project, but later switched to an implementation that uses the Obsidian plugin API instead. I still see a use for this as a tool-agnostic note indexer or sync tool (see the note under "Future Steps"), so I'm putting it up here.

Functionality

It tracks file changes (create, modify, delete) in an Obsidian vault and stores everything in SQLite, accessible via a Docker volume. It captures:

  • Path: File path (unique identifier)
  • Title: From filename
  • Parent Folders: Relative to vault root
  • Tags: From YAML frontmatter
  • Created Date: Filesystem timestamp
  • Modified Date: Filesystem timestamp
  • Content: Full text of the note
  • Status: Processing outcome (success/error)
  • Error Message: Details if processing fails
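
As a rough illustration of how the fields above could map onto a table, here is a minimal sketch using Python's built-in sqlite3 module. The table name 'notes' comes from this README; the column names, the text encoding of tags and dates, and the helper names `init_db` and `upsert_note` are assumptions made for the example and may not match the actual schema.

```python
import sqlite3

# Hypothetical column names mirroring the fields listed above;
# the service's real schema may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS notes (
    path           TEXT PRIMARY KEY,  -- unique identifier
    title          TEXT NOT NULL,     -- derived from the filename
    parent_folders TEXT,              -- relative to the vault root
    tags           TEXT,              -- e.g. a comma-separated list
    created_at     TEXT,              -- filesystem timestamp, ISO 8601
    modified_at    TEXT,              -- filesystem timestamp, ISO 8601
    content        TEXT,              -- full text of the note
    status         TEXT,              -- "success" or "error"
    error_message  TEXT               -- details if processing failed
)
"""

COLUMNS = (
    "path", "title", "parent_folders", "tags", "created_at",
    "modified_at", "content", "status", "error_message",
)


def init_db(conn: sqlite3.Connection) -> None:
    """Create the notes table if it does not exist yet."""
    conn.execute(SCHEMA)
    conn.commit()


def upsert_note(conn: sqlite3.Connection, note: dict) -> None:
    """Insert a note, or update the existing row with the same path."""
    placeholders = ", ".join(f":{c}" for c in COLUMNS)
    updates = ", ".join(f"{c} = excluded.{c}" for c in COLUMNS if c != "path")
    conn.execute(
        f"INSERT INTO notes ({', '.join(COLUMNS)}) VALUES ({placeholders}) "
        f"ON CONFLICT(path) DO UPDATE SET {updates}",
        note,
    )
    conn.commit()
```

Using the path as the primary key means that re-indexing a modified note overwrites its row instead of creating a duplicate.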

Setup and Usage

Prerequisites

  • Python 3.12 or later
  • Docker and Docker Compose (for containerized use)
  • uv (optional, but recommended)

Installation

  1. Clone the repo:

     git clone https://github.com/pmmvr/obsidian-index-service.git
     cd obsidian-index-service
    
  2. Set up a virtual environment:

    • With uv (recommended):

      uv venv
      source .venv/bin/activate # Linux/macOS
      .venv\Scripts\activate # Windows
      
    • With python (standard):

      python -m venv .venv
      source .venv/bin/activate  # Linux/macOS
      .venv\Scripts\activate     # Windows
      
  3. Install dependencies:

    • With uv (recommended):
      uv sync  # Installs from uv.lock
      uv pip install pytest pytest-bdd pytest-mock  # For tests
      
    • With pip:
      pip install -e .
      pip install pytest pytest-bdd pytest-mock  # For tests
      

Running Locally

Set environment variables:

export OBSIDIAN_VAULT_PATH=/path/to/vault
export DB_PATH=/path/to/notes.sqlite

Run it:

python main.py

With uv:

uv run python main.py

For a one-time scan:

python main.py --scan-only

Or with uv:

uv run python main.py --scan-only

Command-Line Options

  • --vault-path: Path to vault directory
  • --db-path: Path to SQLite database
  • --scan-only: Scan without watching
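
For orientation, the options above could be parsed along the following lines. This is a sketch with argparse, not the code in main.py: the function name `parse_args` is hypothetical, and the assumption that command-line flags take precedence over the environment variables is mine.

```python
import argparse
import os


def parse_args(argv=None, env=None):
    """Parse CLI options, falling back to environment variables."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(
        description="Index an Obsidian vault into an SQLite database"
    )
    # Assumed precedence: a flag on the command line overrides the variable.
    parser.add_argument(
        "--vault-path",
        default=env.get("OBSIDIAN_VAULT_PATH"),
        help="Path to vault directory",
    )
    parser.add_argument(
        "--db-path",
        default=env.get("DB_PATH"),
        help="Path to SQLite database",
    )
    parser.add_argument(
        "--scan-only",
        action="store_true",
        help="Scan once and exit without watching for changes",
    )
    return parser.parse_args(argv)
```

Calling `parse_args(["--scan-only"])` with both environment variables set would yield the paths from the environment and `scan_only=True`.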

Using Docker

  1. Build and run:
    docker-compose up -d
    
  2. The container mounts your vault and exposes the SQLite database through a Docker volume.

Read-Only Access for Other Services

To let another service read the database (e.g., for scanning changes):

  1. Use the same volume as obsidian-index-service in your docker-compose.yml:
    services:
      your-service:
        image: your-image
        volumes:
          - ${DB_VOLUME_PATH:-./data}:/data:ro  # Read-only mount
    
  2. Obsidian Index Service writes to /data/notes.sqlite (mounted read-write), while other services (e.g. an mcp-server) only read it. SQLite's WAL mode lets readers query the database while the service is writing.
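
On the consuming side, a reader can also open the database in read-only mode at the SQLite level. The sketch below uses sqlite3's URI syntax with `mode=ro`. The helper names `open_readonly` and `list_titles` are hypothetical, and the `path` and `title` columns are assumed names in the 'notes' table.

```python
import sqlite3
from pathlib import Path


def open_readonly(db_path):
    """Open the index database without permission to write to it."""
    # as_uri() yields a percent-encoded file: URI that SQLite understands.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def list_titles(conn):
    """Return (path, title) pairs; the column names are assumptions."""
    return conn.execute(
        "SELECT path, title FROM notes ORDER BY path"
    ).fetchall()
```

Inside a container that mounts the shared volume, the call would be `open_readonly("/data/notes.sqlite")`. Any write attempted through this connection raises `sqlite3.OperationalError`.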

How It Works

The Obsidian Index Service operates through the following process:

  1. Startup (ObsidianIndexService.__init__)

    • Loads configuration from environment variables or command-line arguments
    • Initializes the database connection (DatabaseConnection)
    • Sets up the note processor (NoteProcessor)
    • Establishes signal handlers for graceful shutdown
  2. Database Initialization (DatabaseConnection.__init__)

    • Creates/connects to an SQLite database
    • Sets up the database in WAL (Write-Ahead Logging) mode for better concurrency
    • Creates a 'notes' table if it doesn't exist with columns for path, title, tags, etc.
  3. Initial Vault Scan (NoteProcessor.scan_vault)

    • Finds all Markdown files (*.md, *.markdown) in the vault directory
    • For each file, extracts metadata
    • Adds all extracted metadata to the database (NoteOperations.insert_note)
  4. Continuous Monitoring (FileWatcher.watch)

    • Watches the vault directory for file system events
    • Processes different types of events:
      • File creation: Indexes new files
      • File modification: Updates index for changed files
      • File deletion: Removes entries from the index
      • File movement/renaming: Updates path information
  5. File Processing (NoteProcessor.process_note)

    • Extracts metadata from Markdown files
    • Includes path, title, parent folders, tags, created/modified dates
    • Updates the database with this information (NoteOperations.upsert_note)
  6. Graceful Shutdown (ObsidianIndexService.shutdown)

    • Properly closes file watchers and database connections when receiving termination signals
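
Steps 3 and 5 can be pictured with the standalone sketch below. It is an illustration rather than the code behind NoteProcessor: the function names, the dictionary keys, the choice to store paths relative to the vault root, and the deliberately small frontmatter parser (which handles only `tags: [a, b]`, `tags: a, b` and `- a` block lists) are all assumptions.

```python
from datetime import datetime, timezone
from pathlib import Path

MARKDOWN_SUFFIXES = {".md", ".markdown"}


def parse_frontmatter_tags(text: str) -> list[str]:
    """Return tags from a leading YAML frontmatter block, if there is one."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []
    tags: list[str] = []
    in_tag_list = False
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # end of frontmatter
            break
        if in_tag_list and stripped.startswith("- "):
            tags.append(stripped[2:].strip().strip("'\""))
            continue
        in_tag_list = False
        if stripped.startswith("tags:"):
            value = stripped[len("tags:"):].strip()
            if not value:
                in_tag_list = True  # a block list follows on the next lines
            else:
                items = value.strip("[]").split(",")
                tags.extend(t.strip().strip("'\"") for t in items if t.strip())
    return tags


def _iso(timestamp: float) -> str:
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat()


def extract_metadata(path: Path, vault_root: Path) -> dict:
    """Collect the indexed fields for a single Markdown file."""
    stat = path.stat()
    text = path.read_text(encoding="utf-8")
    rel = path.relative_to(vault_root)
    parent = rel.parent.as_posix()
    # st_birthtime is not available everywhere; st_ctime is the inode
    # change time on Linux, so "created" is only an approximation there.
    created = getattr(stat, "st_birthtime", stat.st_ctime)
    return {
        "path": rel.as_posix(),
        "title": path.stem,
        "parent_folders": "" if parent == "." else parent,
        "tags": parse_frontmatter_tags(text),
        "created": _iso(created),
        "modified": _iso(stat.st_mtime),
        "content": text,
    }


def scan_vault(vault_root: Path) -> list[dict]:
    """Extract metadata for every Markdown file below the vault root."""
    files = sorted(
        p for p in vault_root.rglob("*")
        if p.is_file() and p.suffix.lower() in MARKDOWN_SUFFIXES
    )
    return [extract_metadata(p, vault_root) for p in files]
```

A real vault would call for a full YAML parser, since frontmatter can contain nested structures that this parser ignores.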

The service operates in the background, continuously keeping the SQLite database in sync with the Obsidian vault. Other applications can then use this database to access note metadata without having to parse Markdown files directly.
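
To make the idea of keeping the database in sync concrete, here is a sketch of change detection by comparing two snapshots of the vault. This is a swapped-in technique (polling with the standard library), not how FileWatcher is implemented; the function names are hypothetical.

```python
from pathlib import Path

MARKDOWN_SUFFIXES = {".md", ".markdown"}


def snapshot(vault_root: Path) -> dict[str, float]:
    """Map each Markdown file (relative path) to its modification time."""
    return {
        p.relative_to(vault_root).as_posix(): p.stat().st_mtime
        for p in vault_root.rglob("*")
        if p.is_file() and p.suffix.lower() in MARKDOWN_SUFFIXES
    }


def diff_snapshots(old: dict[str, float], new: dict[str, float]):
    """Return (created, modified, deleted) paths between two snapshots."""
    created = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    modified = sorted(
        path for path in new.keys() & old.keys() if new[path] != old[path]
    )
    return created, modified, deleted
```

With this approach a rename shows up as one deleted path plus one created path, so an indexer built on it would drop the old row and insert a new one rather than update the path in place.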

Development

Run tests:

pytest

Project Status

  • Done: Core indexing (metadata + content), Docker setup, file watching, database CRUD.
  • Next: Nothing scheduled. I had planned an API, but went with the plugin approach instead.

Future Steps (Sync Tool Potential)

With some rework I can see this as a sync tool:

  • Remote Backend: Add support for cloud storage (e.g., Dropbox) or a server.
  • Sync Logic: Push local changes (content + metadata) to remote, pull remote updates, handle conflicts (e.g., last-write-wins).
  • Database Tweaks: Add sync_status and remote_id columns.
  • File Watcher Updates: Queue changes for sync, not just indexing.
  • CLI Option: Add --sync to trigger it manually or continuously.
  • Error Handling: Retry on network failures, log issues.
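
As an example of the conflict handling mentioned under Sync Logic, a last-write-wins rule could look like the sketch below. Nothing like this exists in the project yet; the `NoteVersion` type and the `resolve_last_write_wins` function are hypothetical, and ties are resolved in favor of the local copy as an arbitrary choice.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class NoteVersion:
    """One side's copy of a note (hypothetical type for this sketch)."""
    path: str
    content: str
    modified: datetime


def resolve_last_write_wins(
    local: Optional[NoteVersion], remote: Optional[NoteVersion]
) -> Optional[NoteVersion]:
    """Pick the more recently modified copy; local wins on a tie."""
    if local is None:
        return remote
    if remote is None:
        return local
    return remote if remote.modified > local.modified else local
```

Last-write-wins is simple, but it silently discards the older edit, so it relies on both sides having reasonably accurate clocks.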