Sponsored by Deepsite.site

Gossiphs

Created By
williamfzc8 months ago
"Zero setup" & "Blazingly fast" general code file relationship analysis. With Python & Rust. Based on tree-sitter and git analysis. Support MCP and ready for AI🤖
Content

Gossiphs = Gossip Graphs

TIP

We provide an easy-to-use Python SDK and support for MCP (Model Context Protocol), allowing you to seamlessly integrate it with your AI.

See Gossiphs MCP Server

Crates.io Version RealWorld Test

"Zero setup" & "Blazingly fast" general code file relationship analysis. With Python & Rust. Based on tree-sitter and git analysis. Support MCP and ready for AI🤖

What's it

Gossiphs can analyze the history of commits and the relationships between variable declarations and references in your codebase to obtain a relationship diagram of the code files.

It also allows developers to query the content declared in each file, thereby enabling free search for its references throughout the entire codebase to achieve more complex analysis.

graph TD
    A[main.py] --- S1[func_main] --- B[module_a.py]
    A --- S2[Handler] --- C[module_b.py]
    B --- S3[func_util] --- D[utils.py]
    C --- S3[func_util] --- D
    A --- S4[func_init] --- E[module_c.py]
    E --- S5[process] --- F[module_d.py]
    E --- S6[Processor] --- H[module_e.py]
    H --- S7[transform] --- I[module_f.py]
    I --- S3[func_util] --- D

Supported Languages

We are expanding language support based on Tree-Sitter Query, which isn't too costly. If you're interested, you can check out the contribution section.

LanguageStatus
Rust
Python
TypeScript
JavaScript
Golang
Java
Kotlin
Swift

You can see the rule files here.

Usage

Python

pip install gossiphs

Analyze your codebase with networkx within 30 lines:

import networkx as nx
from gossiphs import GraphConfig, create_graph, Graph

config = GraphConfig()
config.project_path = "../.."
graph: Graph = create_graph(config)

nx_graph = nx.DiGraph()

for each_file in graph.files():
    nx_graph.add_node(each_file, metadata=graph.file_metadata(each_file))

    related_files = graph.related_files(each_file)
    for each_related_file in related_files:
        related_symbols = set(each_symbol.symbol.name for each_symbol in each_related_file.related_symbols)

        nx_graph.add_edge(
            each_file,
            each_related_file.name,
            related_symbols=list(related_symbols)
        )

print(f"NetworkX graph created with {nx_graph.number_of_nodes()} nodes and {nx_graph.number_of_edges()} edges.")

for src, dest, data in nx_graph.edges(data=True):
    print(f"{src} -> {dest}, related symbols: {data['related_symbols']}")

Output:

NetworkX graph created with 13 nodes and 27 edges.
src/server.rs -> src/main.rs, related symbols: ['server_main']
src/main.rs -> src/graph.rs, related symbols: ['default']
src/main.rs -> examples/mini.rs, related symbols: ['default']
src/main.rs -> src/server.rs, related symbols: ['main']
src/symbol.rs -> src/graph.rs, related symbols: ['link_file_to_symbol', 'list_references', 'list_references_by_definition', 'id', 'enhance_symbol_to_symbol', 'add_file', 'add_symbol', 'list_definitions', 'list_symbols', 'new', 'link_symbol_to_symbol', 'get_symbol']
...

More examples can be found here.

Others

We also provide a CLI and additional usage options, making it easy to directly export CSV files or start an HTTP service.

See usage page.

Goal & Motivation

TIP

Create a file relationship index with:

  • low cost
  • acceptable accuracy
  • high versatility for nearly any code repository

Code navigation is a fascinating subject that plays a pivotal role in various domains, such as:

  • Guiding the context during the development process within an IDE.
  • Facilitating more convenient code browsing on websites.
  • Analyzing the impact of code changes in Continuous Integration (CI) systems.
  • ...

In the past, I endeavored to apply LSP/LSIF technologies and techniques like Github's Stack-Graphs to impact analysis, encountering different challenges along the way. For our needs, a method akin to Stack-Graphs aligns most closely with our expectations. However, the challenges are evident: it requires crafting highly language-specific rules, which is a considerable investment for us, given that we do not require such high precision data.

We attempt to make some trade-offs on the challenges currently faced by stack-graphs to achieve our expected goals to a certain extent:

  • Zero repo-specific configuration: It can be applied to most languages and repositories without additional configuration.
  • Low extension cost: adding rules for languages is not high.
  • Acceptable precision: We have sacrificed a certain level of precision, but we also hope that it remains at an acceptable level.

How it works

Gossiphs constructs a graph that interconnects symbols of definitions and references.

  1. Extract imports and exports: Identify the imports and exports of each file.
  2. Connect nodes: Establish connections between potential definition and reference nodes.
  3. Refine edges with commit histories: Utilize commit histories to refine the relationships between nodes.

Unlike stack-graphs, we have omitted the highly complex scope analysis and instead opted to refine our edges using commit histories. This approach significantly reduces the complexity of rule writing, as the rules only need to specify which types of symbols should be exported or imported for each file.

While there is undoubtedly a trade-off in precision, the benefits are clear:

  1. Minimal impact on accuracy: In practical scenarios, the loss of precision is not as significant as one might expect.
  2. Commit history relevance: The use of commit history to reflect the influence between code segments aligns well with our objectives.
  3. Language support: We can easily support the vast majority of programming languages, meeting the analysis needs of various types of repositories.

Precision

Static analysis has its limits, such as dynamic binding. Therefore, it is unlikely to achieve the level of accuracy provided by LSP, but it can offer sufficient accuracy in the areas where it is primarily used.

The method we use to demonstrate accuracy is to compare the results with those of LSP/LSIF. It must be admitted that static inference is almost impossible to obtain all reference relationships like LSP.

You can further combine your own needs and use other methods such as tfidf to process the results to meet more complex requirements.

RepoCoverage of LSP Edges by Gossiphs
https://github.com/go-gorm/gorm442/499 = 88.5 %
https://github.com/gin-gonic/gin238/252 = 94.4%

Contribution

The project is still in a very early and experimental stage. If you are interested, please leave your thoughts through an issue. In the short term, we hope to build better support for more languages.

You just need to:

  1. Edit rules in src/rule.rs
  2. Test it in src/extractor.rs
  3. Try it with your repo in src/graph.rs

Tree-sitter Playground is a good helper.

License

Apache 2.0

Server Config

{
  "mcpServers": {
    "gossiphs": {
      "url": "http://127.0.0.1:8000/sse"
    },
    "gossiph-cmd": {
      "command": "gossiphs-mcp",
      "args": [
        "server"
      ]
    }
  }
}
Recommend Servers
TraeBuild with Free GPT-4.1 & Claude 3.7. Fully MCP-Ready.
Amap Maps高德地图官方 MCP Server
MCP AdvisorMCP Advisor & Installation - Use the right MCP server for your needs
Context7Context7 MCP Server -- Up-to-date code documentation for LLMs and AI code editors
EdgeOne Pages MCPAn MCP service designed for deploying HTML content to EdgeOne Pages and obtaining an accessible public URL.
DeepChatYour AI Partner on Desktop
Baidu Map百度地图核心API现已全面兼容MCP协议,是国内首家兼容MCP协议的地图服务商。
ChatWiseThe second fastest AI chatbot™
BlenderBlenderMCP connects Blender to Claude AI through the Model Context Protocol (MCP), allowing Claude to directly interact with and control Blender. This integration enables prompt assisted 3D modeling, scene creation, and manipulation.
Zhipu Web SearchZhipu Web Search MCP Server is a search engine specifically designed for large models. It integrates four search engines, allowing users to flexibly compare and switch between them. Building upon the web crawling and ranking capabilities of traditional search engines, it enhances intent recognition capabilities, returning results more suitable for large model processing (such as webpage titles, URLs, summaries, site names, site icons, etc.). This helps AI applications achieve "dynamic knowledge acquisition" and "precise scenario adaptation" capabilities.
Tavily Mcp
WindsurfThe new purpose-built IDE to harness magic
Jina AI MCP ToolsA Model Context Protocol (MCP) server that integrates with Jina AI Search Foundation APIs.
Playwright McpPlaywright MCP server
CursorThe AI Code Editor
TimeA Model Context Protocol server that provides time and timezone conversion capabilities. This server enables LLMs to get current time information and perform timezone conversions using IANA timezone names, with automatic system timezone detection.
Serper MCP ServerA Serper MCP Server
Howtocook Mcp基于Anduin2017 / HowToCook (程序员在家做饭指南)的mcp server,帮你推荐菜谱、规划膳食,解决“今天吃什么“的世纪难题; Based on Anduin2017/HowToCook (Programmer's Guide to Cooking at Home), MCP Server helps you recommend recipes, plan meals, and solve the century old problem of "what to eat today"
AiimagemultistyleA Model Context Protocol (MCP) server for image generation and manipulation using fal.ai's Stable Diffusion model.
MiniMax MCPOfficial MiniMax Model Context Protocol (MCP) server that enables interaction with powerful Text to Speech, image generation and video generation APIs.
Visual Studio Code - Open Source ("Code - OSS")Visual Studio Code