# EntityIdentification

Identify whether two sets of data come from the same entity.

This is an MCP (Model Context Protocol) server.
## Data Comparison Tool
This tool provides a comprehensive way to compare two sets of data, evaluating both exact and semantic equality of their values. It leverages text normalization and a language model to determine if the data originates from the same entity.
## Features
- **Text Normalization**: Converts text to lowercase, removes punctuation, and normalizes whitespace.
- **Value Comparison**: Compares values directly and semantically (ignoring order for lists).
- **JSON Traversal**: Iterates through each key in the JSON objects and compares corresponding values.
- **Language Model Integration**: Uses a generative language model to assess semantic similarity and provide a final judgment on whether the data comes from the same entity.
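The text normalization feature can be illustrated with a minimal sketch. This is not necessarily the project's own implementation: it assumes that "punctuation" means ASCII punctuation only, and it uses just the standard library.

```python
import re
import string


def normalize_text(text: str) -> str:
    """Lowercase, strip ASCII punctuation, and collapse runs of whitespace."""
    text = text.lower()
    # Assumption: only ASCII punctuation is removed; other scripts would need extra handling.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Collapse every whitespace run to a single space and trim both ends.
    return re.sub(r"\s+", " ", text).strip()


print(normalize_text("  John   Doe. "))             # john doe
print(normalize_text("123 Main St, Anytown, USA"))  # 123 main st anytown usa
```

Normalization alone does not equate "St" with "Street"; differences of that kind are what the language model step is meant to judge.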
## Installation
To use this tool, install its dependencies with pip. The `genai.GenerativeModel` call in the example matches the `google-generativeai` package, so that package is assumed here:

```bash
pip install google-generativeai
```
## Usage

### Functions
- **normalize_text(text)**:
  - Normalizes the input text by converting it to lowercase, removing punctuation, and normalizing whitespace.
- **compare_values(val1, val2)**:
  - Compares two values both exactly and semantically.
  - If the values are lists, it ignores the order of elements for semantic comparison.
- **compare_json(json1, json2)**:
  - Compares two JSON objects key by key.
  - Uses `compare_values` to evaluate each key's values.
  - Integrates a language model to assess semantic similarity and provides a final judgment.
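The comparison functions described above might look roughly like the following sketch. The function names `compare_values` and `compare_json` come from this README. Everything else is an assumption made for illustration: the `_norm` helper, the `judge` parameter (a stand-in for the language model call), and the result keys `exact_match`, `semantic_match`, and `judge_verdict`. Here "semantic" equality is approximated by equality after normalization.

```python
import re
import string


def _norm(value):
    """Hypothetical helper: normalize strings, pass other types through unchanged."""
    if not isinstance(value, str):
        return value
    value = value.lower().translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", value).strip()


def compare_values(val1, val2):
    """Report exact equality and equality after normalization."""
    exact = val1 == val2
    if isinstance(val1, list) and isinstance(val2, list):
        # Ignore element order by comparing sorted, normalized string forms.
        semantic = sorted(str(_norm(v)) for v in val1) == sorted(str(_norm(v)) for v in val2)
    else:
        semantic = _norm(val1) == _norm(val2)
    return {"exact_match": exact, "semantic_match": semantic}


def compare_json(json1, json2, judge=None):
    """Compare two dicts key by key over the union of their keys."""
    results = {}
    for key in sorted(set(json1) | set(json2)):
        val1, val2 = json1.get(key), json2.get(key)
        entry = {"val1": val1, "val2": val2, **compare_values(val1, val2)}
        # Assumption: `judge` stands in for the language model and is only
        # consulted when normalization alone cannot establish a match.
        if judge is not None and not entry["semantic_match"]:
            entry["judge_verdict"] = judge(key, val1, val2)
        results[key] = entry
    return results


a = {"name": "John Doe", "hobbies": ["reading", "coding"]}
b = {"name": "john doe", "hobbies": ["coding", "reading"]}
for key, entry in compare_json(a, b).items():
    print(key, entry["exact_match"], entry["semantic_match"])
# hobbies False True
# name False True
```

Passing the model in as a callable keeps the key-by-key traversal testable without network access; the project itself may wire the model in differently.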
### Example
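```python
import json

# Assumption: genai refers to the google-generativeai package, which provides GenerativeModel.
import google.generativeai as genai

# compare_json is this project's function (see Functions above); it must be in scope.
# An API key is assumed to be configured already, e.g. with genai.configure(api_key=...).

# Define your JSON objects
json1 = {
    "name": "John Doe",
    "address": "123 Main St, Anytown, USA",
    "hobbies": ["reading", "hiking", "coding"],
}
json2 = {
    "name": "john doe",
    "address": "123 Main Street, Anytown, USA",
    "hobbies": ["coding", "hiking", "reading"],
}

# Compare the JSON objects
comparison_results = compare_json(json1, json2)

# Generate the final matching result
model1 = genai.GenerativeModel("gemini-2.0-flash-thinking-exp")
prompt = (
    "Taking all of this information together, can you determine whether "
    "the two sets of data come from the same entity?\n"
    + json.dumps(comparison_results, ensure_ascii=False, indent=4)
)
result_matching = model1.generate_content(prompt)
print(result_matching.text)
```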
## Contributing
Contributions are welcome! Please open an issue or submit a pull request.
## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
## Contact
If you have any questions or suggestions, please contact me:
- Email: u3588064@connect.hku.hk
- GitHub: [u3588064@connect.hku.hk](mailto:u3588064@connect.hku.hk)