
IntraIntel.ai - Multi-LLM Agent Coding Challenge

This project implements a Multi-LLM Agent system in Python designed to answer medical questions. It leverages a custom Model Context Protocol (MCP) to interact with distinct tool servers for information retrieval (Web Search and PubMed Search). The core agent orchestrates multiple free Large Language Models (LLMs) from the Hugging Face Inference API to perform tasks such as query refinement, context snippet summarization, and final answer synthesis using a Retrieval-Augmented Generation (RAG) approach. The system is designed to provide separate, synthesized answers based on the context retrieved from Web Search and PubMed, including links to the original source materials.

Demo video: https://youtu.be/JvFvCYFYpzU?si=Vho8MM5Dy7YQk53E

System Architecture

(System architecture diagram)

Two LLMs are used, DistilBART for summarization and Mistral-7B-Instruct for query refinement and answer synthesis, both accessed through the free-tier Hugging Face Inference API.

Core Approach & Design

The system follows a multi-step, agentic RAG pipeline (a sketch of the LLM calls appears after this list):

  1. User Input: The agent takes a medical question from the user.
  2. (Optional) Query Refinement (LLM1): The user's question can be passed to a first LLM (e.g., mistralai/Mistral-7B-Instruct-v0.3) to generate a more concise and effective search query, focusing on key medical terms.
  3. Information Retrieval (MCP Tools):
    • The (potentially refined) search query is sent to one or both MCP tool servers:
      • Web Search MCP Server: Queries the general web using duckduckgo-search.
      • PubMed MCP Server: Queries the PubMed database using BioPython's Entrez utilities.
    • These servers return structured search results (title, snippet, URL/ID).
  4. Context Processing:
    • (Optional) Snippet Summarization (LLM2): For each search result, if the snippet is long, it can be passed to a specialized summarization LLM (e.g., sshleifer/distilbart-cnn-6-6) to create a more concise version.
    • Context Formatting: The (summarized or original) snippets from each source (Web Search, PubMed) are formatted separately into structured context strings. Each item in the context is labeled (e.g., "Item 1", "Item 2") and includes its title, content, and reference (URL or PubMed ID).
  5. Answer Synthesis (LLM3 - RAG):
    • For each information source (Web Search, PubMed), the agent makes a separate call to a powerful instruction-tuned LLM (e.g., mistralai/Mistral-7B-Instruct-v0.3).
    • The prompt to this LLM includes:
      • The original user question.
      • The formatted context from that specific source.
      • An instruction to answer based only on the provided context and to cite item numbers if possible.
    • This LLM generates a synthesized answer for that source.
  6. Output: The system presents the original question, followed by the synthesized answer derived from Web Search (with source links) and then the synthesized answer derived from PubMed (with source links).
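
As an illustration of steps 2, 4, and 5, the sketch below shows how these LLM calls might be assembled. The Inference API endpoint and payload shape are standard Hugging Face conventions; the helper names, prompt wording, and generation parameters are illustrative assumptions, not the project's exact code (which lives in agent/main_agent.py).

  # Sketch of the LLM calls in steps 2, 4b, and 5. Helper names and prompt
  # wording are illustrative; see agent/main_agent.py for the real logic.
  import os
  import httpx

  HF_URL = "https://api-inference.huggingface.co/models/mistralai/Mistral-7B-Instruct-v0.3"
  HEADERS = {"Authorization": f"Bearer {os.environ['HF_API_TOKEN']}"}

  async def call_llm(prompt: str, max_new_tokens: int = 300) -> str:
      payload = {"inputs": prompt,
                 "parameters": {"max_new_tokens": max_new_tokens,
                                "return_full_text": False}}
      async with httpx.AsyncClient(timeout=120.0) as client:
          resp = await client.post(HF_URL, headers=HEADERS, json=payload)
          resp.raise_for_status()
          return resp.json()[0]["generated_text"].strip()

  def format_context(results: list[dict]) -> str:
      # Step 4b: label each snippet so the model can cite "Item N".
      return "\n\n".join(
          f"Item {i}: {r['title']}\n{r['snippet']}\nReference: {r.get('url') or r.get('id', '')}"
          for i, r in enumerate(results, start=1))

  async def refine_query(question: str) -> str:
      # Step 2: ask LLM1 for a concise, keyword-focused search query.
      return await call_llm("Rewrite this medical question as a concise search query "
                            f"of key terms only:\n{question}", max_new_tokens=40)

  async def synthesize(question: str, results: list[dict]) -> str:
      # Step 5: RAG synthesis grounded only in the retrieved context.
      prompt = (f"Answer the question using ONLY the context below, citing item "
                f"numbers where possible.\n\nContext:\n{format_context(results)}\n\n"
                f"Question: {question}\nAnswer:")
      return await call_llm(prompt)

Calling synthesize once per source keeps the Web Search and PubMed answers independent, as described in step 6.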

Key Technologies:

  • Python 3.9+ with asyncio for concurrency (see the retrieval sketch after this list).
  • FastAPI & Uvicorn: For building the asynchronous MCP tool servers.
  • httpx: For asynchronous HTTP requests from the agent to MCP servers and the Hugging Face API.
  • Hugging Face Inference API: To access free LLMs for refinement, summarization, and synthesis.
  • duckduckgo-search: For API-key-free web searches.
  • BioPython: For querying NCBI's PubMed database.
  • python-dotenv: For managing API keys and environment variables.
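
Because the agent and both tool servers are asynchronous, the two retrieval calls can run concurrently. A minimal sketch, assuming the ports used in Running the System below (function names are illustrative):

  # Sketch: querying both MCP servers concurrently with httpx and asyncio.
  import asyncio
  import httpx

  async def query_mcp_server(client: httpx.AsyncClient, base_url: str, query: str) -> dict:
      # POST /execute with {"query": ...} per the MCP definition below.
      resp = await client.post(f"{base_url}/execute", json={"query": query})
      resp.raise_for_status()
      return resp.json()

  async def gather_context(query: str) -> list[dict]:
      async with httpx.AsyncClient(timeout=30.0) as client:
          return await asyncio.gather(
              query_mcp_server(client, "http://localhost:8001", query),  # web search
              query_mcp_server(client, "http://localhost:8002", query),  # PubMed
          )

  # Usage: web_result, pubmed_result = asyncio.run(gather_context("statin side effects"))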

Model Context Protocol (MCP) Definition

The MCP used in this project is a simple HTTP-based JSON protocol for communication between the main agent and the tool servers (a sketch of a conforming server follows the schema):

  • Request to MCP Server:
    • Endpoint: POST /execute
    • JSON Body: {"query": "user's search query string"}
  • Response from MCP Server:
    • JSON Body (Success):
      {
        "source": "server_name_string (e.g., WebSearchServer, PubMedServer)",
        "status": "success",
        "results": [
          {
            "title": "string",
            "snippet": "string",
            "url": "string_url_if_web_search", // or missing
            "id": "string_pubmed_id_if_pubmed" // or missing
          }
          // ... more results
        ]
      }
      
    • JSON Body (Error):
      {
        "source": "server_name_string",
        "status": "error",
        "error_message": "string_describing_the_error"
      }
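
For illustration, a minimal tool server conforming to this protocol could look like the sketch below. This is not the project's actual mcp_servers/web_search_server.py, and the DDGS usage assumes a recent duckduckgo-search release; treat it as a template, not the implementation.

  # Sketch of a protocol-conforming Web Search MCP server (illustrative only).
  from fastapi import FastAPI
  from pydantic import BaseModel
  from duckduckgo_search import DDGS

  app = FastAPI()

  class MCPRequest(BaseModel):
      query: str

  @app.post("/execute")
  def execute(request: MCPRequest):
      try:
          # DDGS().text returns dicts with "title", "href", and "body" keys.
          hits = DDGS().text(request.query, max_results=5)
          results = [{"title": h.get("title", ""),
                      "snippet": h.get("body", ""),
                      "url": h.get("href", "")} for h in hits]
          return {"source": "WebSearchServer", "status": "success", "results": results}
      except Exception as exc:
          return {"source": "WebSearchServer", "status": "error", "error_message": str(exc)}

FastAPI runs synchronous handlers in a thread pool, so the blocking DDGS call does not stall the event loop.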
      

Setup Instructions

  1. Clone/Download:

    • Obtain the project files and place them in a directory named intra_intel_multi_llm_challenge.
  2. Create and Activate Python Virtual Environment:

    • Navigate to the project root directory (intra_intel_multi_llm_challenge).
    • Create the environment:
      python -m venv venv
      
    • Activate it:
      • On macOS/Linux: source venv/bin/activate
      • On Windows: venv\Scripts\activate
  3. Install Dependencies:

    pip install -r requirements.txt
    
  4. Configure Environment Variables:

    • In the project root, copy the file .env.example to a new file named .env.
    • Open the .env file and fill in your details:
      # .env
      HF_API_TOKEN="your_huggingface_api_token_here"
      NCBI_EMAIL="your_email_for_pubmed@example.com"
      
      • HF_API_TOKEN: Get this from your Hugging Face account settings (https://huggingface.co/settings/tokens). A token with read permissions is sufficient.
      • NCBI_EMAIL: Provide your email address for polite programmatic access to NCBI's PubMed database via Entrez. No prior registration of this email with NCBI is needed for this project. (A sketch of loading these variables follows the setup steps.)
  5. Accept Terms for Gated Models (Crucial for Mistral):

    • The default text generation LLM used is mistralai/Mistral-7B-Instruct-v0.3. This is a gated model.
    • Log in to your Hugging Face account on their website.
    • Go to the model card page: https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3
    • Read and accept the license terms to gain access to this model. Your HF_API_TOKEN will then be authorized to use it. If you skip this, API calls to this model will fail.
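
Once .env is in place, the code can load these variables with python-dotenv, roughly as follows:

  # Sketch: loading the variables configured in step 4.
  import os
  from dotenv import load_dotenv

  load_dotenv()  # reads .env from the current working directory
  HF_API_TOKEN = os.getenv("HF_API_TOKEN")
  NCBI_EMAIL = os.getenv("NCBI_EMAIL")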

Running the System

The system requires the MCP servers to be running before the main agent can query them.

  1. Terminal 1: Start the Web Search MCP Server:

    • Navigate to the project root (intra_intel_multi_llm_challenge).
    • Ensure your virtual environment is activated.
    • Run:
      python -m uvicorn mcp_servers.web_search_server:app --reload --port 8001
      
    • Wait for the message indicating Uvicorn is running (e.g., Uvicorn running on http://0.0.0.0:8001).
  2. Terminal 2: Start the PubMed MCP Server (for Bonus Functionality):

    • Navigate to the project root.
    • Ensure your virtual environment is activated.
    • Run:
      python -m uvicorn mcp_servers.pubmed_search_server:app --reload --port 8002
      
    • Wait for Uvicorn startup confirmation. (A sketch of how this server queries PubMed appears after this list.)
  3. Terminal 3: Run the Main Agent:

    • Navigate to the project root.
    • Ensure your virtual environment is activated.
    • Run:
      python agent/main_agent.py
      
    • The agent will process a predefined list of 5 medical questions. For each question, it will:
      • Optionally refine the query using LLM1.
      • Query the Web Search MCP server.
      • Optionally summarize web search snippets using LLM2.
      • Synthesize an answer based on web search context using LLM3.
      • Print the web search based answer and its source links.
      • Query the PubMed MCP server (as use_pubmed_for_this_question is True by default in examples).
      • Optionally summarize PubMed snippets using LLM2.
      • Synthesize an answer based on PubMed context using LLM3.
      • Print the PubMed based answer and its source links.
    • The agent pauses between questions to stay within the Hugging Face API's free-tier rate limits.
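
For reference, the PubMed server started in Terminal 2 might query NCBI along these lines. This is a hedged sketch using BioPython's documented esearch/esummary idiom; the real mcp_servers/pubmed_search_server.py may differ (for example, by using efetch to retrieve abstracts as snippets).

  # Sketch: querying PubMed via BioPython's Entrez utilities (illustrative).
  import os
  from Bio import Entrez

  Entrez.email = os.environ["NCBI_EMAIL"]  # NCBI requires a contact email

  def search_pubmed(query: str, max_results: int = 5) -> list[dict]:
      # esearch returns matching PubMed IDs; esummary fetches their titles.
      handle = Entrez.esearch(db="pubmed", term=query, retmax=max_results)
      id_list = Entrez.read(handle)["IdList"]
      handle.close()
      results = []
      if id_list:
          handle = Entrez.esummary(db="pubmed", id=",".join(id_list))
          for doc in Entrez.read(handle):
              results.append({"title": doc.get("Title", ""),
                              "snippet": "",  # efetch would be needed for abstracts
                              "id": doc.get("Id", "")})
          handle.close()
      return results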

    Note on Hugging Face API Model Loading: The first time an LLM is called via the Inference API, it might need to be loaded by Hugging Face, which can take 20-90 seconds (or more for larger models like Mistral-7B). Subsequent calls are typically faster. The script includes timeouts and attempts to handle model loading messages.
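
While a model is loading, the free Inference API responds with HTTP 503 and an estimated_time field in the JSON body, so the client can wait and retry. A sketch of that pattern (the retry policy itself is an illustrative assumption):

  # Sketch: retrying while a Hugging Face model warms up.
  import asyncio
  import httpx

  async def query_with_retry(client: httpx.AsyncClient, url: str, headers: dict,
                             payload: dict, max_retries: int = 5):
      for attempt in range(max_retries):
          resp = await client.post(url, headers=headers, json=payload)
          if resp.status_code == 503:
              # Model still loading; the API suggests how long to wait.
              wait = resp.json().get("estimated_time", 20.0)
              await asyncio.sleep(min(wait, 90.0))
              continue
          resp.raise_for_status()
          return resp.json()
      raise TimeoutError(f"Model did not load after {max_retries} attempts")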

Customization and Debugging

  • LLM Choices: You can change VERIFIED_INSTRUCT_MODEL_ID and SUMMARIZATION_MODEL_ID in agent/main_agent.py if you find other models that work better or are more reliably available on the Hugging Face free inference API (always verify with the widget on the model's card page first!).
  • Toggle Features: In the if __name__ == "__main__": block of agent/main_agent.py, you can change the refine_queries and summarize_snippets boolean flags passed to main_agent_workflow to test the pipeline with these steps enabled or disabled. You can also change use_pubmed_for_this_question for individual calls (see the toggle sketch after this list).
  • Verbose Logging: To see detailed debug information (API payloads, full contexts, etc.):
    1. Open agent/main_agent.py.
    2. Change LOG_LEVEL = logging.INFO to LOG_LEVEL = logging.DEBUG.
    3. Save and re-run the agent.
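
Putting the toggles together, the __main__ block might resemble the sketch below. The flag names come from this README, but the exact main_agent_workflow signature is an assumption; check agent/main_agent.py before copying.

  # Illustrative only: the exact signature may differ in agent/main_agent.py.
  if __name__ == "__main__":
      import asyncio
      asyncio.run(main_agent_workflow(
          refine_queries=True,        # toggle LLM1 query refinement
          summarize_snippets=False,   # toggle LLM2 snippet summarization
      ))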

Dependencies

The main dependencies are listed in requirements.txt (a minimal version is sketched after this list):

  • fastapi: For building the MCP servers.
  • uvicorn[standard]: ASGI server for FastAPI.
  • httpx: For making asynchronous HTTP requests.
  • duckduckgo-search: For web search functionality.
  • biopython: For querying PubMed via NCBI Entrez.
  • python-dotenv: For managing environment variables.
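
A minimal requirements.txt consistent with this list might look like the following (unpinned; the repository's file may pin specific versions):

  fastapi
  uvicorn[standard]
  httpx
  duckduckgo-search
  biopython
  python-dotenv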

This README should cover the necessary details for understanding, setting up, and running the project.

Sample Medical Question Output Screenshots

(Screenshot: Web search output)

(Screenshot: PubMed search output)
