- Mcp Langchain Server
Mcp Langchain Server
MCP-style LangChain + Ollama Python project that:
Runs completely free and locally
Accepts MCP-style { action, params } via a Flask API
Routes to a local LLM (via Ollama) for answers
File:
app.py ← Flask API (MCP-style server)
│
├── app.py ← Flask API (MCP-style server)-> Flask web server
├── mcp_agent.py ← LangChain logic: interprets actions -> LangChain agent
├── requirements.txt ← Python dependencies
└── README.md ← Project overview
MCP-based LangChain server
The LangChain agent received the correct action
It executed it using your logic (from mcp_agent.py)
It responded with the correct time back to Postman
Next Steps: Test with other action values
Add new functions to mcp_agent.py
Or integrate this with a frontend
adding more intelligent tasks or connecting it with a UI!
##Llama models Explore https://ollama.com/library
##Run gemma:2b
ollama run gemma:2b
gemma:2b uses just ~2–3 GB RAM.
##Run mistral
ollama run mistral
mistral only needs ~4 GB of RAM and still performs quite well for general tasks.
##Run llama3
ollama run llama3
Ollama runs models like llama3 entirely on your machine, in memory.
Even though it's optimized, LLaMA 3 still needs at least ~6 GB of RAM free, and ideally more (8–12 GB total system RAM recommended).
##List downloaded models: ollama list
##Pull other models: ollama pull mistral
ollama pull codellama
Stop running model:
ollama stop
Start your MCP server:
python app.py
You need to install the latest LangChain with community modules, specifically:
pip install langchain-community
##Recommended Full Setup (for safety):
If you're using LangChain with Ollama, it's best to ensure these are installed:
pip install langchain langchain-community langchain-core langchainhub
pip install ollama
gemma:2b ,llama3 has cut off date
Static Knowledge: These models are trained on data available up to a certain point (e.g., mid-2023 for many models).
They don’t know anything that happened after their training cut-off date.
Want Real News in a LangChain/Ollama Setup?
You can combine the local model with:
RAG (Retrieval-Augmented Generation), where you fetch live news via API (like NewsAPI or Google News),
Then pass that info as context to the LLM to summarize or answer based on it.
LangChain + News API integration using local Ollama model.