A comprehensive testing framework for evaluating Model Context Protocol (MCP) tool-calling capabilities with Ollama language models.
This project provides tools and scripts to test how well different Ollama models can understand and use MCP tools. It includes a FastAPI-based MCP server, an Ollama client, and a set of testing scripts to evaluate model performance across different scenarios.
- MCP Server Implementation: FastAPI-based server providing file operations tools
- Ollama Integration: Client for interacting with Ollama models and MCP servers
- Multi-Model Testing: Compare tool-calling capabilities across different models
- Performance Profiling: Measure response times and success rates
- Docker Support: Containerized deployment options
- Integration Testing: End-to-end testing workflows
```
├── src/
│   ├── mcp_server.py              # Standalone MCP server (port 3002)
│   └── ollama.client.py           # Ollama client with MCP integration
├── scripts/                       # Testing and utility scripts
│   ├── simple_working_tester.sh   # Interactive single-model tester
│   ├── multi_model_mcp_tester.sh  # Multi-model comparison
│   ├── model_profiler_enhanced.sh # Performance profiling
│   └── cleanup_scripts.sh         # Script cleanup utility
├── tests/
│   └── test_mcp_integrations.py   # Integration tests
├── data/                          # Data directory for file operations
├── context_portal/                # ConPort database for project context
└── integration_test.py            # Integration testing
```
- Python 3.11+
- Ollama installed and running
- At least one Ollama model downloaded (e.g., `ollama pull llama3.1`)
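As a quick preflight check, you can probe the Ollama API before running any tests. This is a minimal sketch using only the standard library; `/api/tags` is Ollama's model-listing endpoint, and the helper name `ollama_available` is illustrative, not part of this project:

```python
import json
import urllib.error
import urllib.request


def ollama_available(base_url: str = "http://localhost:11434") -> bool:
    """Return True if an Ollama server answers at base_url."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=3) as resp:
            data = json.load(resp)
        # A healthy server returns a JSON object with a "models" list.
        return isinstance(data.get("models"), list)
    except (urllib.error.URLError, OSError, ValueError):
        return False
```

If this returns `False`, start Ollama with `ollama serve` before continuing.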
- Clone the repository:

  ```bash
  git clone <repository-url>
  cd ollama-mcp-test
  ```

- Install dependencies using uv (recommended):

  ```bash
  pip install uv
  uv sync
  ```

  Or using pip:

  ```bash
  pip install -e .
  ```

Start the standalone MCP server:

```bash
python src/mcp_server.py
```

The server will be available at http://localhost:3002 with the following endpoints:

- `POST /mcp` - MCP protocol endpoint
- `GET /health` - Health check

Run a simple integration test:

```bash
python integration_test.py
```

Or use the interactive tester:

```bash
./scripts/simple_working_tester.sh
```

The MCP server provides three core tools:
Read contents of a file from the data directory.

```json
{
  "name": "read_file",
  "arguments": {
    "path": "filename.txt"
  }
}
```

Write content to a file in the data directory.

```json
{
  "name": "write_file",
  "arguments": {
    "path": "filename.txt",
    "content": "Hello, World!"
  }
}
```

List files in a directory (defaults to the data directory).

```json
{
  "name": "list_files",
  "arguments": {
    "path": "."
  }
}
```

Interactive script to test a single model:
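Assuming the server accepts these payloads as JSON POSTs to `/mcp` (the exact request envelope depends on `src/mcp_server.py`), a call can be sketched as follows. `build_payload` and `call_tool` are illustrative helpers, not part of the project:

```python
import json
import urllib.request


def build_payload(name: str, arguments: dict) -> dict:
    """Shape a tool call exactly as in the JSON examples above."""
    return {"name": name, "arguments": arguments}


def call_tool(name: str, arguments: dict,
              url: str = "http://localhost:3002/mcp") -> dict:
    """POST a tool-call payload to the MCP endpoint and decode the reply."""
    data = json.dumps(build_payload(name, arguments)).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)

# Example usage (requires `python src/mcp_server.py` to be running):
#   call_tool("write_file", {"path": "hello.txt", "content": "Hello, World!"})
#   call_tool("read_file", {"path": "hello.txt"})
```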
```bash
./scripts/simple_working_tester.sh
```

Features:
- Model selection menu
- 4 test scenarios
- Real-time results
- Success rate calculation
Compare multiple models simultaneously:
```bash
./scripts/multi_model_mcp_tester.sh
```

Features:
- Tests 5+ models automatically
- Performance metrics (response time, accuracy)
- Comparative analysis
- Best performer identification
For comprehensive model profiling:
```bash
./scripts/model_profiler_enhanced.sh
```

End-to-end integration tests:

```bash
python integration_test.py
```

Run the stack with Docker Compose:

```bash
docker-compose up --build
```

Or build and run the container directly:

```bash
docker build -f Dockerfile.pip -t ollama-mcp-test .
docker run -p 3002:3002 ollama-mcp-test
```

Configuration is via environment variables:

- `OLLAMA_URL`: Ollama server URL (default: `http://localhost:11434`)
- `MCP_PORT`: MCP server port (default: `3002`)
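In Python, these variables can be read with stdlib fallbacks to the documented defaults — a small sketch, assuming the variable names above:

```python
import os

# Fall back to the documented defaults when the variables are unset.
OLLAMA_URL = os.environ.get("OLLAMA_URL", "http://localhost:11434")
MCP_PORT = int(os.environ.get("MCP_PORT", "3002"))
```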
Edit the model lists in testing scripts to include your preferred models:
```bash
MODELS=(
  "llama3.1:latest"
  "gemma3n:e4b"
  "codestral:latest"
  "qwen2.5-coder:32b"
)
```

The framework includes several test scenarios to evaluate different aspects of MCP tool calling:
- Basic File Creation: Simple tool selection and parameter extraction
- Parameter Extraction: Complex content parsing and file operations
- Tool Selection: Choosing the correct tool based on user intent
- Context Understanding: Directory operations and file listing
- Complex Parameters: Multi-parameter tool calls with structured data
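One way to encode such scenarios is as prompt / expected-tool / expected-arguments records, grading a model's emitted call against them. This is a sketch with made-up scenario data; the real test logic lives in the shell scripts under `scripts/`:

```python
# Each scenario pairs a natural-language prompt with the tool call it should produce.
SCENARIOS = [
    {
        "prompt": "Create a file named notes.txt containing 'draft'",
        "expected_tool": "write_file",
        "expected_args": {"path": "notes.txt", "content": "draft"},
    },
    {
        "prompt": "Show me what is in the data directory",
        "expected_tool": "list_files",
        "expected_args": {"path": "."},
    },
]


def grade(call: dict, scenario: dict) -> bool:
    """A call passes if the tool matches and all expected arguments are present."""
    if call.get("name") != scenario["expected_tool"]:
        return False
    args = call.get("arguments", {})
    return all(args.get(k) == v for k, v in scenario["expected_args"].items())
```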
- 80%+ Success Rate: Model is suitable for production MCP workflows
- 50-79% Success Rate: Model may work with more explicit prompting
- <50% Success Rate: Model needs significant prompt engineering
- Response Time: Average time to generate tool calls
- Tool Accuracy: Percentage of correct tool selections
- Parameter Accuracy: Correctness of extracted parameters
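These metrics reduce to simple aggregates over per-test records — a minimal sketch, where the field names (`correct_tool`, `correct_args`, `response_ms`) are illustrative rather than taken from the scripts:

```python
from statistics import mean


def summarize(results: list[dict]) -> dict:
    """Reduce per-test records to the three headline metrics."""
    n = len(results)
    return {
        "tool_accuracy": 100.0 * sum(r["correct_tool"] for r in results) / n,
        "param_accuracy": 100.0 * sum(r["correct_args"] for r in results) / n,
        "avg_response_ms": mean(r["response_ms"] for r in results),
    }
```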
```
ollama-mcp-test/
├── src/
│   ├── mcp_server.py       # Standalone MCP server (port 3002)
│   └── ollama.client.py    # Ollama client with MCP integration
├── scripts/                # Testing and utility scripts
├── tests/                  # Test suite
├── data/                   # File operation workspace
├── context_portal/         # ConPort project context
├── integration_test.py     # Integration testing
├── pyproject.toml          # Project configuration
└── docker-compose.yml      # Container orchestration
```
- Define the tool schema in the `get_tools()` function
- Implement the tool logic in the `execute_tool()` function
- Add test cases to the testing scripts
- Update the documentation
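The two functions might look roughly like this — a hedged sketch, since the actual signatures in `src/mcp_server.py` may differ, and `file_stat` is a hypothetical example tool:

```python
import os


def get_tools() -> list[dict]:
    """Advertise tools with JSON-Schema style parameter descriptions."""
    return [
        {
            "name": "file_stat",  # hypothetical new tool
            "description": "Return the size of a file in the data directory",
            "parameters": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"],
            },
        }
    ]


def execute_tool(name: str, arguments: dict) -> dict:
    """Dispatch a tool call to its implementation."""
    if name == "file_stat":
        path = os.path.join("data", arguments["path"])
        return {"size": os.path.getsize(path)}
    raise ValueError(f"Unknown tool: {name}")
```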
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
Ollama Connection Failed

```bash
# Check if Ollama is running
curl http://localhost:11434/api/tags

# Start Ollama if needed
ollama serve
```

MCP Server Not Responding

```bash
# Check server health
curl http://localhost:3002/health

# Restart the server
python src/mcp_server.py
```

Model Not Available

```bash
# List available models
ollama list

# Pull the required model
ollama pull llama3.1
```

Enable verbose logging by setting environment variables:

```bash
export LOG_LEVEL=DEBUG
python src/mcp_server.py
```

Based on testing with various models:
| Model | Success Rate | Avg Response Time | Notes |
|---|---|---|---|
| llama3.1:latest | 85% | 1200ms | Best overall performance |
| codestral:latest | 90% | 800ms | Excellent for code tasks |
| gemma3n:e4b | 75% | 1500ms | Good general purpose |
This project is licensed under the MIT License - see the LICENSE file for details.
- Ollama for the local LLM runtime
- Model Context Protocol specification
- FastAPI for the web framework
- The open-source AI community
For issues and questions:
- Check the troubleshooting section
- Review existing issues in the repository
- Create a new issue with detailed information
- Include logs and system information
This project currently has many overlapping script files. Here's a consolidation plan:
Keep these essential scripts:

- `scripts/simple_working_tester.sh` - Main interactive tester
- `scripts/multi_model_mcp_tester.sh` - Multi-model comparison
- `scripts/model_profiler_enhanced.sh` - Performance profiling
- `src/mcp_server.py` - Standalone MCP server
- `integration_test.py` - Python integration tests

Remove or merge these scripts:

- `advanced_mcp_runner.sh` → Merge functionality into `simple_working_tester.sh`
- `debug_profiler_flow.sh` → Merge into `model_profiler_enhanced.sh`
- `dev.sh` → Replace with `python server_standalone.py`
- `llm_mcp_capability_tester.sh` → Duplicate of `simple_working_tester.sh`
- `model_profiler_orig.sh` → Remove (superseded by the enhanced version)
- `start_mcp_server.sh` → Replace with `python server_standalone.py`
- `test_runner.sh` → Use `pytest` directly
- `clean_up.sh` → Add cleanup commands to the main scripts
Option 1: Automated Cleanup (Recommended)
```bash
# Run the safe cleanup script
./scripts/cleanup_scripts.sh
```

This script will:
- Create a timestamped backup of all scripts
- Verify essential scripts exist and have valid syntax
- Show you exactly what will be removed
- Ask for confirmation before making changes
- Provide rollback instructions
Option 2: Manual Cleanup
```bash
# Create a backup first
mkdir scripts_backup_$(date +%Y%m%d)
cp *.sh scripts_backup_$(date +%Y%m%d)/

# Remove duplicate/obsolete scripts
rm advanced_mcp_runner.sh debug_profiler_flow.sh dev.sh
rm llm_mcp_capability_tester.sh model_profiler_orig.sh
rm start_mcp_server.sh test_runner.sh clean_up.sh
```

Verification Steps
```bash
# Test the essential scripts after cleanup
./scripts/simple_working_tester.sh
./scripts/multi_model_mcp_tester.sh
python src/mcp_server.py
python integration_test.py
```

After cleanup, you'll have just three main testing scripts in `scripts/`:
- Quick Test: `./scripts/simple_working_tester.sh`
- Compare Models: `./scripts/multi_model_mcp_tester.sh`
- Profile Performance: `./scripts/model_profiler_enhanced.sh`
Plus the core Python files:
- `src/mcp_server.py` - Start the MCP server
- `integration_test.py` - Integration testing
Happy Testing! 🚀