
Ollama MCP Test Framework

A comprehensive testing framework for evaluating Model Context Protocol (MCP) tool-calling capabilities with Ollama language models.

Overview

This project provides tools and scripts to test how well different Ollama models can understand and use MCP (Model Context Protocol) tools. It includes a FastAPI-based MCP server, an Ollama client, and various testing scripts to evaluate model performance across different scenarios.

Features

  • MCP Server Implementation: FastAPI-based server providing file operations tools
  • Ollama Integration: Client for interacting with Ollama models and MCP servers
  • Multi-Model Testing: Compare tool-calling capabilities across different models
  • Performance Profiling: Measure response times and success rates
  • Docker Support: Containerized deployment options
  • Integration Testing: End-to-end testing workflows

Architecture

ollama-mcp-test/
├── src/
│   ├── mcp_server.py      # Standalone MCP server (port 3002)
│   └── ollama.client.py   # Ollama client with MCP integration
├── scripts/               # Testing and utility scripts
│   ├── simple_working_tester.sh     # Interactive single-model tester
│   ├── multi_model_mcp_tester.sh    # Multi-model comparison
│   ├── model_profiler_enhanced.sh   # Performance profiling
│   └── cleanup_scripts.sh           # Script cleanup utility
├── tests/
│   └── test_mcp_integrations.py  # Integration tests
├── data/                  # Data directory for file operations
├── context_portal/        # ConPort database for project context
└── integration_test.py    # Integration testing

Quick Start

Prerequisites

  • Python 3.11+
  • Ollama installed and running
  • At least one Ollama model downloaded (e.g., ollama pull llama3.1)

Installation

  1. Clone the repository:
git clone <repository-url>
cd ollama-mcp-test
  2. Install dependencies using uv (recommended):
pip install uv
uv sync

Or using pip:

pip install -e .

Running the MCP Server

Start the standalone MCP server:

python src/mcp_server.py

The server will be available at http://localhost:3002 with the following endpoints:

  • POST /mcp - MCP protocol endpoint
  • GET /health - Health check
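Before running any tests, it's worth confirming the server is actually up. A minimal sketch using only the standard library, assuming the default port shown above:

```python
import urllib.request


def check_health(base_url: str = "http://localhost:3002") -> bool:
    """Return True if the MCP server's /health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
            return resp.status == 200
    except OSError:
        # Connection refused, timeout, unresolvable host, etc.
        return False
```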

Basic Usage

Run a simple integration test:

python integration_test.py

Or use the interactive tester:

./scripts/simple_working_tester.sh

Available Tools

The MCP server provides three core tools:

1. read_file

Read contents of a file from the data directory.

{
  "name": "read_file",
  "arguments": {
    "path": "filename.txt"
  }
}

2. write_file

Write content to a file in the data directory.

{
  "name": "write_file", 
  "arguments": {
    "path": "filename.txt",
    "content": "Hello, World!"
  }
}

3. list_files

List files in a directory (defaults to the data directory).

{
  "name": "list_files",
  "arguments": {
    "path": "."
  }
}
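The tool-call objects above can be posted straight to the /mcp endpoint. A hedged sketch, assuming the server accepts this JSON shape as-is (the actual envelope in src/mcp_server.py may wrap it differently):

```python
import json
import urllib.request


def build_tool_call(name: str, arguments: dict) -> dict:
    """Build a tool-call object in the shape shown above."""
    return {"name": name, "arguments": arguments}


def call_tool(name: str, arguments: dict,
              url: str = "http://localhost:3002/mcp") -> dict:
    """POST a tool call to the MCP endpoint and return the parsed JSON reply."""
    payload = json.dumps(build_tool_call(name, arguments)).encode()
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read())
```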

Testing Scripts

Simple Working Tester

Interactive script to test a single model:

./scripts/simple_working_tester.sh

Features:

  • Model selection menu
  • 4 test scenarios
  • Real-time results
  • Success rate calculation

Multi-Model Comparison

Compare multiple models simultaneously:

./scripts/multi_model_mcp_tester.sh

Features:

  • Tests 5+ models automatically
  • Performance metrics (response time, accuracy)
  • Comparative analysis
  • Best performer identification

Advanced Testing

For comprehensive model profiling:

./scripts/model_profiler_enhanced.sh

Integration Testing

End-to-end integration tests:

python integration_test.py

Docker Deployment

Using Docker Compose

docker-compose up --build

Manual Docker Build

docker build -f Dockerfile.pip -t ollama-mcp-test .
docker run -p 3002:3002 ollama-mcp-test

Configuration

Environment Variables

  • OLLAMA_URL: Ollama server URL (default: http://localhost:11434)
  • MCP_PORT: MCP server port (default: 3002)
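Client code can read both settings the same way; a small sketch using the documented defaults:

```python
import os

# Fall back to the documented defaults when the variables are unset.
OLLAMA_URL = os.environ.get("OLLAMA_URL", "http://localhost:11434")
MCP_PORT = int(os.environ.get("MCP_PORT", "3002"))
```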

Model Configuration

Edit the model lists in testing scripts to include your preferred models:

MODELS=(
    "llama3.1:latest"
    "gemma3n:e4b" 
    "codestral:latest"
    "qwen2.5-coder:32b"
)

Test Scenarios

The framework includes several test scenarios to evaluate different aspects of MCP tool calling:

  1. Basic File Creation: Simple tool selection and parameter extraction
  2. Parameter Extraction: Complex content parsing and file operations
  3. Tool Selection: Choosing the correct tool based on user intent
  4. Context Understanding: Directory operations and file listing
  5. Complex Parameters: Multi-parameter tool calls with structured data
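One way to drive these scenarios from Python is as prompt/expected-tool pairs. The prompts below are illustrative, not the exact ones the testing scripts use:

```python
# Illustrative prompts paired with the tool a well-behaved model should pick;
# the actual prompts live in the testing scripts.
SCENARIOS = [
    {"prompt": "Create hello.txt containing 'Hello, World!'", "expected_tool": "write_file"},
    {"prompt": "Save the list: milk, eggs, bread to shopping.txt", "expected_tool": "write_file"},
    {"prompt": "Show me what's inside notes.txt", "expected_tool": "read_file"},
    {"prompt": "What files are in the data directory?", "expected_tool": "list_files"},
    {"prompt": "Write {\"debug\": true} to config.json", "expected_tool": "write_file"},
]


def score(tool_choices):
    """Fraction of scenarios where the model chose the expected tool."""
    hits = sum(1 for s, c in zip(SCENARIOS, tool_choices) if c == s["expected_tool"])
    return hits / len(SCENARIOS)
```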

Results Interpretation

Success Metrics

  • 80%+ Success Rate: Model is suitable for production MCP workflows
  • 50-79% Success Rate: Model may work with more explicit prompting
  • <50% Success Rate: Model needs significant prompt engineering

Performance Metrics

  • Response Time: Average time to generate tool calls
  • Tool Accuracy: Percentage of correct tool selections
  • Parameter Accuracy: Correctness of extracted parameters
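The first two metrics reduce to a simple aggregation over per-call results. A sketch, assuming each result is a (succeeded, milliseconds) pair:

```python
def summarize(results):
    """Aggregate (succeeded: bool, response_ms: float) pairs into
    a success rate (percent) and an average response time (ms)."""
    successes = sum(1 for ok, _ in results if ok)
    rate = 100.0 * successes / len(results)
    avg_ms = sum(ms for _, ms in results) / len(results)
    return rate, avg_ms
```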

Development

Project Structure

ollama-mcp-test/
├── src/
│   ├── mcp_server.py      # Standalone MCP server (port 3002)
│   └── ollama.client.py   # Ollama client with MCP integration
├── scripts/                # Testing and utility scripts
├── tests/                  # Test suite
├── data/                   # File operation workspace
├── context_portal/         # ConPort project context
├── integration_test.py    # Integration testing
├── pyproject.toml         # Project configuration
└── docker-compose.yml     # Container orchestration

Adding New Tools

  1. Define tool schema in get_tools() function
  2. Implement tool logic in execute_tool() function
  3. Add test cases to testing scripts
  4. Update documentation
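Steps 1 and 2 can be sketched as a schema table plus a dispatch function. The real signatures in src/mcp_server.py may differ, and delete_file is a hypothetical tool used only for illustration:

```python
import os

# Hypothetical new tool, registered alongside the existing three.
TOOLS = {
    "delete_file": {
        "name": "delete_file",
        "description": "Delete a file from the data directory",
        "parameters": {"path": {"type": "string", "required": True}},
    },
}


def get_tools():
    """Step 1: advertise the tool schema to clients."""
    return list(TOOLS.values())


def execute_tool(name, arguments, data_dir="data"):
    """Step 2: implement the logic behind the schema."""
    if name == "delete_file":
        os.remove(os.path.join(data_dir, arguments["path"]))
        return {"status": "ok", "deleted": arguments["path"]}
    raise ValueError(f"unknown tool: {name}")
```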

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Submit a pull request

Troubleshooting

Common Issues

Ollama Connection Failed

# Check if Ollama is running
curl http://localhost:11434/api/tags

# Start Ollama if needed
ollama serve

MCP Server Not Responding

# Check server health
curl http://localhost:3002/health

# Restart server
python src/mcp_server.py

Model Not Available

# List available models
ollama list

# Pull required model
ollama pull llama3.1

Debug Mode

Enable verbose logging by setting environment variables:

export LOG_LEVEL=DEBUG
python src/mcp_server.py

Performance Benchmarks

Based on testing with various models:

Model              Success Rate   Avg Response Time   Notes
llama3.1:latest    85%            1200 ms             Best overall performance
codestral:latest   90%            800 ms              Excellent for code tasks
gemma3n:e4b        75%            1500 ms             Good general purpose

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

For issues and questions:

  1. Check the troubleshooting section
  2. Review existing issues in the repository
  3. Create a new issue with detailed information
  4. Include logs and system information

Script Cleanup Recommendations

This project currently has many overlapping script files. Here's a consolidation plan:

Essential Scripts (Keep)

  • scripts/simple_working_tester.sh - Main interactive tester
  • scripts/multi_model_mcp_tester.sh - Multi-model comparison
  • scripts/model_profiler_enhanced.sh - Performance profiling
  • src/mcp_server.py - Standalone MCP server
  • integration_test.py - Python integration tests

Scripts to Remove/Consolidate

  • advanced_mcp_runner.sh → Merge functionality into simple_working_tester.sh
  • debug_profiler_flow.sh → Merge into model_profiler_enhanced.sh
  • dev.sh → Replace with python src/mcp_server.py
  • llm_mcp_capability_tester.sh → Duplicate of simple_working_tester.sh
  • model_profiler_orig.sh → Remove (superseded by enhanced version)
  • start_mcp_server.sh → Replace with python src/mcp_server.py
  • test_runner.sh → Use pytest directly
  • clean_up.sh → Add cleanup commands to main scripts

Implementation Instructions

Option 1: Automated Cleanup (Recommended)

# Run the safe cleanup script
./scripts/cleanup_scripts.sh

This script will:

  • Create a timestamped backup of all scripts
  • Verify essential scripts exist and have valid syntax
  • Show you exactly what will be removed
  • Ask for confirmation before making changes
  • Provide rollback instructions

Option 2: Manual Cleanup

# Create backup first
mkdir scripts_backup_$(date +%Y%m%d)
cp scripts/*.sh scripts_backup_$(date +%Y%m%d)/

# Remove duplicate/obsolete scripts
rm scripts/advanced_mcp_runner.sh scripts/debug_profiler_flow.sh scripts/dev.sh
rm scripts/llm_mcp_capability_tester.sh scripts/model_profiler_orig.sh
rm scripts/start_mcp_server.sh scripts/test_runner.sh scripts/clean_up.sh

Verification Steps

# Test essential scripts after cleanup
./scripts/simple_working_tester.sh
./scripts/multi_model_mcp_tester.sh
python src/mcp_server.py
python integration_test.py

Simplified Workflow

After cleanup, you'll have just 3 main testing scripts in scripts/:

  1. Quick Test: ./scripts/simple_working_tester.sh
  2. Compare Models: ./scripts/multi_model_mcp_tester.sh
  3. Profile Performance: ./scripts/model_profiler_enhanced.sh

Plus the core Python files:

  • src/mcp_server.py - Start MCP server
  • integration_test.py - Integration testing

Happy Testing! 🚀
