A comprehensive testing framework for evaluating Model Context Protocol (MCP) tool-calling capabilities with Ollama language models.
This project provides tools and scripts to test how well different Ollama models can understand and use MCP tools. It includes a FastAPI-based MCP server, an Ollama client, and a set of testing scripts to evaluate model performance across different scenarios.
- MCP Server Implementation: FastAPI-based server providing file operations tools
- Ollama Integration: Client for interacting with Ollama models and MCP servers
- Multi-Model Testing: Compare tool-calling capabilities across different models
- Performance Profiling: Measure response times and success rates
- Docker Support: Containerized deployment options
- Integration Testing: End-to-end testing workflows
```
├── src/
│   ├── mcp_server.py              # Standalone MCP server (port 3002)
│   └── ollama.client.py           # Ollama client with MCP integration
├── scripts/                       # Testing and utility scripts
│   ├── simple_working_tester.sh   # Interactive single-model tester
│   ├── multi_model_mcp_tester.sh  # Multi-model comparison
│   ├── model_profiler_enhanced.sh # Performance profiling
│   └── cleanup_scripts.sh         # Script cleanup utility
├── tests/
│   └── test_mcp_integrations.py   # Integration tests
├── data/                          # Data directory for file operations
├── context_portal/                # ConPort database for project context
└── integration_test.py            # Integration testing
```
- Python 3.11+
- Ollama installed and running
- At least one Ollama model downloaded (e.g., `ollama pull llama3.1`)
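As a quick preflight check, you can probe the Ollama API before running any tests. This is a minimal sketch using only the standard library; `/api/tags` is Ollama's model-listing endpoint, and the helper name `ollama_available` is illustrative, not part of this project:

```python
import json
import urllib.error
import urllib.request


def ollama_available(base_url: str = "http://localhost:11434") -> bool:
    """Return True if an Ollama server answers at base_url."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=3) as resp:
            data = json.load(resp)
        # A healthy server returns a JSON object with a "models" list.
        return isinstance(data.get("models"), list)
    except (urllib.error.URLError, OSError, ValueError):
        return False
```

If this returns `False`, start Ollama with `ollama serve` before continuing.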
- Clone the repository:

  ```bash
  git clone <repository-url>
  cd ollama-mcp-test
  ```

- Install dependencies using uv (recommended):

  ```bash
  pip install uv
  uv sync
  ```

  Or using pip:

  ```bash
  pip install -e .
  ```

Start the standalone MCP server:

```bash
python src/mcp_server.py
```

The server will be available at http://localhost:3002 with the following endpoints:

- `POST /mcp` - MCP protocol endpoint
- `GET /health` - Health check

Run a simple integration test:

```bash
python integration_test.py
```

Or use the interactive tester:

```bash
./scripts/simple_working_tester.sh
```

The MCP server provides three core tools:
Read contents of a file from the data directory.

```json
{
  "name": "read_file",
  "arguments": {
    "path": "filename.txt"
  }
}
```

Write content to a file in the data directory.

```json
{
  "name": "write_file",
  "arguments": {
    "path": "filename.txt",
    "content": "Hello, World!"
  }
}
```

List files in a directory (defaults to the data directory).

```json
{
  "name": "list_files",
  "arguments": {
    "path": "."
  }
}
```

Interactive script to test a single model:
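Assuming the server accepts these payloads as JSON POSTs to `/mcp` (the exact request envelope depends on `src/mcp_server.py`), a call can be sketched as follows. `build_payload` and `call_tool` are illustrative helpers, not part of the project:

```python
import json
import urllib.request


def build_payload(name: str, arguments: dict) -> dict:
    """Shape a tool call exactly as in the JSON examples above."""
    return {"name": name, "arguments": arguments}


def call_tool(name: str, arguments: dict,
              url: str = "http://localhost:3002/mcp") -> dict:
    """POST a tool-call payload to the MCP endpoint and decode the reply."""
    data = json.dumps(build_payload(name, arguments)).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)

# Example usage (requires `python src/mcp_server.py` to be running):
#   call_tool("write_file", {"path": "hello.txt", "content": "Hello, World!"})
#   call_tool("read_file", {"path": "hello.txt"})
```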
```bash
./scripts/simple_working_tester.sh
```

Features:
- Model selection menu
- 4 test scenarios
- Real-time results
- Success rate calculation
Compare multiple models simultaneously:
```bash
./scripts/multi_model_mcp_tester.sh
```

Features:
- Tests 5+ models automatically
- Performance metrics (response time, accuracy)
- Comparative analysis
- Best performer identification
For comprehensive model profiling:
```bash
./scripts/model_profiler_enhanced.sh
```

End-to-end integration tests:

```bash
python integration_test.py
```

Run the stack with Docker Compose:

```bash
docker-compose up --build
```

Or build and run the container directly:

```bash
docker build -f Dockerfile.pip -t ollama-mcp-test .
docker run -p 3002:3002 ollama-mcp-test
```

Configuration is via environment variables:

- `OLLAMA_URL`: Ollama server URL (default: `http://localhost:11434`)
- `MCP_PORT`: MCP server port (default: `3002`)
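In Python, these variables can be read with stdlib fallbacks to the documented defaults — a small sketch, assuming the variable names above:

```python
import os

# Fall back to the documented defaults when the variables are unset.
OLLAMA_URL = os.environ.get("OLLAMA_URL", "http://localhost:11434")
MCP_PORT = int(os.environ.get("MCP_PORT", "3002"))
```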
Edit the model lists in testing scripts to include your preferred models:
```bash
MODELS=(
  "llama3.1:latest"
  "gemma3n:e4b"
  "codestral:latest"
  "qwen2.5-coder:32b"
)
```

The framework includes several test scenarios to evaluate different aspects of MCP tool calling:
- Basic File Creation: Simple tool selection and parameter extraction
- Parameter Extraction: Complex content parsing and file operations
- Tool Selection: Choosing the correct tool based on user intent
- Context Understanding: Directory operations and file listing
- Complex Parameters: Multi-parameter tool calls with structured data
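One way to encode such scenarios is as prompt / expected-tool / expected-arguments records, grading a model's emitted call against them. This is a sketch with made-up scenario data; the real test logic lives in the shell scripts under `scripts/`:

```python
# Each scenario pairs a natural-language prompt with the tool call it should produce.
SCENARIOS = [
    {
        "prompt": "Create a file named notes.txt containing 'draft'",
        "expected_tool": "write_file",
        "expected_args": {"path": "notes.txt", "content": "draft"},
    },
    {
        "prompt": "Show me what is in the data directory",
        "expected_tool": "list_files",
        "expected_args": {"path": "."},
    },
]


def grade(call: dict, scenario: dict) -> bool:
    """A call passes if the tool matches and all expected arguments are present."""
    if call.get("name") != scenario["expected_tool"]:
        return False
    args = call.get("arguments", {})
    return all(args.get(k) == v for k, v in scenario["expected_args"].items())
```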
- 80%+ Success Rate: Model is suitable for production MCP workflows
- 50-79% Success Rate: Model may work with more explicit prompting
- <50% Success Rate: Model needs significant prompt engineering
- Response Time: Average time to generate tool calls
- Tool Accuracy: Percentage of correct tool selections
- Parameter Accuracy: Correctness of extracted parameters
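These metrics reduce to simple aggregates over per-test records — a minimal sketch, where the field names (`correct_tool`, `correct_args`, `response_ms`) are illustrative rather than taken from the scripts:

```python
from statistics import mean


def summarize(results: list[dict]) -> dict:
    """Reduce per-test records to the three headline metrics."""
    n = len(results)
    return {
        "tool_accuracy": 100.0 * sum(r["correct_tool"] for r in results) / n,
        "param_accuracy": 100.0 * sum(r["correct_args"] for r in results) / n,
        "avg_response_ms": mean(r["response_ms"] for r in results),
    }
```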
```
ollama-mcp-test/
├── src/
│   ├── mcp_server.py       # Standalone MCP server (port 3002)
│   └── ollama.client.py    # Ollama client with MCP integration
├── scripts/                # Testing and utility scripts
├── tests/                  # Test suite
├── data/                   # File operation workspace
├── context_portal/         # ConPort project context
├── integration_test.py     # Integration testing
├── pyproject.toml          # Project configuration
└── docker-compose.yml      # Container orchestration
```
- Define the tool schema in the `get_tools()` function
- Implement the tool logic in the `execute_tool()` function
- Add test cases to the testing scripts
- Update the documentation
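The two functions might look roughly like this — a hedged sketch, since the actual signatures in `src/mcp_server.py` may differ, and `file_stat` is a hypothetical example tool:

```python
import os


def get_tools() -> list[dict]:
    """Advertise tools with JSON-Schema style parameter descriptions."""
    return [
        {
            "name": "file_stat",  # hypothetical new tool
            "description": "Return the size of a file in the data directory",
            "parameters": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"],
            },
        }
    ]


def execute_tool(name: str, arguments: dict) -> dict:
    """Dispatch a tool call to its implementation."""
    if name == "file_stat":
        path = os.path.join("data", arguments["path"])
        return {"size": os.path.getsize(path)}
    raise ValueError(f"Unknown tool: {name}")
```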
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
Ollama Connection Failed

```bash
# Check if Ollama is running
curl http://localhost:11434/api/tags

# Start Ollama if needed
ollama serve
```

MCP Server Not Responding

```bash
# Check server health
curl http://localhost:3002/health

# Restart the server
python src/mcp_server.py
```

Model Not Available

```bash
# List available models
ollama list

# Pull the required model
ollama pull llama3.1
```

Enable verbose logging by setting environment variables:

```bash
export LOG_LEVEL=DEBUG
python src/mcp_server.py
```

Based on testing with various models:
| Model | Success Rate | Avg Response Time | Notes |
|---|---|---|---|
| llama3.1:latest | 85% | 1200ms | Best overall performance |
| codestral:latest | 90% | 800ms | Excellent for code tasks |
| gemma3n:e4b | 75% | 1500ms | Good general purpose |
This project is licensed under the MIT License - see the LICENSE file for details.
- Ollama for the local LLM runtime
- Model Context Protocol specification
- FastAPI for the web framework
- The open-source AI community
For issues and questions:
- Check the troubleshooting section
- Review existing issues in the repository
- Create a new issue with detailed information
- Include logs and system information
This project currently has many overlapping script files. Here's a consolidation plan:
Keep these essential scripts:

- `scripts/simple_working_tester.sh` - Main interactive tester
- `scripts/multi_model_mcp_tester.sh` - Multi-model comparison
- `scripts/model_profiler_enhanced.sh` - Performance profiling
- `src/mcp_server.py` - Standalone MCP server
- `integration_test.py` - Python integration tests

Remove or merge these scripts:

- `advanced_mcp_runner.sh` → Merge functionality into `simple_working_tester.sh`
- `debug_profiler_flow.sh` → Merge into `model_profiler_enhanced.sh`
- `dev.sh` → Replace with `python server_standalone.py`
- `llm_mcp_capability_tester.sh` → Duplicate of `simple_working_tester.sh`
- `model_profiler_orig.sh` → Remove (superseded by the enhanced version)
- `start_mcp_server.sh` → Replace with `python server_standalone.py`
- `test_runner.sh` → Use `pytest` directly
- `clean_up.sh` → Add cleanup commands to the main scripts
Option 1: Automated Cleanup (Recommended)
```bash
# Run the safe cleanup script
./scripts/cleanup_scripts.sh
```

This script will:
- Create a timestamped backup of all scripts
- Verify essential scripts exist and have valid syntax
- Show you exactly what will be removed
- Ask for confirmation before making changes
- Provide rollback instructions
Option 2: Manual Cleanup
```bash
# Create a backup first
mkdir scripts_backup_$(date +%Y%m%d)
cp *.sh scripts_backup_$(date +%Y%m%d)/

# Remove duplicate/obsolete scripts
rm advanced_mcp_runner.sh debug_profiler_flow.sh dev.sh
rm llm_mcp_capability_tester.sh model_profiler_orig.sh
rm start_mcp_server.sh test_runner.sh clean_up.sh
```

Verification Steps
```bash
# Test the essential scripts after cleanup
./scripts/simple_working_tester.sh
./scripts/multi_model_mcp_tester.sh
python src/mcp_server.py
python integration_test.py
```

After cleanup, you'll have just three main testing scripts in `scripts/`:
- Quick Test: `./scripts/simple_working_tester.sh`
- Compare Models: `./scripts/multi_model_mcp_tester.sh`
- Profile Performance: `./scripts/model_profiler_enhanced.sh`
Plus the core Python files:
- `src/mcp_server.py` - Start the MCP server
- `integration_test.py` - Integration testing
Happy Testing! 🚀