The repo supports two modes of testing:
- Local testing using vLLM
- Provider (OpenRouter) testing
```bash
# Start server with a small model
uv run python -m vllm.entrypoints.openai.api_server \
  --model Qwen/Qwen2-1.5B-Instruct \
  --port 8000
```

### Successful Verification
```bash
# Generate verification data
uv run logprob-sample \
  --endpoint http://localhost:8000/v1 \
  --model Qwen/Qwen2-1.5B-Instruct \
  --prompt "Explain quantum computing in simple terms"

# Verify the response (should pass with same model)
uv run logprob-verify \
  -f .debug/verification_data_*.json \
  --verifier-endpoint http://localhost:8000/v1 \
  --verifier-model Qwen/Qwen2-1.5B-Instruct
```

### Failed Verification
```bash
# Generate data with one model
uv run logprob-sample \
  --endpoint http://localhost:8000/v1 \
  --model Qwen/Qwen2-1.5B-Instruct \
  --prompt "Explain quantum computing"

# Switch to a different model
uv run python -m vllm.entrypoints.openai.api_server \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --port 8000

# Verify with the different model (should fail)
uv run logprob-verify \
  -f .debug/verification_data_*.json \
  --verifier-endpoint http://localhost:8000/v1 \
  --verifier-model meta-llama/Llama-3.1-8B-Instruct
```

### Provider (OpenRouter) Testing

Test verification with OpenRouter as the sample source and local vLLM as the verifier:
```bash
# Should pass - same model
uv run python tests/providers/test_openrouter.py \
  --openrouter-model "meta-llama/llama-3.1-8b-instruct" \
  --local-model "meta-llama/Llama-3.1-8B-Instruct" \
  --test-sample-size 15 \
  --provider "Fireworks"

# Should fail - different models
uv run python tests/providers/test_openrouter.py \
  --openrouter-model "meta-llama/llama-3.1-8b-instruct" \
  --local-model "mistralai/Mistral-7B-Instruct-v0.3" \
  --test-sample-size 15 \
  --provider "InferenceNet"
```

This test:
- Generates verification data using OpenRouter (without token IDs)
- Starts a local vLLM server with the specified model
- Verifies using text-only matching mode
- Reports pass/fail based on same_model_probability threshold
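The pass/fail decision described above can be sketched as a threshold on the verifier model's per-token log-probabilities. This is a minimal illustration of the idea, not the repo's actual implementation; the function and parameter names (`verify_logprobs`, `threshold`) are hypothetical:

```python
import math

def verify_logprobs(verifier_logprobs, threshold=0.9):
    """Decide pass/fail from the verifier model's per-token log-probabilities.

    verifier_logprobs: log-probabilities the verifier model assigns to each
    token of the sampled completion.
    threshold: minimum geometric-mean token probability required to accept
    the completion as coming from the same model.
    """
    mean_logprob = sum(verifier_logprobs) / len(verifier_logprobs)
    # exp of the mean log-probability = geometric mean of token probabilities
    same_model_probability = math.exp(mean_logprob)
    return same_model_probability >= threshold, same_model_probability

# Tokens the verifier finds likely -> pass; tokens it finds unlikely -> fail.
passed, score = verify_logprobs([-0.01, -0.02, -0.05], threshold=0.9)
failed, low_score = verify_logprobs([-2.0, -3.0], threshold=0.9)
```

When sampling and verifying use the same weights, each sampled token tends to receive a high probability from the verifier, so the score stays near 1; a different model assigns much lower probabilities to many tokens, pulling the geometric mean below the threshold.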
Troubleshooting:
- Port in use: kill the stale server with `pkill -f vllm.entrypoints.openai.api_server`
- Server timeout: try a smaller model or check GPU memory, and make sure you have access to download the model from Hugging Face
- Uncertain results: use more samples for evaluation (increase `--n-samples` or `--test-sample-size`)
- Text matching issues: enable strict mode with `--strict-text-matching` for higher confidence
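For the port and timeout issues above, it helps to confirm the local server is actually serving before re-running verification. A hedged sketch, assuming the default endpoint used throughout this guide; `server_ready` is an illustrative helper, not part of the repo:

```python
import json
import urllib.request

def server_ready(base_url="http://localhost:8000/v1"):
    """Return True if the vLLM OpenAI-compatible server responds to /models.

    vLLM exposes the standard OpenAI `GET /v1/models` route; a JSON response
    with a non-empty "data" list means the server is up and the model loaded.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=5) as resp:
            return bool(json.load(resp).get("data"))
    except (OSError, ValueError):
        # Connection refused / timeout / malformed response: not ready yet.
        return False
```

A `False` here while the server process is running usually means the model is still downloading or loading into GPU memory.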