
# LOGIC Testing

The repo supports two modes of testing:

- Local testing using vLLM
- Provider (OpenRouter) testing

## Local Testing with vLLM

```bash
# Start a server with a small model
uv run python -m vllm.entrypoints.openai.api_server \
  --model Qwen/Qwen2-1.5B-Instruct \
  --port 8000
```
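The server can take a while to load model weights, so it helps to wait until the endpoint answers before sampling. A minimal sketch of a readiness poll (the `/v1/models` route is the standard OpenAI-compatible listing endpoint that vLLM exposes; the `wait_for_server` helper itself is not part of this repo):

```python
import time
import urllib.error
import urllib.request


def wait_for_server(base_url: str, timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Poll the OpenAI-compatible /models route until the server responds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A 200 from /models means the server is up and the model is loaded.
            with urllib.request.urlopen(f"{base_url}/models", timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Server not accepting connections yet; retry after a short pause.
            time.sleep(interval)
    return False
```

Call it with the same base URL you pass to the sampling commands, e.g. `wait_for_server("http://localhost:8000/v1")`.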

### Successful Verification

```bash
# Generate verification data
uv run logprob-sample \
  --endpoint http://localhost:8000/v1 \
  --model Qwen/Qwen2-1.5B-Instruct \
  --prompt "Explain quantum computing in simple terms"

# Verify the response (should pass with the same model)
uv run logprob-verify \
  -f .debug/verification_data_*.json \
  --verifier-endpoint http://localhost:8000/v1 \
  --verifier-model Qwen/Qwen2-1.5B-Instruct
```
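The `-f` flag above relies on shell globbing, which passes every matching file once several runs have accumulated in `.debug/`. A small sketch for selecting only the newest file (the `verification_data_*.json` naming comes from the commands above; the helper function is illustrative, not part of the repo):

```python
import glob
import os


def latest_verification_file(pattern: str = ".debug/verification_data_*.json") -> str:
    """Return the most recently modified file matching the glob pattern."""
    matches = glob.glob(pattern)
    if not matches:
        raise FileNotFoundError(f"no files match {pattern}")
    # Modification time is a reasonable proxy for "most recent run".
    return max(matches, key=os.path.getmtime)
```

The returned path can then be passed directly to `logprob-verify -f` instead of the wildcard.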

### Failed Verification

```bash
# Generate data with one model
uv run logprob-sample \
  --endpoint http://localhost:8000/v1 \
  --model Qwen/Qwen2-1.5B-Instruct \
  --prompt "Explain quantum computing"

# Stop the first server, then restart on the same port with a different model
uv run python -m vllm.entrypoints.openai.api_server \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --port 8000

# Verify with the different model (should fail)
uv run logprob-verify \
  -f .debug/verification_data_*.json \
  --verifier-endpoint http://localhost:8000/v1 \
  --verifier-model meta-llama/Llama-3.1-8B-Instruct
```

## Provider (OpenRouter) Testing

Test verification with OpenRouter as the sample source and local vLLM as the verifier:

```bash
# Should pass - same model
uv run python tests/providers/test_openrouter.py \
  --openrouter-model "meta-llama/llama-3.1-8b-instruct" \
  --local-model "meta-llama/Llama-3.1-8B-Instruct" \
  --test-sample-size 15 \
  --provider "Fireworks"

# Should fail - different models
uv run python tests/providers/test_openrouter.py \
  --openrouter-model "meta-llama/llama-3.1-8b-instruct" \
  --local-model "mistralai/Mistral-7B-Instruct-v0.3" \
  --test-sample-size 15 \
  --provider "InferenceNet"
```

This test:

  1. Generates verification data using OpenRouter (without token IDs)
  2. Starts a local vLLM server with the specified model
  3. Verifies using text-only matching mode
  4. Reports pass/fail based on same_model_probability threshold
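The decision in step 4 boils down to a threshold comparison on the verifier's probability estimate. A hypothetical sketch of that logic (the function name, argument layout, and the 0.5 default threshold are assumptions for illustration, not the repo's actual API):

```python
def classify_result(same_model_probability: float, threshold: float = 0.5) -> str:
    """Map the verifier's probability estimate to a pass/fail verdict.

    A high probability means the sampled responses are consistent with the
    verifier's model; a low probability suggests a different model produced them.
    """
    return "PASS" if same_model_probability >= threshold else "FAIL"
```

With a threshold like this, results near the cutoff are inherently uncertain, which is why the troubleshooting advice below suggests increasing the sample size.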

## Troubleshooting

- Port in use: `pkill -f vllm.entrypoints.openai.api_server`
- Server timeout: try a smaller model, check GPU memory, and ensure you have access to download the model from Hugging Face
- Uncertain results: use more samples for evaluation (increase `--n-samples` or `--test-sample-size`)
- Text matching issues: enable strict mode with `--strict-text-matching` for higher confidence