LOGIC uses log probability distributions as a "fingerprint" of a model's response. By comparing the log probabilities of a response from a claimed model against those produced by a known model, we can determine whether the two responses come from the same model.
- Sampling: The tool randomly samples N positions from the response
- Re-querying: For each position, it reconstructs the context and queries the verification model
- Comparison: Compares the original log probabilities with fresh ones using either token IDs or text matching
- Statistical Test: Runs a Kolmogorov-Smirnov test to determine if distributions match
- Verdict: Produces a probability score indicating if models are the same
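The sampling and re-querying steps above can be sketched in Python. Here `query_verifier` is a hypothetical callable standing in for the tool's actual verification-model client; it returns the verifier's log probabilities for the next token given a context:

```python
import random

def collect_logprob_pairs(response_tokens, original_logprobs, query_verifier, n_samples=20):
    """Sketch of phases 1-2: sample N positions, re-query the verifier at each.

    `response_tokens` / `original_logprobs` are the claimed model's output;
    `query_verifier(context)` is assumed to return a dict mapping candidate
    tokens (by ID or text, depending on the matching strategy) to log probs.
    """
    positions = random.sample(range(1, len(response_tokens)), n_samples)
    pairs = []
    for pos in positions:
        # Reconstruct the context: everything the model generated before this position.
        context = "".join(response_tokens[:pos])
        fresh_logprobs = query_verifier(context)
        token = response_tokens[pos]
        if token in fresh_logprobs:
            pairs.append((original_logprobs[pos], fresh_logprobs[token]))
    return pairs  # fed into the statistical test in phase 3
```

The paired values are then handed to the Kolmogorov-Smirnov test described below.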
The verifier supports two matching strategies:
The standard approach uses token IDs to align tokens between the original and verification responses:
- Most accurate for providers that return token IDs (e.g., vLLM with `return_tokens_as_token_ids`)
- Ensures exact token-level correspondence
- Recommended when both sample and verification sources support token IDs
With vLLM, we can request token IDs with the return_tokens_as_token_ids parameter. The OpenAI API, however, does not support this parameter.
For providers that don't return token IDs (e.g., OpenRouter, OpenAI), the system can fall back to text-based matching. In this approach, the context is reconstructed up to the position of a given token, the verification model is queried with that context, and the original log probabilities are compared with the fresh ones using direct text matching.
We can sample without token IDs using `--skip-token-ids`, then enable text matching at verification time with the `--text-only-matching` flag:

```bash
uv run logprob-sample \
  --endpoint https://openrouter.ai/api/v1 \
  --model meta-llama/llama-3.1-8b-instruct \
  --api-key $OPENROUTER_API_KEY \
  --skip-token-ids  # Don't request token IDs

uv run logprob-verify \
  -f verification_data.json \
  --verifier-endpoint http://localhost:8000/v1 \
  --verifier-model meta-llama/Llama-3.1-8B-Instruct \
  --text-only-matching  # Enable text matching mode
```

Text Matching Features:
- **Unicode Normalization** (src/core.py:102-104): Applies NFC normalization to handle composed vs. decomposed characters (e.g., "é" vs. "e" + accent)
- **Tokenizer Marker Handling** (src/core.py:109-115): Normalizes common tokenizer markers:
  - SentencePiece: `▁` → space
  - GPT-2/GPT-3: `Ġ` → space, `Ċ` → newline
  - BERT: `##` → (removed)
  - BPE: `@@`, `</w>` → (removed)
- **Soft Matching** (src/core.py:132-190): Flexible token alignment supporting:
  - Exact matches
  - Whitespace-only equivalence (any whitespace matches any whitespace)
  - Punctuation-only equivalence
  - Stripped equivalence (tokens that differ only by surrounding whitespace)
  - Optional prefix matching (disabled in strict mode via `--strict-text-matching`)
- **Alias Keys** (src/core.py:192-241): Multiple lookup keys per token for fuzzy matching:
  - `text::` - Exact cleaned token
  - `strip::` - Leading/trailing whitespace removed
  - `lower::` - Lowercase for case-insensitive matching
  - `compact::` - All spaces removed
  - `whitespace::` - Exact whitespace pattern preservation
  - `punct::` - Punctuation pattern matching
- **Character Span Alignment** (src/core.py:289-397): Maps tokens to response text positions, handling:
  - Duplicate tokens (a common API bug with some providers)
  - Misaligned token boundaries
  - Missing tokens with fallback estimation
- **Strict Mode** (src/core.py:40, 52, 167-168): Controls matching strictness:
  - Enabled by default (`--strict-text-matching`)
  - Disables lenient prefix matching with punctuation remainders
  - Recommended for higher-confidence verification
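The normalization steps above can be sketched as follows. The marker table mirrors the list above; the actual mapping lives in src/core.py, so treat this as an illustrative approximation:

```python
import unicodedata

# Marker substitutions for common tokenizers (assumed mapping; see src/core.py).
MARKERS = {
    "\u2581": " ",   # SentencePiece ▁ → space
    "\u0120": " ",   # GPT-2/GPT-3 Ġ → space
    "\u010a": "\n",  # GPT-2/GPT-3 Ċ → newline
    "##": "",        # BERT continuation marker (removed)
    "@@": "",        # BPE continuation marker (removed)
    "</w>": "",      # BPE end-of-word marker (removed)
}

def clean_token(token: str) -> str:
    # NFC folds composed vs. decomposed forms ("é" vs. "e" + combining accent).
    token = unicodedata.normalize("NFC", token)
    for marker, replacement in MARKERS.items():
        token = token.replace(marker, replacement)
    return token
```

Cleaned tokens like these feed the alias-key lookup, so that `▁hello` from a SentencePiece tokenizer and ` hello` from a byte-level BPE tokenizer compare equal.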
When to Use Text Matching:
- ✅ OpenRouter and other providers without token ID support
- ✅ Cross-provider verification (e.g., OpenRouter sample vs. local vLLM verification)
- ✅ Older API versions that don't expose token IDs
- ⚠️ Slightly lower accuracy than token ID matching due to tokenization differences
- ⚠️ May produce more "uncertain" results, requiring a higher `--n-samples`
The Kolmogorov-Smirnov (KS) test is a non-parametric statistical test of whether two samples are drawn from the same distribution:
- High p-value (>0.5): Distributions are similar → same model
- Low p-value (<0.05): Distributions are different → different models
- Correlation: Measures how similarly the models rank token probabilities
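A quick illustration of the KS test with SciPy's `ks_2samp`, using synthetic log probabilities rather than real model output:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
original = rng.normal(-2.0, 1.0, 40)    # "original" log probs
same_model = rng.normal(-2.0, 1.0, 40)  # verifier drawn from the same distribution
other_model = rng.normal(-5.0, 1.0, 40) # a clearly different distribution

stat_same, p_same = ks_2samp(original, same_model)
stat_diff, p_diff = ks_2samp(original, other_model)
# p_same stays high (distributions match); p_diff collapses toward zero.
```

Real log-prob distributions are narrower and more skewed than these Gaussians, but the decision logic is the same: a small KS statistic and large p-value support the same-model hypothesis.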
Use more samples for increased confidence:

```bash
uv run logprob-verify \
  -f verification_data.json \
  --verifier-endpoint http://localhost:8000/v1 \
  --verifier-model Qwen/Qwen2-1.5B-Instruct \
  --n-samples 40  # Default is 20
```

This is the main metric to look at:
| Probability | Verdict | Interpretation |
|---|---|---|
| > 0.9 | PASS | Strong evidence of same model |
| 0.7 - 0.9 | LIKELY PASS | Probably the same model |
| 0.3 - 0.7 | UNCERTAIN | Increase --n-samples for clarity |
| 0.1 - 0.3 | LIKELY FAIL | Probably different models |
| < 0.1 | FAIL | Strong evidence of different models |
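The table's thresholds can be expressed as a small helper. This is purely illustrative; the tool computes the verdict internally:

```python
def verdict(probability: float) -> str:
    """Map a same-model probability score to the verdict table above."""
    if probability > 0.9:
        return "PASS"           # Strong evidence of same model
    if probability >= 0.7:
        return "LIKELY PASS"    # Probably the same model
    if probability >= 0.3:
        return "UNCERTAIN"      # Increase --n-samples for clarity
    if probability >= 0.1:
        return "LIKELY FAIL"    # Probably different models
    return "FAIL"               # Strong evidence of different models
```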
- KS Statistic: Distance between distributions (0 = identical, 1 = completely different)
- p-value: Probability that the two distributions come from the same source
- Correlation: How similarly models rank tokens (1 = perfect agreement)
- Mean Difference: Average difference in log probabilities
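The correlation and mean-difference metrics can be sketched for paired log probabilities as follows (assuming Pearson correlation here; a rank-based variant would capture "how similarly models rank tokens" even more directly):

```python
import numpy as np

def summary_metrics(original, verification):
    """Correlation and mean difference between paired log probs (sketch)."""
    original = np.asarray(original, dtype=float)
    verification = np.asarray(verification, dtype=float)
    correlation = float(np.corrcoef(original, verification)[0, 1])
    mean_difference = float(np.mean(original - verification))
    return correlation, mean_difference
```

A correlation near 1 with a mean difference near 0 is what a matching model pair should produce; a systematic offset in the mean difference hints at different models (or different sampling settings) even when correlation stays high.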
```mermaid
sequenceDiagram
    participant Client
    participant Worker as Worker<br/>(Claimed Model)
    participant Verifier as Verifier<br/>(Known vLLM Instance)
    participant Analyzer as Statistical<br/>Analyzer
    Note over Client,Analyzer: Phase 1: Initial Sampling
    Client->>Worker: Generate response with logprobs
    Worker-->>Client: Response + log probs + token IDs
    Note over Client,Analyzer: Phase 2: Verification Sampling
    Client->>Client: Randomly sample N positions<br/>from response
    loop For each sampled position
        Client->>Client: Reconstruct context up to position
        Client->>Verifier: Query with context<br/>(request logprobs)
        Verifier-->>Client: Fresh log probs + token IDs<br/>for next token
    end
    Note over Client,Analyzer: Phase 3: Statistical Analysis
    Client->>Analyzer: Compare distributions:<br/>• Original log probs<br/>• Verification log probs
    Analyzer->>Analyzer: Compute metrics:<br/>• KS statistic<br/>• p-value<br/>• Correlation<br/>• Mean difference
    Analyzer->>Analyzer: Kolmogorov-Smirnov test:<br/>Are distributions from<br/>same source?
    Analyzer-->>Client: Verdict + confidence score
    alt High p-value (>0.9)
        Note over Client: ✅ PASS: Same model
    else Low p-value (<0.1)
        Note over Client: ❌ FAIL: Different models<br/>(Potential spoofing detected)
    else Uncertain (0.3-0.7)
        Note over Client: ⚠️ UNCERTAIN: Increase samples
    end
```