
LOGIC Methodology

LOGIC uses log probability distributions as a "fingerprint" of a model's output. By comparing the log probability distribution of a response attributed to a claimed model with one obtained from a known model, we can determine whether the two responses came from the same model.

How It Works

  1. Sampling: The tool randomly samples N positions from the response
  2. Re-querying: For each position, it reconstructs the context and queries the verification model
  3. Comparison: Compares the original log probabilities with fresh ones using either token IDs or text matching
  4. Statistical Test: Runs a Kolmogorov-Smirnov test to determine if distributions match
  5. Verdict: Produces a probability score indicating if models are the same
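The five steps above can be sketched end to end. This is a minimal, self-contained sketch: `fake_query_logprob` is a hypothetical stand-in for a live verification-model call, and the KS statistic is computed by hand rather than by the tool's actual statistics code.

```python
import random

def fake_query_logprob(context: str) -> float:
    # Hypothetical stand-in for a live verification-model call: derives a
    # deterministic pseudo log probability from the context so the sketch
    # runs without an endpoint.
    return -(sum(map(ord, context)) % 100) / 25.0

def ks_statistic(a, b):
    # Two-sample Kolmogorov-Smirnov statistic: the largest vertical gap
    # between the empirical CDFs of the two samples.
    d = 0.0
    for v in sorted(set(a) | set(b)):
        cdf_a = sum(1 for x in a if x <= v) / len(a)
        cdf_b = sum(1 for x in b if x <= v) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d

def verify(response_tokens, original_logprobs, n_samples=5, seed=0):
    rng = random.Random(seed)
    # 1. Sampling: pick N random positions from the response
    positions = rng.sample(range(1, len(response_tokens)), n_samples)
    original, fresh = [], []
    for pos in positions:
        # 2. Re-querying: rebuild the context up to this position and ask
        #    the verification model for the next token's log probability
        context = "".join(response_tokens[:pos])
        fresh.append(fake_query_logprob(context))
        # 3. Comparison: pair it with the original log probability
        original.append(original_logprobs[pos])
    # 4. Statistical test over the two logprob samples
    distance = ks_statistic(original, fresh)
    # 5. Verdict: a small distance is evidence of the same model
    return distance

tokens = ["The", " quick", " brown", " fox", " jumps", " over", " the", " dog"]
logprobs = [-0.5, -1.2, -0.8, -2.1, -0.3, -1.7, -0.9, -1.1]
distance = verify(tokens, logprobs)
```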

Token Matching Modes

The verifier supports two matching strategies:

Token ID Matching (Primary Method)

The standard approach uses token IDs to align tokens between the original and verification responses:

  • Most accurate for providers that return token IDs (e.g., vLLM with return_tokens_as_token_ids)
  • Ensures exact token-level correspondence
  • Recommended when both sample and verification sources support token IDs

With vLLM, we can request token IDs with the return_tokens_as_token_ids parameter. The OpenAI API, however, does not support this parameter.

Text-Based Matching (Fallback)

For providers that don't return token IDs (e.g., OpenRouter, OpenAI), the system can fall back to text-based matching. In this approach, the context is reconstructed up to the position of a given token, the verification model is queried with that exact context, and the original log probabilities are compared with the fresh ones by matching token text directly.
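For a single sampled position, the context-reconstruction step amounts to concatenating the prompt with the response tokens that precede that position. A tiny illustrative helper (not the tool's actual API):

```python
def context_up_to(prompt: str, response_tokens: list[str], position: int) -> str:
    # The exact text the model had seen just before emitting the token at
    # `position`; this becomes the query sent to the verification model.
    return prompt + "".join(response_tokens[:position])

ctx = context_up_to("Q: 2+2? A:", [" 4", ".", " Done"], 1)
# ctx == "Q: 2+2? A: 4"
```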

Use the --skip-token-ids flag when sampling and the --text-only-matching flag when verifying:

```shell
uv run logprob-sample \
  --endpoint https://openrouter.ai/api/v1 \
  --model meta-llama/llama-3.1-8b-instruct \
  --api-key $OPENROUTER_API_KEY \
  --skip-token-ids  # Don't request token IDs
```

```shell
uv run logprob-verify \
  -f verification_data.json \
  --verifier-endpoint http://localhost:8000/v1 \
  --verifier-model meta-llama/Llama-3.1-8B-Instruct \
  --text-only-matching  # Enable text matching mode
```

Text Matching Features:

  1. Unicode Normalization (src/core.py:102-104): Applies NFC normalization to handle composed vs. decomposed characters (e.g., "é" vs "e + accent")

  2. Tokenizer Marker Handling (src/core.py:109-115): Normalizes common tokenizer markers:

    • SentencePiece: ▁ → space
    • GPT-2/GPT-3: Ġ → space, Ċ → newline
    • BERT: ## → (removed)
    • BPE: @@, </w> → (removed)
  3. Soft Matching (src/core.py:132-190): Flexible token alignment supporting:

    • Exact matches
    • Whitespace-only equivalence (any whitespace matches any whitespace)
    • Punctuation-only equivalence
    • Stripped equivalence (tokens that differ only by surrounding whitespace)
    • Optional prefix matching (disabled in strict mode via --strict-text-matching)
  4. Alias Keys (src/core.py:192-241): Multiple lookup keys per token for fuzzy matching:

    • text:: - Exact cleaned token
    • strip:: - Leading/trailing whitespace removed
    • lower:: - Lowercase for case-insensitive matching
    • compact:: - All spaces removed
    • whitespace:: - Exact whitespace pattern preservation
    • punct:: - Punctuation pattern matching
  5. Character Span Alignment (src/core.py:289-397): Maps tokens to response text positions, handling:

    • Duplicate tokens (common API bug with some providers)
    • Misaligned token boundaries
    • Missing tokens with fallback estimation
  6. Strict Mode (src/core.py:40, 52, 167-168): Control matching strictness:

    • Enabled by default (--strict-text-matching)
    • Disables lenient prefix matching with punctuation remainders
    • Recommended for higher confidence verification
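The normalization and alias-key ideas above can be sketched in a few lines. Function names here are illustrative; the actual logic lives in src/core.py at the line ranges cited above.

```python
import unicodedata

MARKER_MAP = {
    "\u2581": " ",   # SentencePiece marker -> space
    "\u0120": " ",   # GPT-2 "Ġ" -> space
    "\u010a": "\n",  # GPT-2 "Ċ" -> newline
}

def clean_token(tok: str) -> str:
    # Unicode NFC normalization: "e" + combining accent becomes "é"
    tok = unicodedata.normalize("NFC", tok)
    # Tokenizer marker handling
    for marker, replacement in MARKER_MAP.items():
        tok = tok.replace(marker, replacement)
    for marker in ("##", "@@", "</w>"):  # BERT / BPE continuation markers
        tok = tok.replace(marker, "")
    return tok

def alias_keys(tok: str) -> set[str]:
    # Several lookup keys per token allow progressively fuzzier matching
    t = clean_token(tok)
    return {
        f"text::{t}",
        f"strip::{t.strip()}",
        f"lower::{t.lower()}",
        f"compact::{t.replace(' ', '')}",
    }
```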

When to Use Text Matching:

  • ✅ OpenRouter and other providers without token ID support
  • ✅ Cross-provider verification (e.g., OpenRouter sample vs. local vLLM verification)
  • ✅ Older API versions that don't expose token IDs
  • ⚠️ Slightly lower accuracy than token ID matching due to tokenization differences
  • ⚠️ May produce more "uncertain" results requiring higher --n-samples

The KS Test

The Kolmogorov-Smirnov (KS) test is a non-parametric statistical test that compares two distributions:

  • High p-value (>0.5): Distributions are similar → same model
  • Low p-value (<0.05): Distributions are different → different models
  • Correlation: Measures how similarly the models rank token probabilities
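The test itself is available off the shelf; for example, with scipy (assuming numpy and scipy are installed), the two scenarios look like this:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
# Log probabilities sampled from the claimed model
original = rng.normal(loc=-2.0, scale=1.0, size=200)
# A re-query of the same model should reproduce them almost exactly
same_model = original + rng.normal(scale=1e-6, size=200)
# A different model produces a shifted distribution
different_model = rng.normal(loc=-4.0, scale=1.0, size=200)

stat_same, p_same = ks_2samp(original, same_model)
stat_diff, p_diff = ks_2samp(original, different_model)
# p_same is high (same source); p_diff is low (different sources)
```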

Higher Confidence Verification

Use more samples for increased confidence:

uv run logprob-verify \
  -f verification_data.json \
  --verifier-endpoint http://localhost:8000/v1 \
  --verifier-model Qwen/Qwen2-1.5B-Instruct \
  --n-samples 40  # Default is 20

Interpreting Results

Same Model Probability

This is the main metric to look at:

| Probability | Verdict | Interpretation |
|-------------|---------|----------------|
| > 0.9 | PASS | Strong evidence of same model |
| 0.7 - 0.9 | LIKELY PASS | Probably the same model |
| 0.3 - 0.7 | UNCERTAIN | Increase --n-samples for clarity |
| 0.1 - 0.3 | LIKELY FAIL | Probably different models |
| < 0.1 | FAIL | Strong evidence of different models |
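The thresholds above map to a verdict roughly as follows (a sketch, not the tool's exact code; boundary handling at the band edges is an assumption):

```python
def verdict(same_model_probability: float) -> str:
    # Map the same-model probability to the verdict bands from the table
    if same_model_probability > 0.9:
        return "PASS"
    if same_model_probability >= 0.7:
        return "LIKELY PASS"
    if same_model_probability >= 0.3:
        return "UNCERTAIN"
    if same_model_probability >= 0.1:
        return "LIKELY FAIL"
    return "FAIL"
```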

Other Metrics

  • KS Statistic: Distance between distributions (0 = identical, 1 = completely different)
  • p-value: Probability of seeing this much divergence if both sets of log probabilities came from the same model
  • Correlation: How similarly models rank tokens (1 = perfect agreement)
  • Mean Difference: Average difference in log probabilities
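Given paired original and re-queried log probabilities, the supporting metrics reduce to a few lines of numpy (a sketch with made-up numbers):

```python
import numpy as np

original = np.array([-0.5, -1.2, -0.8, -2.1, -0.3])  # claimed model
fresh    = np.array([-0.6, -1.1, -0.9, -2.0, -0.4])  # verification model

# Correlation: do the two models rank token probabilities similarly?
correlation = float(np.corrcoef(original, fresh)[0, 1])
# Mean Difference: average gap in log probabilities
mean_difference = float(np.mean(original - fresh))
```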

LOGIC Sequence Diagram

sequenceDiagram
    participant Client
    participant Worker as Worker<br/>(Claimed Model)
    participant Verifier as Verifier<br/>(Known vLLM Instance)
    participant Analyzer as Statistical<br/>Analyzer

    Note over Client,Analyzer: Phase 1: Initial Sampling
    Client->>Worker: Generate response with logprobs
    Worker-->>Client: Response + log probs + token IDs

    Note over Client,Analyzer: Phase 2: Verification Sampling
    Client->>Client: Randomly sample N positions<br/>from response

    loop For each sampled position
        Client->>Client: Reconstruct context up to position
        Client->>Verifier: Query with context<br/>(request logprobs)
        Verifier-->>Client: Fresh log probs + token IDs<br/>for next token
    end

    Note over Client,Analyzer: Phase 3: Statistical Analysis
    Client->>Analyzer: Compare distributions:<br/>• Original log probs<br/>• Verification log probs

    Analyzer->>Analyzer: Compute metrics:<br/>• KS statistic<br/>• p-value<br/>• Correlation<br/>• Mean difference

    Analyzer->>Analyzer: Kolmogorov-Smirnov test:<br/>Are distributions from<br/>same source?

    Analyzer-->>Client: Verdict + confidence score

    alt High p-value (>0.9)
        Note over Client: ✅ PASS: Same model
    else Low p-value (<0.1)
        Note over Client: ❌ FAIL: Different models<br/>(Potential spoofing detected)
    else Uncertain (0.3-0.7)
        Note over Client: ⚠️ UNCERTAIN: Increase samples
    end