Skip to content

Design weighted/semiring evaluation addon point for probabilistic logic #913

Description

@justinjoy

Summary

Wirelog currently behaves like a weighted logic engine where ordinary inserted facts have weight +1, retractions contribute -1, duplicate derivations are consolidated by signed multiplicity, and joins multiply multiplicities. In that sense, the current system already has a weight algebra: signed integer Z-set multiplicity.

Probabilistic logic can be understood as the next generalization: allow facts, derivations, and rule results to carry weights other than 1, and define how those weights combine across joins, unions, recursion, negation, aggregation, and retraction.

This issue should evaluate whether Wirelog can expose a weighted/semiring evaluation addon point so probabilistic logic can be implemented outside the core builtin function set, while preserving the core's deterministic Datalog and differential/retraction guarantees.

This is related to #912, but it is a deeper layer. #912 covers row-local scalar functions. Probabilistic logic requires changing the evaluation algebra, not just adding scalar callbacks.

Current Mental Model

The current execution model can be described as a fixed weight algebra:

  • inserted EDB fact: weight +1
  • retracted EDB fact: weight -1
  • duplicate derivations of the same tuple: weights are added during consolidation
  • join: weights are multiplied
  • antijoin/negation: handled through stratified relational semantics, not probabilistic complement
  • result existence: a tuple with non-zero signed multiplicity is present
  • incremental update: deltas propagate signed weight changes

This is effectively a signed-integer semiring/group-like model specialized for Z-set differential evaluation.

The probabilistic logic question is therefore not merely “can we call probability helper functions?” The deeper question is:

Can the weight algebra itself be made pluggable or extensible?

Motivation

For many reasoning workloads, facts are not simply true or false. They may have confidence, probability, cost, trust, provenance, risk, or score.

Examples:

edge(a, b, 0.7).
edge(b, c, 0.4).
path(x, y, p) :- edge(x, y, p).
path(x, z, p3) :- path(x, y, p1), edge(y, z, p2), combine(p1, p2, p3).

The user can encode probabilities as ordinary columns today, but then the user must manually define and maintain derivation-combination semantics. That becomes error-prone for recursion, multiple proofs, negation, aggregation, and retraction.

A weighted/semiring addon point could let Wirelog keep its core engine small while allowing higher layers to supply alternate weight semantics.

Key Distinction From Scalar Addons

Scalar addons from #912 are row-local expression functions. They are enough for probability arithmetic utilities such as:

  • prob.mul(p1, p2)
  • prob.noisy_or(p1, p2)
  • prob.threshold(p, cutoff)
  • prob.log_sum_exp(a, b)

They are not enough to define probabilistic logic semantics because they do not control:

  • how duplicate derivations combine
  • how join weights are produced
  • how recursive fixpoints converge
  • how retractions undo or recompute marginal probabilities
  • how negation behaves under uncertainty
  • how aggregation interacts with weights
  • how provenance is retained or discarded

Weighted/probabilistic logic needs an evaluation-level addon point, not just an expression-level addon point.

Candidate Abstraction: Weight Algebra / Semiring Addon

A possible addon point would describe the algebra used for tuple weights.

Potential descriptor fields:

  • ABI version
  • algebra name, e.g. zset.signed_i64, prob.independent, provenance.boolean_formula
  • weight value type and storage size
  • zero value
  • one value
  • add/combine operation for duplicate derivations or union
  • multiply/extend operation for joins
  • negate/retract operation, if supported
  • equality/zero test
  • serialization/deserialization for snapshot and delta callbacks
  • deterministic flag
  • exact vs approximate flag
  • convergence policy for recursion
  • optional provenance-retention policy

Conceptual callbacks:

typedef struct wirelog_weight_ctx wirelog_weight_ctx_t;

typedef int (*wirelog_weight_add_fn)(
    wirelog_weight_ctx_t *ctx,
    const void *a,
    const void *b,
    void *out);

typedef int (*wirelog_weight_mul_fn)(
    wirelog_weight_ctx_t *ctx,
    const void *a,
    const void *b,
    void *out);

typedef int (*wirelog_weight_retract_fn)(
    wirelog_weight_ctx_t *ctx,
    const void *a,
    void *out);

typedef bool (*wirelog_weight_is_zero_fn)(
    wirelog_weight_ctx_t *ctx,
    const void *a);

This is only a sketch. The design must account for memory ownership, inline vs heap weights, ABI stability, vectorized execution, and backend storage.

Probabilistic Algebra Examples

Independent proof probability

For independent alternative derivations of the same tuple:

combine(p1, p2) = 1 - (1 - p1) * (1 - p2)

For joins of independent events:

extend(p1, p2) = p1 * p2

This works only under independence assumptions. If derivations share facts, naive noisy-or can overcount unless provenance is tracked.

Log-space probability

Weights can be stored in log-space to reduce underflow:

extend(logp1, logp2) = logp1 + logp2
combine(logp1, logp2) = logaddexp(logp1, logp2) or log noisy-or variant

This raises representation questions because Wirelog's current hot paths are int64_t-centric.

Provenance semiring

Instead of immediately computing probabilities, weights can store symbolic provenance:

edge(a,b) weight = e1
edge(b,c) weight = e2
path(a,c) weight = e1 * e2

Marginal probabilities can be computed later from the provenance expression. This avoids some overcounting but can explode in size.

Major Semantic Questions

Before implementation, the design must answer:

  • Is the weight attached to EDB facts, IDB facts, derivations, rules, or all of them?
  • Are rule weights supported, or only fact weights?
  • How are multiple proofs combined?
  • Are derivations assumed independent?
  • Is provenance tracked to avoid double counting shared evidence?
  • How does recursion converge?
  • Are probabilities exact, approximate, sampled, bounded, or top-k?
  • How is negation interpreted?
    • closed-world absence?
    • probabilistic complement 1-p?
    • stratified deterministic negation only?
  • How do aggregates behave?
  • What is the representation of probability?
    • fixed-scale integer
    • rational
    • double
    • log-space fixed-point
    • interval
    • symbolic provenance handle
  • How do insertions and retractions update derived probabilities?
  • Does delta propagation carry weight deltas or full replacement weights?
  • Can the algebra support negative weights, or is retraction a separate operation?

Interaction With Current Wirelog Architecture

This is a deep architectural change because current execution assumes signed multiplicity in many places.

Likely affected areas:

  • relation storage and row representation
  • delta tuple representation
  • consolidation and duplicate elimination
  • join weight multiplication
  • recursive fixpoint iteration
  • differential arrangements
  • retraction propagation
  • aggregation semantics
  • snapshot and delta callback ABI
  • optimizer assumptions around equivalence and rewrites
  • tests for semantics, determinism, and incremental correctness

A weighted addon point must not silently reuse signed-multiplicity assumptions where the selected algebra has different laws.

Architecture Options

Option 1: Keep weights as ordinary columns

No new evaluation addon point. Users encode probabilities manually and use #912 scalar functions for arithmetic.

Pros:

Cons:

  • Multiple derivations are not combined automatically.
  • Recursive probabilistic logic is easy to get wrong.
  • Retraction semantics remain user-managed.
  • No optimizer/runtime awareness.

Option 2: Add a weight algebra addon point

Expose a pluggable algebra for tuple weights while keeping relational structure in core.

Pros:

  • Aligns with the idea that current +1/-1 multiplicity is one weight algebra.
  • Could support probabilistic, confidence, cost, trust, and provenance models.
  • Avoids hardcoding one probabilistic semantics into core.

Cons:

  • Large ABI and implementation surface.
  • Hard interaction with differential/retraction semantics.
  • Needs strict algebraic law requirements for optimizer correctness.
  • May require separate storage/evaluation paths for non-int64_t weights.

Option 3: Add a separate weighted backend/evaluation mode

Keep the existing Z-set engine unchanged. Add a new backend or evaluation mode that explicitly supports weighted/semiring logic.

Pros:

  • Avoids destabilizing existing signed-multiplicity behavior.
  • Lets weighted semantics have different storage and fixpoint rules.
  • Cleaner boundary than trying to make every hot path generic.

Cons:

  • More implementation work.
  • Backend abstraction may need to grow.
  • Feature parity with the columnar backend is not automatic.

Option 4: Keep probabilistic inference outside Wirelog

Use Wirelog for deterministic candidate/proof generation and run probabilistic inference in a higher-layer engine.

Pros:

  • No core risk.
  • Allows specialized inference algorithms.
  • Good for complex models such as ProbLog/Markov logic.

Cons:

  • Less integrated.
  • Incremental feedback loops are application-managed.
  • Probabilistic recursion is not native.

Recommended Direction

Do not treat probabilistic logic as a scalar addon problem.

Short term:

Medium term:

  • Investigate Option 2 and Option 3 as separate design tracks.
  • Model the current signed multiplicity behavior explicitly as the baseline algebra.
  • Define the minimal algebraic laws required by joins, consolidation, recursion, retraction, and optimizer rewrites.
  • Decide whether a generic addon ABI is feasible or whether a separate weighted backend is safer.

Long term:

  • Consider a reference probabilistic/provenance implementation only after the algebra boundary is validated with tests.

Possible Minimal Prototype

A prototype should not start with full probabilistic Datalog syntax. Instead:

  1. Make the current signed multiplicity algebra explicit in internal documentation.
  2. Add test-only hooks or an experimental build option for alternate weight combination.
  3. Prototype one simple fixed-scale probability algebra outside public ABI.
  4. Run small examples:
    • independent path probability
    • multiple proof combination
    • recursive reachability with bounded convergence
    • insertion/retraction updates
  5. Compare results against an external reference implementation.

Only after this should a public ABI be considered.

Test Requirements

Any weighted/probabilistic evaluation design needs tests for:

  • duplicate derivation combination
  • join weight extension
  • recursive fixpoint convergence
  • retraction and replacement updates
  • multiple proof overcounting
  • shared evidence/provenance cases
  • negation policy
  • aggregate interaction
  • deterministic replay
  • snapshot and delta callback representation
  • optimizer equivalence preservation or optimizer disabling under unsupported algebra

Acceptance Criteria

  • The issue frames probabilistic logic as a weight-algebra/semiring evaluation problem, not merely scalar functions.
  • The current +1/-1 signed multiplicity model is documented as the baseline weight algebra.
  • Design scalar function addon point for domain-specific operations #912 is identified as sufficient only for probability utility functions, not probabilistic logic semantics.
  • The design identifies whether the next step is a generic weight algebra addon point, a separate weighted backend, or an external higher-layer inference engine.
  • Any proposed public ABI defines weight representation, add/combine, multiply/extend, zero/one, retraction, equality/zero-test, serialization, and lifecycle rules.
  • Any probabilistic semantics proposal defines multiple derivation combination, independence/provenance assumptions, recursion, negation, aggregation, and retraction behavior before implementation.
  • Core builtin probability semantics are not added until these questions are resolved.

Follow-Up Issues

Potential follow-ups:

  • Document current signed multiplicity/Z-set algebra as Wirelog's baseline weight model.
  • Research semiring/provenance requirements for weighted Datalog in Wirelog.
  • Prototype an experimental weighted backend or alternate algebra path.
  • Add probability utility functions as examples for the Design scalar function addon point for domain-specific operations #912 scalar addon point.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions