Design weighted/semiring evaluation addon point for probabilistic logic

## Summary

Wirelog currently behaves like a weighted logic engine where ordinary inserted facts have weight `+1`, retractions contribute `-1`, duplicate derivations are consolidated by signed multiplicity, and joins multiply multiplicities. In that sense, the current system already has a weight algebra: signed integer Z-set multiplicity.

Probabilistic logic can be understood as the next generalization: allow facts, derivations, and rule results to carry weights other than `1`, and define how those weights combine across joins, unions, recursion, negation, aggregation, and retraction.

This issue should evaluate whether Wirelog can expose a weighted/semiring evaluation addon point so probabilistic logic can be implemented outside the core builtin function set, while preserving the core's deterministic Datalog and differential/retraction guarantees.

This is related to #912, but it is a deeper layer. #912 covers row-local scalar functions. Probabilistic logic requires changing the evaluation algebra, not just adding scalar callbacks.

## Current Mental Model

The current execution model can be described as a fixed weight algebra:

- inserted EDB fact: weight `+1`
- retracted EDB fact: weight `-1`
- duplicate derivations of the same tuple: weights are added during consolidation
- join: weights are multiplied
- antijoin/negation: handled through stratified relational semantics, not probabilistic complement
- result existence: a tuple with non-zero signed multiplicity is present
- incremental update: deltas propagate signed weight changes

This is effectively a signed-integer semiring/group-like model specialized for Z-set differential evaluation.

The probabilistic logic question is therefore not merely “can we call probability helper functions?” The deeper question is:

> Can the weight algebra itself be made pluggable or extensible?

## Motivation

For many reasoning workloads, facts are not simply true or false. They may have confidence, probability, cost, trust, provenance, risk, or score.

Examples:

```datalog
edge(a, b, 0.7).
edge(b, c, 0.4).
path(x, y, p) :- edge(x, y, p).
path(x, z, p3) :- path(x, y, p1), edge(y, z, p2), combine(p1, p2, p3).
```

The user can encode probabilities as ordinary columns today, but then the user must manually define and maintain derivation-combination semantics. That becomes error-prone for recursion, multiple proofs, negation, aggregation, and retraction.

A weighted/semiring addon point could let Wirelog keep its core engine small while allowing higher layers to supply alternate weight semantics.

## Key Distinction From Scalar Addons

Scalar addons from #912 are row-local expression functions. They are enough for probability arithmetic utilities such as:

- `prob.mul(p1, p2)`
- `prob.noisy_or(p1, p2)`
- `prob.threshold(p, cutoff)`
- `prob.log_sum_exp(a, b)`

They are not enough to define probabilistic logic semantics because they do not control:

- how duplicate derivations combine
- how join weights are produced
- how recursive fixpoints converge
- how retractions undo or recompute marginal probabilities
- how negation behaves under uncertainty
- how aggregation interacts with weights
- how provenance is retained or discarded

Weighted/probabilistic logic needs an evaluation-level addon point, not just an expression-level addon point.

## Candidate Abstraction: Weight Algebra / Semiring Addon

A possible addon point would describe the algebra used for tuple weights.

Potential descriptor fields:

- ABI version
- algebra name, e.g. `zset.signed_i64`, `prob.independent`, `provenance.boolean_formula`
- weight value type and storage size
- zero value
- one value
- add/combine operation for duplicate derivations or union
- multiply/extend operation for joins
- negate/retract operation, if supported
- equality/zero test
- serialization/deserialization for snapshot and delta callbacks
- deterministic flag
- exact vs approximate flag
- convergence policy for recursion
- optional provenance-retention policy

Conceptual callbacks:

```c
typedef struct wirelog_weight_ctx wirelog_weight_ctx_t;

typedef int (*wirelog_weight_add_fn)(
    wirelog_weight_ctx_t *ctx,
    const void *a,
    const void *b,
    void *out);

typedef int (*wirelog_weight_mul_fn)(
    wirelog_weight_ctx_t *ctx,
    const void *a,
    const void *b,
    void *out);

typedef int (*wirelog_weight_retract_fn)(
    wirelog_weight_ctx_t *ctx,
    const void *a,
    void *out);

typedef bool (*wirelog_weight_is_zero_fn)(
    wirelog_weight_ctx_t *ctx,
    const void *a);
```

This is only a sketch. The design must account for memory ownership, inline vs heap weights, ABI stability, vectorized execution, and backend storage.

## Probabilistic Algebra Examples

### Independent proof probability

For independent alternative derivations of the same tuple:

```text
combine(p1, p2) = 1 - (1 - p1) * (1 - p2)
```

For joins of independent events:

```text
extend(p1, p2) = p1 * p2
```

This works only under independence assumptions. If derivations share facts, naive noisy-or can overcount unless provenance is tracked.

### Log-space probability

Weights can be stored in log-space to reduce underflow:

```text
extend(logp1, logp2) = logp1 + logp2
combine(logp1, logp2) = logaddexp(logp1, logp2) or log noisy-or variant
```

This raises representation questions because Wirelog's current hot paths are `int64_t`-centric.

### Provenance semiring

Instead of immediately computing probabilities, weights can store symbolic provenance:

```text
edge(a,b) weight = e1
edge(b,c) weight = e2
path(a,c) weight = e1 * e2
```

Marginal probabilities can be computed later from the provenance expression. This avoids some overcounting but can explode in size.

## Major Semantic Questions

Before implementation, the design must answer:

- Is the weight attached to EDB facts, IDB facts, derivations, rules, or all of them?
- Are rule weights supported, or only fact weights?
- How are multiple proofs combined?
- Are derivations assumed independent?
- Is provenance tracked to avoid double counting shared evidence?
- How does recursion converge?
- Are probabilities exact, approximate, sampled, bounded, or top-k?
- How is negation interpreted?
  - closed-world absence?
  - probabilistic complement `1-p`?
  - stratified deterministic negation only?
- How do aggregates behave?
- What is the representation of probability?
  - fixed-scale integer
  - rational
  - double
  - log-space fixed-point
  - interval
  - symbolic provenance handle
- How do insertions and retractions update derived probabilities?
- Does delta propagation carry weight deltas or full replacement weights?
- Can the algebra support negative weights, or is retraction a separate operation?

## Interaction With Current Wirelog Architecture

This is a deep architectural change because current execution assumes signed multiplicity in many places.

Likely affected areas:

- relation storage and row representation
- delta tuple representation
- consolidation and duplicate elimination
- join weight multiplication
- recursive fixpoint iteration
- differential arrangements
- retraction propagation
- aggregation semantics
- snapshot and delta callback ABI
- optimizer assumptions around equivalence and rewrites
- tests for semantics, determinism, and incremental correctness

A weighted addon point must not silently reuse signed-multiplicity assumptions where the selected algebra has different laws.

## Architecture Options

### Option 1: Keep weights as ordinary columns

No new evaluation addon point. Users encode probabilities manually and use #912 scalar functions for arithmetic.

Pros:

- No core semantic change.
- Easy to support after #912.
- Works for simple scoring and confidence propagation.

Cons:

- Multiple derivations are not combined automatically.
- Recursive probabilistic logic is easy to get wrong.
- Retraction semantics remain user-managed.
- No optimizer/runtime awareness.

### Option 2: Add a weight algebra addon point

Expose a pluggable algebra for tuple weights while keeping relational structure in core.

Pros:

- Aligns with the idea that current `+1/-1` multiplicity is one weight algebra.
- Could support probabilistic, confidence, cost, trust, and provenance models.
- Avoids hardcoding one probabilistic semantics into core.

Cons:

- Large ABI and implementation surface.
- Hard interaction with differential/retraction semantics.
- Needs strict algebraic law requirements for optimizer correctness.
- May require separate storage/evaluation paths for non-`int64_t` weights.

### Option 3: Add a separate weighted backend/evaluation mode

Keep the existing Z-set engine unchanged. Add a new backend or evaluation mode that explicitly supports weighted/semiring logic.

Pros:

- Avoids destabilizing existing signed-multiplicity behavior.
- Lets weighted semantics have different storage and fixpoint rules.
- Cleaner boundary than trying to make every hot path generic.

Cons:

- More implementation work.
- Backend abstraction may need to grow.
- Feature parity with the columnar backend is not automatic.

### Option 4: Keep probabilistic inference outside Wirelog

Use Wirelog for deterministic candidate/proof generation and run probabilistic inference in a higher-layer engine.

Pros:

- No core risk.
- Allows specialized inference algorithms.
- Good for complex models such as ProbLog/Markov logic.

Cons:

- Less integrated.
- Incremental feedback loops are application-managed.
- Probabilistic recursion is not native.

## Recommended Direction

Do not treat probabilistic logic as a scalar addon problem.

Short term:

- Use #912 only for probability utility functions.
- Document that probability values carried as columns do not change Wirelog's derivation semantics.
- Avoid adding probabilistic fact/rule syntax until the weight algebra design is settled.

Medium term:

- Investigate Option 2 and Option 3 as separate design tracks.
- Model the current signed multiplicity behavior explicitly as the baseline algebra.
- Define the minimal algebraic laws required by joins, consolidation, recursion, retraction, and optimizer rewrites.
- Decide whether a generic addon ABI is feasible or whether a separate weighted backend is safer.

Long term:

- Consider a reference probabilistic/provenance implementation only after the algebra boundary is validated with tests.

## Possible Minimal Prototype

A prototype should not start with full probabilistic Datalog syntax. Instead:

1. Make the current signed multiplicity algebra explicit in internal documentation.
2. Add test-only hooks or an experimental build option for alternate weight combination.
3. Prototype one simple fixed-scale probability algebra outside public ABI.
4. Run small examples:
   - independent path probability
   - multiple proof combination
   - recursive reachability with bounded convergence
   - insertion/retraction updates
5. Compare results against an external reference implementation.

Only after this should a public ABI be considered.

## Test Requirements

Any weighted/probabilistic evaluation design needs tests for:

- duplicate derivation combination
- join weight extension
- recursive fixpoint convergence
- retraction and replacement updates
- multiple proof overcounting
- shared evidence/provenance cases
- negation policy
- aggregate interaction
- deterministic replay
- snapshot and delta callback representation
- optimizer equivalence preservation or optimizer disabling under unsupported algebra

## Acceptance Criteria

- The issue frames probabilistic logic as a weight-algebra/semiring evaluation problem, not merely scalar functions.
- The current `+1/-1` signed multiplicity model is documented as the baseline weight algebra.
- #912 is identified as sufficient only for probability utility functions, not probabilistic logic semantics.
- The design identifies whether the next step is a generic weight algebra addon point, a separate weighted backend, or an external higher-layer inference engine.
- Any proposed public ABI defines weight representation, add/combine, multiply/extend, zero/one, retraction, equality/zero-test, serialization, and lifecycle rules.
- Any probabilistic semantics proposal defines multiple derivation combination, independence/provenance assumptions, recursion, negation, aggregation, and retraction behavior before implementation.
- Core builtin probability semantics are not added until these questions are resolved.

## Follow-Up Issues

Potential follow-ups:

- Document current signed multiplicity/Z-set algebra as Wirelog's baseline weight model.
- Research semiring/provenance requirements for weighted Datalog in Wirelog.
- Prototype an experimental weighted backend or alternate algebra path.
- Add probability utility functions as examples for the #912 scalar addon point.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Design weighted/semiring evaluation addon point for probabilistic logic #913

Summary

Current Mental Model

Motivation

Key Distinction From Scalar Addons

Candidate Abstraction: Weight Algebra / Semiring Addon

Probabilistic Algebra Examples

Independent proof probability

Log-space probability

Provenance semiring

Major Semantic Questions

Interaction With Current Wirelog Architecture

Architecture Options

Option 1: Keep weights as ordinary columns

Option 2: Add a weight algebra addon point

Option 3: Add a separate weighted backend/evaluation mode

Option 4: Keep probabilistic inference outside Wirelog

Recommended Direction

Possible Minimal Prototype

Test Requirements

Acceptance Criteria

Follow-Up Issues

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Design weighted/semiring evaluation addon point for probabilistic logic #913

Description

Summary

Current Mental Model

Motivation

Key Distinction From Scalar Addons

Candidate Abstraction: Weight Algebra / Semiring Addon

Probabilistic Algebra Examples

Independent proof probability

Log-space probability

Provenance semiring

Major Semantic Questions

Interaction With Current Wirelog Architecture

Architecture Options

Option 1: Keep weights as ordinary columns

Option 2: Add a weight algebra addon point

Option 3: Add a separate weighted backend/evaluation mode

Option 4: Keep probabilistic inference outside Wirelog

Recommended Direction

Possible Minimal Prototype

Test Requirements

Acceptance Criteria

Follow-Up Issues

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions