Summary
Wirelog should not add geometry/coordinate operations such as within, intersects, or distance as core builtins. Instead, the core should expose a small, stable addon point that lets upper layers or separate packages register deterministic scalar functions and boolean predicates that can be called from Datalog expressions.
The immediate motivation is geometry/coordinate workloads, but the addon point should be domain-neutral. Geometry should be one consumer of the mechanism, not the mechanism itself.
This follows the decision from #910: keep geometry libraries and geometry semantics outside the core, and open a minimal extension surface in the core.
Goals
- Add a core-supported addon point for scalar expression functions.
- Allow extension functions to be used in FILTER and MAP-style expression evaluation.
- Keep individual domain functions out of public core enums and internal builtin opcode lists.
- Keep GEOS/PROJ/GDAL and similar domain libraries out of the core dependency graph.
- Preserve Wirelog's C11, embedded-friendly, ABI-conscious architecture.
- Provide enough metadata for determinism, error handling, arity/type checking, and lifecycle safety.
Non-Goals
- Implement geometry operations in core.
- Implement SPARQL, a SPARQL query planner, or an RDF store.
- Add a native geometry column type in the first iteration.
- Add spatial indexes, spatial join operators, or optimizer pushdown in the first iteration.
- Allow arbitrary stateful operators to mutate sessions, relations, plans, or backend internals.
- Treat native dynamic loading as a sandbox. Loaded addons are trusted process-local code.
Current State
Today, builtin expression functions are hardwired through several layers:
- lexer tokens and keyword mapping
- parser branches and AST node fields
- IR expression kinds and enum values
- serialized plan expression opcodes
- columnar evaluator switch dispatch
- parser/evaluator/unit tests
- public header and ABI checks when public enums are touched
This works for a small fixed builtin set, but it does not scale to domain-specific function families. Adding each geometry function this way would expand the core grammar, ABI pressure, evaluator code, dependency matrix, and tests.
Wirelog already has an I/O adapter plugin concept, but that surface is for data sources/sinks. It is not an expression function registry and should not be overloaded for expression evaluation.
Proposed Direction
Add a dedicated scalar function addon API.
The first version should support pure row-local functions that operate on existing Wirelog value representations:
- int64
- boolean
- interned string handles
Extension callbacks should be callable from serialized expression evaluation, but they must not modify session state, relation storage, execution plans, or backend internals.
A geometry extension can then register functions such as:
- geo.within(wkt_a, wkt_b) -> bool
- geo.intersects(wkt_a, wkt_b) -> bool
- geo.contains(wkt_a, wkt_b) -> bool
- geo.distance(wkt_a, wkt_b) -> int64 using a documented fixed scale
- geo.envelope_minx(wkt) -> int64
- geo.envelope_maxx(wkt) -> int64
Those functions should live in a separate wirelog-extension-* package or upper-layer integration and may depend on GEOS/PROJ/GDAL there.
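The "documented fixed scale" for geo.distance could look like the minimal sketch below. The scale of 1000 and the names SKETCH_DISTANCE_SCALE and sketch_distance_to_fixed are assumptions made for illustration only; a real extension would choose and document its own scale and rounding rule.

```c
#include <stdint.h>

/* Hypothetical scale: results are stored as thousandths of a distance unit. */
#define SKETCH_DISTANCE_SCALE 1000

/* Converts a non-negative distance to a fixed-scale int64.
 * Returns 0 on success, -1 if the input is negative, NaN, or too large. */
static int sketch_distance_to_fixed(double distance, int64_t *out)
{
    if (!(distance >= 0.0)) {
        return -1; /* rejects negative values and NaN */
    }
    double scaled = distance * (double)SKETCH_DISTANCE_SCALE;
    if (scaled >= 9.0e18) {
        return -1; /* would not fit in int64 (this also catches infinity) */
    }
    *out = (int64_t)(scaled + 0.5); /* round half up; input is non-negative */
    return 0;
}
```

With this assumed scale, a distance of 2.25 units is stored as 2250, and callers compare or filter on the integer value.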
Public API Shape
A new installed header is likely needed, for example:
- wirelog/wirelog-extension.h
- or wirelog/wirelog-function.h
All public names must follow the existing rules:
- public functions/types: wirelog_ prefix
- public macros/enums: WIRELOG_ prefix
- exported prototypes: WIRELOG_API
- installed header list updated through wirelog_public_headers in meson.build
A minimal API should include:
    #include <stdint.h>

    #define WIRELOG_EXTENSION_ABI_VERSION 1u

    typedef enum {
        WIRELOG_EXTENSION_VALUE_INT64 = 1,
        WIRELOG_EXTENSION_VALUE_BOOL = 2,
        WIRELOG_EXTENSION_VALUE_STRING = 3
    } wirelog_extension_value_type_t;

    typedef struct wirelog_extension_ctx wirelog_extension_ctx_t;

    typedef struct {
        wirelog_extension_value_type_t type;
        int64_t value; /* int64, boolean, or interned string id, per type */
    } wirelog_extension_value_t;

    typedef int (*wirelog_extension_scalar_fn)(
        wirelog_extension_ctx_t *ctx,
        const wirelog_extension_value_t *args,
        uint32_t argc,
        wirelog_extension_value_t *out,
        void *user_data);

    typedef struct wirelog_extension_function {
        uint32_t abi_version;
        const char *name;
        uint32_t min_arity;
        uint32_t max_arity;
        wirelog_extension_value_type_t return_type;
        uint32_t flags;
        wirelog_extension_scalar_fn eval;
        void *user_data;
    } wirelog_extension_function_t;

    WIRELOG_API wirelog_error_t
    wirelog_extension_register_function(const wirelog_extension_function_t *fn);

    WIRELOG_API wirelog_error_t
    wirelog_extension_unregister_function(const char *name);

    WIRELOG_API const char *
    wirelog_extension_last_error(void);

    WIRELOG_API const char *
    wirelog_extension_ctx_string(wirelog_extension_ctx_t *ctx, int64_t intern_id);

    WIRELOG_API wirelog_error_t
    wirelog_extension_ctx_intern_string(wirelog_extension_ctx_t *ctx,
                                        const char *utf8, int64_t *out_intern_id);
The exact shape can change during design, but the first version should stay narrow.
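To make the proposed shape concrete, here is a minimal sketch of what a provider-side callback and descriptor might look like. No installed header exists yet, so the types are redeclared locally from the proposal. The function demo.in_range, the "0 means success" return convention, and the empty flags value are assumptions for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins copied from the proposed declarations. */
typedef enum {
    WIRELOG_EXTENSION_VALUE_INT64 = 1,
    WIRELOG_EXTENSION_VALUE_BOOL = 2,
    WIRELOG_EXTENSION_VALUE_STRING = 3
} wirelog_extension_value_type_t;

typedef struct wirelog_extension_ctx wirelog_extension_ctx_t;

typedef struct {
    wirelog_extension_value_type_t type;
    int64_t value;
} wirelog_extension_value_t;

typedef int (*wirelog_extension_scalar_fn)(
    wirelog_extension_ctx_t *ctx, const wirelog_extension_value_t *args,
    uint32_t argc, wirelog_extension_value_t *out, void *user_data);

typedef struct wirelog_extension_function {
    uint32_t abi_version;
    const char *name;
    uint32_t min_arity;
    uint32_t max_arity;
    wirelog_extension_value_type_t return_type;
    uint32_t flags;
    wirelog_extension_scalar_fn eval;
    void *user_data;
} wirelog_extension_function_t;

/* Hypothetical predicate: demo.in_range(x, lo, hi) -> bool.
 * Assumed convention: returns 0 on success, non-zero on failure. */
static int demo_in_range(wirelog_extension_ctx_t *ctx,
                         const wirelog_extension_value_t *args,
                         uint32_t argc,
                         wirelog_extension_value_t *out,
                         void *user_data)
{
    (void)ctx;
    (void)user_data;
    if (argc != 3u || out == NULL) {
        return -1;
    }
    for (uint32_t i = 0; i < argc; i++) {
        if (args[i].type != WIRELOG_EXTENSION_VALUE_INT64) {
            return -1; /* row-local type check; arguments are never modified */
        }
    }
    out->type = WIRELOG_EXTENSION_VALUE_BOOL;
    out->value = (args[0].value >= args[1].value &&
                  args[0].value <= args[2].value) ? 1 : 0;
    return 0;
}

/* Descriptor a provider would pass to the proposed register call. */
static const wirelog_extension_function_t demo_in_range_desc = {
    .abi_version = 1u, /* WIRELOG_EXTENSION_ABI_VERSION in the proposal */
    .name = "demo.in_range",
    .min_arity = 3u,
    .max_arity = 3u,
    .return_type = WIRELOG_EXTENSION_VALUE_BOOL,
    .flags = 0u,
    .eval = demo_in_range,
    .user_data = NULL
};
```

The callback only reads its arguments and writes the single output value, which matches the pure, row-local contract described above.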
Function Metadata
The descriptor should carry enough information for validation and future optimizer decisions without exposing backend internals.
Required or strongly recommended metadata:
- function name, preferably namespace-qualified such as geo.intersects
- ABI version
- minimum and maximum arity
- argument type constraints
- return type
- flags for pure, deterministic, may_allocate, thread_safe, and depends_on_external_data
- callback pointer
- user data pointer
- optional version string for the provider
Non-deterministic functions should be rejected by default for recursive or materialized evaluation paths unless an explicit future opt-in policy is designed.
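The flag metadata and the default rejection rule could be sketched as a bitmask plus one policy check. The flag values, the SKETCH_ names, and the three-way split of evaluation paths are assumptions for illustration; the proposal names the flags but does not fix their encoding.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments for the descriptor's flags field. */
enum {
    SKETCH_FN_PURE = 1 << 0,
    SKETCH_FN_DETERMINISTIC = 1 << 1,
    SKETCH_FN_MAY_ALLOCATE = 1 << 2,
    SKETCH_FN_THREAD_SAFE = 1 << 3,
    SKETCH_FN_DEPENDS_ON_EXTERNAL_DATA = 1 << 4
};

/* Hypothetical classification of where an expression is evaluated. */
typedef enum {
    SKETCH_PATH_PLAIN,
    SKETCH_PATH_RECURSIVE,
    SKETCH_PATH_MATERIALIZED
} sketch_eval_path_t;

/* Default policy: recursive and materialized paths accept only functions
 * flagged deterministic; other paths accept any registered function. */
static bool sketch_fn_allowed(uint32_t flags, sketch_eval_path_t path)
{
    if (path == SKETCH_PATH_PLAIN) {
        return true;
    }
    return (flags & (uint32_t)SKETCH_FN_DETERMINISTIC) != 0u;
}
```

A check like this would run at compile or session-creation time, so a rejected function surfaces as a clear compile error rather than a runtime surprise.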
Parser and Compile Integration
Do not add every extension function as a lexer keyword.
Instead, add one generic call form. Possible syntax options:
ext("geo.intersects", geom_a, geom_b)
or:
call("geo.intersects", geom_a, geom_b)
Design questions:
- Should the generic call name be ext, call, or something else?
- Should function names be string literals only, to avoid colliding with variable identifiers?
- Should missing functions fail at parse time, compile/session creation time, or first evaluation?
- Should parse/compile accept a registry snapshot to avoid global mutable registry behavior?
Recommended direction:
- Parse the syntax without requiring global registry state.
- Resolve function names during compile/session creation using an explicit registry snapshot or compile context.
- Store resolved function ids or pinned descriptors in the execution plan/session so evaluation does not repeatedly perform name lookup.
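The recommended direction can be sketched as a lookup against an immutable registry snapshot that yields a numeric id at compile time. All types and names here (sketch_snapshot_t, sketch_resolve, and so on) are hypothetical; the linear scan is chosen for brevity, not as a statement about the eventual registry data structure.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical snapshot entry: just enough metadata to validate a call. */
typedef struct {
    const char *name;
    uint32_t min_arity;
    uint32_t max_arity;
} sketch_entry_t;

/* Hypothetical immutable snapshot handed to the compile step. */
typedef struct {
    const sketch_entry_t *entries;
    uint32_t count;
} sketch_snapshot_t;

typedef enum {
    SKETCH_RESOLVE_OK = 0,
    SKETCH_RESOLVE_MISSING,
    SKETCH_RESOLVE_BAD_ARITY
} sketch_resolve_status_t;

/* Resolves a call such as ext("geo.intersects", a, b) to a snapshot index.
 * The evaluator can later use *out_id directly, with no name lookup. */
static sketch_resolve_status_t sketch_resolve(const sketch_snapshot_t *snap,
                                              const char *name,
                                              uint32_t argc,
                                              uint32_t *out_id)
{
    for (uint32_t i = 0; i < snap->count; i++) {
        const sketch_entry_t *e = &snap->entries[i];
        if (strcmp(e->name, name) != 0) {
            continue;
        }
        if (argc < e->min_arity || argc > e->max_arity) {
            return SKETCH_RESOLVE_BAD_ARITY;
        }
        *out_id = i;
        return SKETCH_RESOLVE_OK;
    }
    return SKETCH_RESOLVE_MISSING;
}
```

Because the snapshot is passed in explicitly, parsing stays independent of global registry state, and missing-function and arity errors are reported before any row is evaluated.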
AST, IR, and Plan Changes
Add a generic external function representation instead of adding domain-specific builtin enum values.
Possible internal additions:
- AST node: WL_PARSER_AST_NODE_EXT_FUNCTION
- IR expression: WL_IR_EXPR_EXT_FN
- plan expression opcode: WL_PLAN_EXPR_EXT_CALL
Plan payload options:
- name-based payload: function name + arity
- id-based payload: resolved registry id + arity
- descriptor-snapshot payload: session-owned function reference table index + arity
The id or descriptor-snapshot approach is preferable for evaluation speed and lifetime safety, but it requires clear pin/unregister semantics.
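As an illustration of the descriptor-snapshot option, the sketch below encodes a table index and arity into a fixed 8-byte little-endian payload and decodes it again. The layout, the payload size, and every sketch_ name are assumptions; the actual serialized plan format is not specified here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical payload for an extension-call plan expression. */
typedef struct {
    uint32_t fn_index; /* index into a session-owned function reference table */
    uint32_t arity;
} sketch_ext_call_payload_t;

#define SKETCH_EXT_CALL_PAYLOAD_SIZE 8u

static void sketch_put_u32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xffu);
    p[1] = (uint8_t)((v >> 8) & 0xffu);
    p[2] = (uint8_t)((v >> 16) & 0xffu);
    p[3] = (uint8_t)((v >> 24) & 0xffu);
}

static uint32_t sketch_get_u32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns the number of bytes written, or 0 if the buffer is too small. */
static size_t sketch_payload_encode(const sketch_ext_call_payload_t *in,
                                    uint8_t *buf, size_t cap)
{
    if (cap < SKETCH_EXT_CALL_PAYLOAD_SIZE) {
        return 0;
    }
    sketch_put_u32(buf, in->fn_index);
    sketch_put_u32(buf + 4, in->arity);
    return SKETCH_EXT_CALL_PAYLOAD_SIZE;
}

/* Returns 0 on success, -1 if the input is truncated. */
static int sketch_payload_decode(const uint8_t *buf, size_t len,
                                 sketch_ext_call_payload_t *out)
{
    if (len < SKETCH_EXT_CALL_PAYLOAD_SIZE) {
        return -1;
    }
    out->fn_index = sketch_get_u32(buf);
    out->arity = sketch_get_u32(buf + 4);
    return 0;
}
```

The payload carries no raw function pointer, so a serialized plan stays valid as data even if the referenced function is later unregistered; the index is only meaningful together with the session's pinned table.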
Evaluator Semantics
The columnar evaluator should call extension functions only through the approved descriptor/callback ABI.
Rules:
- Extension callbacks are row-local scalar calls.
- Callbacks must not mutate Wirelog session state.
- Callbacks must not call back into evaluation recursively unless explicitly supported later.
- Callback arguments are immutable.
- Callback return values must match the descriptor return type.
- Callback failure behavior must be deterministic and documented.
Failure policy should distinguish contexts:
- Boolean predicate in FILTER: callback failure may reject the row if configured as fail-closed.
- MAP or head expression: callback failure should likely produce an execution error rather than silently inventing a value.
- Type/arity mismatch: should fail before evaluation when possible.
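The context-dependent failure policy might be expressed as a single decision function, sketched below. The enum names, the fail_closed switch, and the choice to raise an execution error for a FILTER failure when fail-closed is off are assumptions; the policy itself is still an open design question.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical contexts in which an extension call is evaluated. */
typedef enum { SKETCH_CTX_FILTER, SKETCH_CTX_MAP } sketch_expr_ctx_t;

/* Hypothetical per-row outcomes for the evaluator. */
typedef enum {
    SKETCH_ROW_KEEP,
    SKETCH_ROW_DROP,
    SKETCH_ROW_ERROR
} sketch_row_action_t;

/* status: callback return code (0 assumed to mean success).
 * value:  callback result; only its truthiness matters for FILTER. */
static sketch_row_action_t sketch_row_action(sketch_expr_ctx_t where,
                                             int status,
                                             int64_t value,
                                             bool fail_closed)
{
    if (status != 0) {
        /* Only a fail-closed FILTER may turn a failure into a rejection. */
        if (where == SKETCH_CTX_FILTER && fail_closed) {
            return SKETCH_ROW_DROP;
        }
        return SKETCH_ROW_ERROR; /* never invent a value for MAP/head */
    }
    if (where == SKETCH_CTX_FILTER) {
        return value != 0 ? SKETCH_ROW_KEEP : SKETCH_ROW_DROP;
    }
    return SKETCH_ROW_KEEP;
}
```

Keeping the decision in one place makes the behavior deterministic and easy to cover with the FILTER and MAP failure tests listed in the test plan.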
Lifetime and Unregister Policy
This is a critical ABI point.
The implementation must define what happens when a registered function is referenced by a parsed program, compiled plan, or active session.
Recommended policy:
- Registered descriptors must remain valid until unregistered.
- Session creation pins or snapshots every referenced descriptor.
- wirelog_extension_unregister_function() fails while any active session pins the function, or it marks the descriptor for deferred removal.
- Dynamic plugin unload is not allowed while any descriptor from that plugin is pinned.
- Compiled plans should not contain dangling raw function pointers after unregister.
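A minimal sketch of the pin-count variant of this policy follows. It models only the "unregister fails while pinned" option, not deferred removal, and it is single-threaded: a real registry would need locking or atomics. The slot structure, status codes, and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    SKETCH_OK = 0,
    SKETCH_ERR_NOT_REGISTERED,
    SKETCH_ERR_BUSY
} sketch_status_t;

/* Hypothetical registry slot for one registered function. */
typedef struct {
    const char *name;
    uint32_t pin_count; /* number of active sessions referencing the slot */
    bool registered;
} sketch_fn_slot_t;

/* Called at session creation for every referenced function. */
static sketch_status_t sketch_pin(sketch_fn_slot_t *slot)
{
    if (!slot->registered) {
        return SKETCH_ERR_NOT_REGISTERED;
    }
    slot->pin_count++;
    return SKETCH_OK;
}

/* Called when a session that pinned the slot is destroyed. */
static void sketch_unpin(sketch_fn_slot_t *slot)
{
    if (slot->pin_count > 0u) {
        slot->pin_count--;
    }
}

/* Refuses to unregister while any session still pins the function. */
static sketch_status_t sketch_unregister(sketch_fn_slot_t *slot)
{
    if (!slot->registered) {
        return SKETCH_ERR_NOT_REGISTERED;
    }
    if (slot->pin_count > 0u) {
        return SKETCH_ERR_BUSY;
    }
    slot->registered = false;
    return SKETCH_OK;
}
```

The same pin count can gate dynamic unload: a plugin is safe to unload only once every slot it owns is unregistered.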
Thread Safety
Wirelog sessions are not generally thread-safe at the public API level, but backend evaluation may use worker threads internally. The extension ABI must therefore specify whether callbacks can be invoked concurrently.
Recommended first version:
- Default: callbacks are assumed not thread-safe unless flagged otherwise.
- If evaluator worker paths may call the same descriptor concurrently, only functions marked thread_safe may be used there.
- Alternatively, v1 can serialize extension callback invocation and document the performance cost.
- Error reporting should use thread-local storage.
Error Model
The API needs a small, explicit error contract.
Design points:
- callback returns wirelog_error_t or an extension-specific status code
- callback can set a human-readable error on wirelog_extension_ctx_t
- global wirelog_extension_last_error() should be thread-local
- parse/compile errors should report missing function, invalid arity, invalid type signature, and non-deterministic function rejection clearly
- evaluator errors should identify the function name and context without exposing internal row buffers
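A thread-local last-error slot can be sketched with C11 _Thread_local storage. The buffer size, message format, and sketch_ names are assumptions for illustration; the message carries the function name and evaluation context but no row data.

```c
#include <stdio.h>

#define SKETCH_ERR_CAP 128

/* One buffer per thread, zero-initialized, so an unset error reads as "". */
static _Thread_local char sketch_last_error_buf[SKETCH_ERR_CAP];

/* Records a message that names the function and the evaluation context.
 * snprintf truncates safely if the message exceeds the buffer. */
static void sketch_set_last_error(const char *fn_name,
                                  const char *context,
                                  const char *detail)
{
    snprintf(sketch_last_error_buf, sizeof sketch_last_error_buf,
             "%s (%s): %s", fn_name, context, detail);
}

/* Returns the calling thread's most recent error message. */
static const char *sketch_last_error(void)
{
    return sketch_last_error_buf;
}
```

Because each thread owns its buffer, an error set on one evaluator worker cannot overwrite or race with an error being read on another thread.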
Security Model
Native addons run in the Wirelog process. They are not sandboxed.
Documentation should state:
- only trusted addons should be loaded
- addon callbacks can crash or corrupt the process if they violate the ABI
- Wirelog does not isolate GEOS/PROJ/GDAL or other third-party library vulnerabilities
- malformed WKT/WKB and expensive topology operations are the addon provider's responsibility to bound and test
- dynamic loading should remain explicit opt-in for the CLI
Dependency Policy
Core must not link to GEOS/PROJ/GDAL for this feature.
Recommended dependency split:
- core: extension registry, parser/IR/plan/evaluator dispatch, tests with mock functions
- geometry extension: GEOS/PROJ/GDAL and geometry-specific conformance tests
- CLI: optional addon loader behind an explicit Meson option, separate from the I/O adapter loader
The extension ABI version must be separate from WIRELOG_IO_ABI_VERSION.
Test Plan
Core tests:
- register/find/unregister success path
- duplicate function registration
- invalid ABI version
- invalid descriptor fields
- missing function resolution
- arity mismatch
- type mismatch
- callback success for unary/binary/ternary functions
- callback failure in FILTER
- callback failure in MAP/head expression
- string intern lookup from callback context
- string interning from callback result
- unregister while an active session references a function
- deterministic/pure flag enforcement
- thread-local error reporting
- expression serialization round trip for WL_PLAN_EXPR_EXT_CALL
- mixed builtin and extension expression evaluation
ABI and public surface tests:
- public header surface
- public prefix checks
- public API macro checks
- public API export checks
- ABI symbol manifest updates
- standalone installed-header compile tests
Geometry extension tests, outside the core mandatory path:
- WKT point/polygon truth table
- invalid WKT
- empty geometry
- SRID missing/mismatch policy
- fixed-scale distance behavior
- GEOS/PROJ unavailable: core build still succeeds
- extension build requires geometry dependencies explicitly
Phased Implementation
M0: Design Freeze
- Decide syntax: ext(...), call(...), or another spelling.
- Decide value types for v1.
- Decide missing-function resolution phase.
- Decide callback failure behavior by context.
- Decide pin/unregister policy.
- Decide thread-safety policy.
M1: Public Registry ABI
- Add installed public header.
- Add internal registry implementation.
- Add descriptor validation.
- Add thread-local last-error support.
- Add ABI/public-surface tests.
M2: Parser, AST, IR, Plan
- Add generic extension call syntax.
- Add AST and IR nodes.
- Add WL_PLAN_EXPR_EXT_CALL serialization.
- Resolve function references during compile/session creation.
M3: Columnar Evaluator Dispatch
- Invoke pinned extension descriptors from expression evaluation.
- Add callback context for interned strings.
- Add failure handling.
- Ensure fast-path expression compilation safely falls back when extension calls are present.
M4: CLI Addon Loading
- Add optional dynamic loading for function addons.
- Keep it separate from I/O adapter loading.
- Gate it behind an explicit Meson option.
- Document that native loading is trusted-code only.
M5: Reference Geometry Extension
- Build a separate WKT-based geometry extension on top of the addon API.
- Keep GEOS/PROJ out of the core build.
- Use the extension to validate that the API is sufficient for geometry workloads.
Acceptance Criteria
- Core does not add geometry functions as public builtin enums or mandatory internal opcodes.
- Core does not gain GEOS/PROJ/GDAL as mandatory dependencies.
- A stable scalar function addon API exists with a distinct ABI version.
- Public prototypes use WIRELOG_API and naming follows wirelog_ / WIRELOG_ rules.
- Installed header changes are reflected through wirelog_public_headers in meson.build.
- Extension functions can be called from Datalog expressions through one generic syntax.
- Missing functions, arity mismatches, type mismatches, and callback failures are tested.
- Active session lifetime prevents dangling callback pointers or unsafe unload.
- Determinism/purity metadata is enforced or explicitly documented.
- Native addon loading is documented as trusted-code execution, not sandboxing.
- A future geometry extension can be implemented outside core using the addon point.