Rust rule benchmarking tableaction cache#830

Draft
MilkBlock wants to merge 7 commits into egraphs-good:main from MilkBlock:rust-rule-benchmarking-tableaction-cache

Conversation

@MilkBlock
Contributor

Cache the TableActions for every RustRuleContext so that we avoid copying them on every rule run.

@MilkBlock MilkBlock requested a review from a team as a code owner March 17, 2026 09:55
@MilkBlock MilkBlock requested review from saulshanabrook and removed request for a team March 17, 2026 09:55
@MilkBlock
Contributor Author

Please review this after rust_rule_benchmarking is merged.

@codecov-commenter

codecov-commenter commented Mar 17, 2026

Codecov Report

❌ Patch coverage is 68.53933% with 28 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.99%. Comparing base (f6cff9b) to head (7de2394).
⚠️ Report is 3 commits behind head on main.

Files with missing lines | Patch % | Lines
src/prelude.rs           | 68.53%  | 28 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #830      +/-   ##
==========================================
- Coverage   82.07%   81.99%   -0.08%     
==========================================
  Files          88       88              
  Lines       24709    24780      +71     
==========================================
+ Hits        20279    20319      +40     
- Misses       4430     4461      +31     

@codspeed-hq

codspeed-hq Bot commented Mar 17, 2026

Merging this PR will improve performance by ×21

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

Open the report in CodSpeed to investigate

⚡ 7 improved benchmarks
✅ 31 untouched benchmarks
⏩ 190 skipped benchmarks [1]

Performance Changes

Mode       | Benchmark                                           | BASE     | HEAD     | Efficiency
WallTime   | rust_rule_match_overhead[rule_run_1]                | 21 ms    | 14.6 ms  | +43.11%
WallTime   | rust_rule_tableaction_hot_path[facts50000_funcs200] | 627.9 ms | 34.2 ms  | ×18
WallTime   | rust_rule_match_with_serialize[rule_run_1]          | 303.7 ms | 286.4 ms | +6.03%
WallTime   | rust_rule_insert_loop[ops1000_funcs200]             | 474.1 µs | 411.3 µs | +15.28%
Simulation | rust_rule_tableaction_hot_path[facts50000_funcs200] | 872.8 ms | 41.6 ms  | ×21
Simulation | rust_rule_match_overhead[rule_run_1]                | 32.4 ms  | 23.2 ms  | +39.23%
Simulation | rust_rule_insert_loop[ops1000_funcs200]             | 713 µs   | 623.7 µs | +14.32%

Comparing MilkBlock:rust-rule-benchmarking-tableaction-cache (7de2394) with main (f6cff9b)

Footnotes

  1. 190 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them in CodSpeed to remove them from the performance reports.

@MilkBlock
Contributor Author

@oflatt

Contributor

Copilot AI left a comment

Pull request overview

This PR optimizes Rust-backed rule execution by avoiding per-match cloning of all TableActions, instead caching shared TableActions by backend FunctionId and cloning only when mutation-scratch is required.

Changes:

  • Replace HashMap<String, TableAction> cloning per apply with an Arc<Vec<Option<TableAction>>> indexed by FunctionId.
  • Add internal name→FunctionId mapping and expose new lookup_id/insert_id/remove_id/subsume_id APIs.
  • Introduce a per-callback (per-match) lazy cache for mutable TableAction clones used by insert/subsume.
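A minimal sketch of the first bullet, contrasting the old per-apply clone of the whole name-keyed map with a shared, id-indexed table whose "clone" is only a refcount bump. `FunctionId` and `TableAction` here are hypothetical stand-ins, not the `egglog_bridge` types; the missing-entry panic mirrors the message visible in the diff.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical stand-ins, not the egglog_bridge types.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct FunctionId(pub usize);

#[derive(Clone, Debug, PartialEq)]
pub struct TableAction {
    pub name: String,
}

/// Before: every rule application deep-cloned the whole map
/// (allocating for every key and every action).
pub fn per_apply_before(actions: &HashMap<String, TableAction>) -> HashMap<String, TableAction> {
    actions.clone()
}

/// After: one shared table indexed by id; "cloning" it is an O(1) refcount bump.
pub fn per_apply_after(actions: &Arc<Vec<Option<TableAction>>>) -> Arc<Vec<Option<TableAction>>> {
    Arc::clone(actions)
}

/// Dense lookup by id; panics if no action was registered for this id.
pub fn get(actions: &[Option<TableAction>], id: FunctionId) -> &TableAction {
    actions
        .get(id.0)
        .and_then(|slot| slot.as_ref())
        .unwrap_or_else(|| panic!("missing table action for FunctionId({})", id.0))
}
```

Whether the refcounted table beats simply borrowing the original map is exactly what a reviewer questions further down; the benchmark table above is the evidence offered for the change as a whole.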

Comment thread src/prelude.rs
Comment on lines +323 to +331
pub struct RustRuleContext<'a, 'b, 'c> {
exec_state: &'a mut ExecutionState<'b>,
union_action: egglog_bridge::UnionAction,
table_actions: HashMap<String, egglog_bridge::TableAction>,
/// Name -> backend FunctionId (captured at rust-rule registration time).
func_ids: &'c HashMap<String, egglog_bridge::FunctionId>,
/// Table actions indexed by backend FunctionId (captured at rust-rule registration time).
table_actions_by_id: &'c [Option<egglog_bridge::TableAction>],
/// Per-match table actions, lazily populated from `table_actions_by_id` (for scratch reuse).
table_actions_cache_by_id: Vec<(egglog_bridge::FunctionId, egglog_bridge::TableAction)>,

Copilot AI Mar 29, 2026

RustRuleContext is a public type and this change adds a new lifetime parameter ('c). That’s a breaking API change for downstream crates that spell the type as RustRuleContext<'a, 'b>. Consider avoiding the extra lifetime by tying func_ids/table_actions_by_id to the existing borrow lifetime (e.g. use the same lifetime as exec_state) or by storing Arc<_> in the context instead of references.
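A minimal sketch of the `Arc<_>` option Copilot mentions, using hypothetical stand-in types rather than the real `ExecutionState` and `FunctionId`. The context keeps its two lifetime parameters; the cost is one atomic refcount increment per context instead of a plain borrow. Copilot's other suggestion, tying the borrows to the existing `'a`, would likely avoid even that increment.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical stand-ins, not the real egglog types.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct FunctionId(pub usize);

pub struct ExecutionState<'b> {
    pub log: Vec<&'b str>,
}

/// Same two lifetime parameters as before: the name -> id map is shared
/// through an `Arc` (owned), so no third lifetime is needed.
pub struct RustRuleContext<'a, 'b> {
    exec_state: &'a mut ExecutionState<'b>,
    func_ids: Arc<HashMap<String, FunctionId>>,
}

impl<'a, 'b> RustRuleContext<'a, 'b> {
    pub fn new(
        exec_state: &'a mut ExecutionState<'b>,
        func_ids: &Arc<HashMap<String, FunctionId>>,
    ) -> Self {
        // Refcount bump only; the map itself is not copied.
        RustRuleContext { exec_state, func_ids: Arc::clone(func_ids) }
    }

    pub fn func_id(&self, name: &str) -> Option<FunctionId> {
        self.func_ids.get(name).copied()
    }

    pub fn note(&mut self, msg: &'b str) {
        self.exec_state.log.push(msg);
    }
}
```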

Comment thread src/prelude.rs
Comment on lines 400 to 416
/// Do a table lookup. This is potentially a mutable operation!
/// For more information, see `egglog_bridge::TableAction::lookup`.
pub fn lookup(&mut self, table: &str, key: &[Value]) -> Option<Value> {
self.get_table_action(table).lookup(self.exec_state, key)
let table_id = self.func_id(table);
self.lookup_id(table_id, key)
}

/// Do a table lookup by cached backend FunctionId.
/// For more information, see `egglog_bridge::TableAction::lookup`.
pub fn lookup_id(&mut self, table: egglog_bridge::FunctionId, key: &[Value]) -> Option<Value> {
let RustRuleContext {
exec_state,
table_actions_by_id,
..
} = self;
Self::get_table_action_by_id(table_actions_by_id, table).lookup(exec_state, key)
}

Copilot AI Mar 29, 2026

The new public *_id methods take egglog_bridge::FunctionId, but the Rust API doesn’t appear to expose any way for users to obtain/cache a FunctionId (e.g., Function.backend_id is private and func_id here is private). Either expose a stable way to get the FunctionId (from RustRuleContext and/or EGraph) or keep these *_id methods non-public to avoid adding an effectively unusable public API surface.
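A sketch of how the `*_id` methods could become usable, assuming a hypothetical public `func_id` resolver (the comment notes that the PR's `func_id` is private): resolve the name once, then pass the id in the hot loop. `Ctx`, `sum_lookups`, and the toy tables are illustrative only.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins, not the egglog API.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct FunctionId(pub usize);

pub struct Ctx {
    func_ids: HashMap<String, FunctionId>,
    tables: Vec<HashMap<u64, u64>>, // one toy table per FunctionId
}

impl Ctx {
    /// Hypothetical public resolver: one string hash per call.
    pub fn func_id(&self, name: &str) -> Option<FunctionId> {
        self.func_ids.get(name).copied()
    }

    /// Id-based lookup: a direct vector index, no string hashing.
    pub fn lookup_id(&self, table: FunctionId, key: u64) -> Option<u64> {
        self.tables[table.0].get(&key).copied()
    }
}

/// Resolve the name once, then reuse the id for every key.
pub fn sum_lookups(ctx: &Ctx, name: &str, keys: &[u64]) -> u64 {
    let Some(id) = ctx.func_id(name) else { return 0 };
    keys.iter().filter_map(|k| ctx.lookup_id(id, *k)).sum()
}
```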

Comment thread src/prelude.rs
table: egglog_bridge::FunctionId,
) -> &egglog_bridge::TableAction {
table_actions_by_id
.get(table.rep() as usize)

Copilot AI Mar 29, 2026

Indexing the per-function vector via table.rep() as usize works for the current define_id! implementation, but NumericId already provides index() specifically for dense indexing. Using table.index() here (and in other indexing sites) makes the intent clearer and avoids coupling this code to Rep being a small integer type.

Suggested change
.get(table.rep() as usize)
.get(table.index())
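A toy illustration of the suggestion: a hypothetical `DenseId` trait exposing both `rep()` and `index()`, so that indexing code depends only on `index()` and keeps working if the representation type changes. This is not egglog's `NumericId` definition, only a model of the idea.

```rust
/// Hypothetical trait modelling dense ids; not egglog's `NumericId`.
pub trait DenseId: Copy {
    type Rep;
    fn rep(self) -> Self::Rep;
    /// Position in a dense vector; callers need not know `Rep`.
    fn index(self) -> usize;
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct FunctionId(u32);

impl DenseId for FunctionId {
    type Rep = u32;
    fn rep(self) -> u32 {
        self.0
    }
    fn index(self) -> usize {
        self.0 as usize
    }
}

/// Generic over any dense id: no `rep() as usize` at the call site.
pub fn get_dense<I: DenseId, T>(slots: &[Option<T>], id: I) -> Option<&T> {
    slots.get(id.index()).and_then(|slot| slot.as_ref())
}
```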

Comment thread src/prelude.rs
let mut table_actions_by_id: Vec<Option<egglog_bridge::TableAction>> = Vec::new();
for (k, v) in egraph.functions.iter() {
func_ids.insert(k.clone(), v.backend_id);
let idx = v.backend_id.rep() as usize;

Copilot AI Mar 29, 2026

Same as above: v.backend_id.rep() as usize is being used as a dense index. Prefer v.backend_id.index() (from NumericId) for clarity and to avoid relying on Rep layout.

Suggested change
let idx = v.backend_id.rep() as usize;
let idx = v.backend_id.index();

Comment thread src/prelude.rs
let prim_name = egraph.parser.symbol_gen.fresh("rust_rule_prim");
let panic_id = egraph.backend.new_panic(format!("{prim_name}_panic"));

let mut func_ids: HashMap<String, egglog_bridge::FunctionId> = Default::default();
Collaborator

HashMap::with_capacity?
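For reference, the `HashMap::with_capacity` form of the map construction, with a hypothetical slice standing in for `egraph.functions`. It has the same effect as `Default::default()` followed by `reserve(len)` (the form in a later snippet of this diff), just in one call.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the backend id type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct FunctionId(pub usize);

/// Builds the name -> id map with a single up-front allocation.
pub fn build_func_ids(functions: &[(String, FunctionId)]) -> HashMap<String, FunctionId> {
    let mut func_ids = HashMap::with_capacity(functions.len());
    for (name, id) in functions {
        func_ids.insert(name.clone(), *id);
    }
    func_ids
}
```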

Comment thread src/prelude.rs
exec_state,
union_action: self.union_action,
table_actions: self.table_actions.clone(),
func_ids: self.func_ids.as_ref(),
Collaborator

Do func_ids and table_actions_by_id need to be Arc-wrapped, if they are passed by reference?
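One way to read the question: the `Arc` matters only where the owning state is cloned or shared, while each rule invocation just borrows a plain slice. The sketch below assumes the rule state may be cloned; `RuleState`, `Ctx`, and `run` are hypothetical names. If the state were instead moved into a single closure that is never cloned, an owned `Vec` would serve just as well.

```rust
use std::sync::Arc;

// Hypothetical stand-in for the per-function action table.
pub type Actions = Vec<Option<String>>;

/// The per-call context only borrows; it never sees the Arc.
pub struct Ctx<'c> {
    pub actions: &'c [Option<String>],
}

/// Rule state that may be cloned; cloning bumps a refcount
/// instead of deep-copying `actions`.
#[derive(Clone)]
pub struct RuleState {
    actions: Arc<Actions>,
}

impl RuleState {
    pub fn new(actions: Actions) -> Self {
        RuleState { actions: Arc::new(actions) }
    }

    /// Called once per match: lends a plain slice to the context.
    pub fn run<R>(&self, f: impl FnOnce(Ctx<'_>) -> R) -> R {
        f(Ctx { actions: self.actions.as_slice() })
    }

    pub fn shares_storage_with(&self, other: &RuleState) -> bool {
        Arc::ptr_eq(&self.actions, &other.actions)
    }
}
```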

Collaborator

Could you just keep the original table_actions but pass it by reference here (instead of self.table_actions.clone()), and drop the indirections through func_ids, table_actions_by_id, and table_actions_cache_by_id? I'm not quite convinced that these new structs provide a speedup.
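A minimal sketch of the reviewer's alternative, with a hypothetical toy `TableAction`: the context borrows the registration-time map instead of cloning it. Read-only calls work as-is; mutating calls such as `insert` would additionally need a `&self` signature (or scratch held elsewhere), because a shared borrow of the map cannot hand out `&mut TableAction`.

```rust
use std::collections::HashMap;

// Hypothetical toy action: read-only lookup over an owned table.
#[derive(Clone, Debug)]
pub struct TableAction {
    rows: HashMap<u64, u64>,
}

impl TableAction {
    pub fn lookup(&self, key: u64) -> Option<u64> {
        self.rows.get(&key).copied()
    }
}

/// The context borrows the map built at registration time.
pub struct RustRuleContext<'a> {
    table_actions: &'a HashMap<String, TableAction>,
}

impl<'a> RustRuleContext<'a> {
    pub fn lookup(&self, table: &str, key: u64) -> Option<u64> {
        // Still one string hash per call, but no per-run clone of every action.
        self.table_actions.get(table)?.lookup(key)
    }
}
```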

Comment thread src/prelude.rs

let mut func_ids: HashMap<String, egglog_bridge::FunctionId> = Default::default();
func_ids.reserve(egraph.functions.len());
let mut table_actions_by_id: Vec<Option<egglog_bridge::TableAction>> = Vec::new();
Collaborator

nit: reserve the vec once (by scanning egraph.functions) instead of repeatedly resizing
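A sketch of the nit, assuming `FunctionId` values are dense indices: find the largest index first, then allocate the vector once. `build_dense` and the tuple slice are hypothetical stand-ins for the loop over `egraph.functions`.

```rust
// Hypothetical stand-in for the backend id type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct FunctionId(pub usize);

/// Builds a dense id-indexed table with a single allocation.
pub fn build_dense<T: Clone>(functions: &[(FunctionId, T)]) -> Vec<Option<T>> {
    // First pass: the table must be long enough for the largest index.
    let len = functions.iter().map(|(id, _)| id.0 + 1).max().unwrap_or(0);
    let mut table = vec![None; len];
    // Second pass: fill the slots; unused ids stay `None`.
    for (id, action) in functions {
        table[id.0] = Some(action.clone());
    }
    table
}
```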

Comment thread src/prelude.rs
.unwrap_or_else(|| panic!("missing table action for FunctionId({})", table.rep()))
}

fn get_table_action_cache_by_id<'a>(
Collaborator

I don't understand this function. Is it caching the hashmap lookups in an (Id, TableAction) vector, on the assumption that looking something up in the vector is faster than in a hashmap?
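Going by the field's doc comment ("Per-match table actions, lazily populated from `table_actions_by_id` (for scratch reuse)"), one plausible reading is that the vector is less a faster hashmap than a per-match record of which actions have already been cloned for mutation. A sketch of that pattern with hypothetical stand-in types: read-only calls borrow the shared action, and the first mutating call for a given id clones it once. A linear scan stays cheap when a rule touches only a few tables.

```rust
// Hypothetical stand-ins, not the egglog_bridge types.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct FunctionId(pub usize);

#[derive(Clone, Debug, Default)]
pub struct TableAction {
    pub scratch: Vec<u64>, // the only part a mutating call touches
}

/// Per-match cache: each action is cloned at most once per match.
pub struct PerMatchCache<'s> {
    shared: &'s [Option<TableAction>],
    clones: Vec<(FunctionId, TableAction)>, // usually only a few entries
}

impl<'s> PerMatchCache<'s> {
    pub fn new(shared: &'s [Option<TableAction>]) -> Self {
        PerMatchCache { shared, clones: Vec::new() }
    }

    /// Read-only path: borrow the shared action, no clone.
    pub fn get(&self, id: FunctionId) -> &TableAction {
        self.shared[id.0].as_ref().expect("missing table action")
    }

    /// Mutable path: linear scan of the small vector, clone on first miss.
    pub fn get_mut(&mut self, id: FunctionId) -> &mut TableAction {
        let pos = match self.clones.iter().position(|(k, _)| *k == id) {
            Some(pos) => pos,
            None => {
                let fresh = self.get(id).clone();
                self.clones.push((id, fresh));
                self.clones.len() - 1
            }
        };
        &mut self.clones[pos].1
    }

    pub fn clone_count(&self) -> usize {
        self.clones.len()
    }
}
```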

Comment thread src/prelude.rs
/// Insert a row into a table.
/// For more information, see `egglog_bridge::TableAction::insert`.
pub fn insert(&mut self, table: &str, row: impl Iterator<Item = Value>) {
self.get_table_action(table).insert(self.exec_state, row)
Contributor Author

This get_table_action call is expensive because it clones the TableAction on every call. The clone is redundant: although insert requires a mutable borrow of the TableAction, that mutable borrow only changes the capacity of scratch (scratch is cleared on every call, and the other fields never change after they are first set).

So the TableAction should be cached to improve performance.

pub struct TableAction {
    table: TableId,             // unchanged after first set
    table_math: SchemaMath,     // unchanged after first set
    default: Option<MergeVal>,  // unchanged after first set
    timestamp: CounterId,       // unchanged after first set
    scratch: Vec<Value>,        // temporary buffer, cleared in every table operation
}

Only four table operations need a TableAction: lookup, insert, remove, and subsume.

lookup and remove take a shared (&) borrow.
insert and subsume take a &mut borrow, but the mutation is superficial: it only touches the capacity of scratch.
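A toy model of the argument above, with simplified hypothetical field types: the `&mut self` on `insert` exists only to reuse `scratch`. Because `Vec::clear` keeps the allocation, reusing one action keeps its capacity across calls, whereas cloning the action per call gives each call its own buffer and never reuses that capacity.

```rust
// Toy model; field types are hypothetical simplifications.
#[derive(Clone, Debug)]
pub struct TableAction {
    table: u32,        // fixed after construction
    scratch: Vec<u64>, // temporary buffer, cleared on every call
}

impl TableAction {
    pub fn new(table: u32) -> Self {
        TableAction { table, scratch: Vec::new() }
    }

    /// Needs `&mut self` only because it reuses `scratch`.
    pub fn insert(&mut self, row: impl Iterator<Item = u64>) -> usize {
        self.scratch.clear(); // drops old contents, keeps capacity
        self.scratch.extend(row); // may grow capacity; nothing else changes
        // A real implementation would now write `scratch` into `self.table`.
        self.scratch.len()
    }

    pub fn table(&self) -> u32 {
        self.table
    }

    pub fn scratch_capacity(&self) -> usize {
        self.scratch.capacity()
    }
}
```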

Contributor Author

@yihozhang Based on this observation, we can safely relax the signatures of insert and subsume from &mut TableAction to &TableAction, which would make this PR more readable.
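A sketch of one possible way to relax `insert` to `&TableAction`, assuming (hypothetically) that the reusable buffer moves into caller-owned state such as a per-thread execution state. `ExecState` and its toy log are illustrative, not egglog's API. With the buffer moved out, shared actions can be used straight from a `&[TableAction]` with no per-match clone.

```rust
// Hypothetical sketch: TableAction keeps only its fixed fields.
#[derive(Debug)]
pub struct TableAction {
    table: u32,
}

/// Caller-owned mutable state (assumed, not egglog's ExecutionState).
pub struct ExecState {
    scratch: Vec<u64>,
    log: Vec<(u32, u64)>, // toy "table writes": (table, row sum)
}

impl TableAction {
    /// `&self` suffices: the only mutable state is borrowed from `exec`.
    pub fn insert(&self, exec: &mut ExecState, row: impl Iterator<Item = u64>) {
        exec.scratch.clear();
        exec.scratch.extend(row);
        let total: u64 = exec.scratch.iter().sum();
        exec.log.push((self.table, total));
    }
}
```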

@MilkBlock MilkBlock marked this pull request as draft April 4, 2026 18:59

5 participants