feat: continuous rerand #2007

Open
philsippl wants to merge 78 commits into main from ps/cont-rerand

Conversation

@philsippl
Contributor

@philsippl philsippl commented Feb 27, 2026

Spec and implementation of continuous rerandomization of the iris share databases.

Spec is provided in docs/specs/rerandomization.md.

Goals:

  • Continuous rerandomization of shares in persistent storage (in-memory representation stays untouched)
  • Rerand performed by a separate server from the matching server (for network performance reasons)
  • Minimal impact on the normal server (e.g. matching, startup, ...)
  • Support for GPU and HNSW versions

High level design:

  • Rerand server operating in chunks over DB

  • The rerand servers can be at most 1 chunk (or 1 epoch at the boundary) apart from each other

  • Synchronization with matching server required:

    • at startup: while the matching server is loading the db, we freeze the rerand
    • when applying modifications: the iris share table is locked whenever there is a writer
  • Rerand servers first write to "staging" schema and then apply in chunks to reduce lock times

  • Relies on the current system property that modifications are guaranteed to eventually arrive on all parties

    • This PR also makes modification handling a bit more robust by deleting only after storing a modification and recovering potentially lost modifications.

    related: support additional route ampc-common#71
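To illustrate the core idea behind rerandomizing stored shares, here is a minimal, self-contained sketch. It assumes plain additive secret sharing mod 2^16 for simplicity (the scheme actually used by iris-mpc may differ); the `rerandomize`, `reconstruct`, and `zero_sharing` names are hypothetical and not taken from this PR:

```rust
// Illustrative sketch: rerandomization via a fresh zero-sharing.
// Assumption: additive secret sharing over u16 (wrapping arithmetic);
// the real scheme in iris-mpc may be different.

fn rerandomize(shares: &mut [u16; 3], zero_sharing: &[u16; 3]) {
    // A zero-sharing sums to 0, so adding it changes every party's
    // share while leaving the reconstructed secret intact.
    for (s, r) in shares.iter_mut().zip(zero_sharing.iter()) {
        *s = s.wrapping_add(*r);
    }
}

fn reconstruct(shares: &[u16; 3]) -> u16 {
    shares.iter().fold(0u16, |acc, s| acc.wrapping_add(*s))
}

fn main() {
    let secret: u16 = 1234;
    // Party shares that sum to the secret mod 2^16.
    let mut shares: [u16; 3] =
        [40000, 30000, secret.wrapping_sub(40000u16.wrapping_add(30000))];
    assert_eq!(reconstruct(&shares), secret);

    // Fresh masks summing to zero mod 2^16 (in practice these would be
    // derived deterministically, e.g. from shared PRF keys per epoch).
    let (r0, r1): (u16, u16) = (11111, 22222);
    let zero_sharing: [u16; 3] = [r0, r1, 0u16.wrapping_sub(r0.wrapping_add(r1))];

    let old = shares;
    rerandomize(&mut shares, &zero_sharing);
    assert_ne!(shares, old);                  // stored shares changed
    assert_eq!(reconstruct(&shares), secret); // secret unchanged
}
```

Because the masks cancel across parties, each rerand server can rewrite its own chunk of the database independently, as long as all parties apply the same epoch's masks to the same rows.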

@github-actions github-actions bot added the chore label Feb 27, 2026
Comment thread iris-mpc-store/src/rerand.rs
Comment thread iris-mpc-store/src/rerand.rs Outdated
@philsippl philsippl changed the title initial implementation feat: continuous rerand Feb 27, 2026
@philsippl philsippl requested a review from dkales February 27, 2026 12:48
@philsippl philsippl marked this pull request as ready for review March 1, 2026 22:05
@philsippl philsippl requested a review from a team as a code owner March 1, 2026 22:05
@philsippl philsippl requested a review from eaypek-tfh March 2, 2026 13:23
Collaborator

@dkales dkales left a comment


A few minor comments, overall strategy seems good. Have not looked at the e2e tests yet.

Comment thread iris-mpc-common/src/helpers/sync.rs Outdated
Comment thread iris-mpc-store/src/rerand.rs Outdated
@@ -138,33 +172,75 @@ pub async fn server_main(config: Config) -> Result<()> {

sync_sqs_queues(&config, &sync_result, &aws_clients).await?;
Collaborator


Does the sync_sqs_queues logic not violate the assumptions on the sqs queues?

Collaborator


to be fair IDK if this ever happens

Contributor Author

@philsippl philsippl Mar 4, 2026


I think it's actually fine even in theory, because the modification sync happens first, so all nodes agree on the modifications beforehand. After that, we would only delete modifications that no node has received. This all relies on the ordering of the code, though, and hopefully we never get a queue desync in prod anyway.
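The "store before delete" ordering referenced in the PR description can be sketched with a hypothetical in-memory stand-in (the `Node`, `queue`, and `store` types below are illustrative, not the actual SQS/DB code): a crash between the two steps leaves a duplicate to re-process rather than a lost modification, provided persisting is idempotent.

```rust
// Hypothetical sketch of store-before-delete modification handling.
// A crash after step 1 but before step 2 leaves the id in the queue,
// so it is re-processed on restart instead of being lost.
use std::collections::{HashSet, VecDeque};

struct Node {
    queue: VecDeque<u64>, // pending modification ids (stand-in for the SQS queue)
    store: HashSet<u64>,  // persisted modifications (stand-in for the DB table)
}

impl Node {
    fn handle_next(&mut self, crash_before_delete: bool) {
        if let Some(id) = self.queue.front().copied() {
            self.store.insert(id); // 1) persist the modification first
            if crash_before_delete {
                return; // simulated crash: id stays queued for recovery
            }
            self.queue.pop_front(); // 2) only then delete it from the queue
        }
    }
}

fn main() {
    let mut node = Node { queue: VecDeque::from([7]), store: HashSet::new() };
    node.handle_next(true); // crash between store and delete
    assert!(node.store.contains(&7)); // modification not lost
    assert_eq!(node.queue.front(), Some(&7)); // still queued, will be retried
    node.handle_next(false); // retry completes; insert is idempotent
    assert!(node.queue.is_empty());
}
```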

Comment thread iris-mpc/src/server/mod.rs
Comment thread iris-mpc-upgrade/src/s3_coordination.rs
Comment thread iris-mpc-upgrade/src/epoch.rs
Comment thread iris-mpc-upgrade/src/epoch.rs Outdated
Comment thread iris-mpc-store/src/rerand.rs


4 participants