Just wanted to chime in and say this would be super helpful for my personal use case with powersync in 2 ways:
Real-world case: compaction is insufficient, forced to redeploy sync rules for performance
We're running a production app with ~45 synced tables, including a large table with 15,000+ rows that gets frequently updated. Our initial sync time for new clients had grown to 3+ minutes.
We've already: Removed unused columns from sync rules
The only thing that dramatically improved performance was redeploying sync rules without any changes. This rebuilds all buckets from scratch with exactly 1 PUT per existing row. Initial sync dropped from 3+ minutes to under 30 seconds for all users, even though they had to re-sync everything.
Since compaction has this structural limitation, we're now forced to set up a scheduled cron job to redeploy sync rules daily during low-traffic hours. It works, but it's clearly a workaround with operational overhead (dual replication slots, WAL accumulation, active users getting forced back to sync screen during switchover).
Incremental reprocessing as proposed in this discussion would eliminate our need for this workaround entirely. Even a more targeted solution, like a "rebuild buckets from current state" operation that doesn't require full re-replication from the source database, would be a huge improvement over redeploying sync rules.
Would love to see this prioritized!
Background
Currently, when changes to Sync Rules or Sync Streams are deployed, PowerSync re-replicates all data from the source database from scratch, processing it with the new Sync Rules. Once that is ready, clients are switched over to sync from the new copy.
While there is no direct "downtime", it can take a long time on large databases, and clients have to re-sync all data even if only a small portion changed.
Status
2026-06-15: Updated with the latest status. Sync rules / bucket definitions are not part of the plan anymore, only supporting new sync streams.
2026-03-18: We had an experimental/proof-of-concept build demonstrating incremental reprocessing. We are now working on getting functionality ready for production, and merging pieces of functionality at a time.
2026-01-09: Updated implementation status.
2025-12-09: Updated plan with more specifics, implementation tasks and links to relevant PRs.
2025-09-01: Original version of proposal outlined two implementation options.
Proposal
The basic idea is to reprocess only the data in storage that actually needs it.
With sync streams now being stable, only sync streams with `config.edition: 3` (or future versions) will be supported. Using sync rules or alpha sync streams will require full reprocessing as before.
In general:
Modifying an existing stream preserves existing data without reprocessing on a "best-effort" basis. To understand this, note that a sync stream query is split into 3 parts:
Reprocessing of streams happens at the level of bucket data sources and parameter indexes. If one is added or modified, all data for it is reprocessed. Note that adding a query to a sync stream can trigger reprocessing of other queries/tables in the same sync stream, if they share the same output buckets.
Changing the querier part of queries does not require any reprocessing. For example, changing a query from `SELECT * FROM projects WHERE user_id = auth.user_id()` to `SELECT * FROM projects WHERE user_id = auth.jwt() ->> 'owner'` requires no reprocessing, since it does not require any changes to storage - only the sync-time evaluation is changed. But changing to `SELECT * FROM projects WHERE owner_id = auth.user_id()` requires a change to storage, so that triggers reprocessing of the related data.
The specifics here depend a lot on how queries are parsed - more examples will follow in the future.
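The distinction between a querier-only change and a storage change can be sketched roughly as follows. This is only an illustration under assumptions: the `StreamQuery` shape, its field names, and the `needsReprocessing` helper are hypothetical and are not PowerSync's actual parser output or API. The sketch assumes each query can be reduced to a part that determines what is written to storage and a part that is only evaluated at sync time.

```typescript
// Hypothetical, simplified model of a parsed sync stream query.
// This is NOT PowerSync's real internal representation.
interface StreamQuery {
  // Parts that determine what is written to storage.
  table: string;
  columns: string; // e.g. "*"
  partitionColumn: string; // row column the bucket parameter is matched against
  // Part that is only evaluated at sync time, per client.
  querier: string; // e.g. "auth.user_id()"
}

// Canonical key for the storage-relevant part of a query.
function storageKey(q: StreamQuery): string {
  return JSON.stringify([q.table, q.columns, q.partitionColumn]);
}

// Reprocessing is only needed when the storage-relevant part changes.
function needsReprocessing(before: StreamQuery, after: StreamQuery): boolean {
  return storageKey(before) !== storageKey(after);
}

const original: StreamQuery = {
  table: "projects",
  columns: "*",
  partitionColumn: "user_id",
  querier: "auth.user_id()",
};

// Only the sync-time expression changes.
const querierChange: StreamQuery = { ...original, querier: "auth.jwt() ->> 'owner'" };

// The row column used to partition the data changes.
const storageChange: StreamQuery = { ...original, partitionColumn: "owner_id" };

console.log(needsReprocessing(original, querierChange)); // false
console.log(needsReprocessing(original, storageChange)); // true
```

In this simplified model the querier expression is deliberately left out of the storage key, which is why swapping `auth.user_id()` for `auth.jwt() ->> 'owner'` leaves stored data untouched.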
Implementation
Previously, a "sync rules version" was created on each deploy of sync rules / sync streams. Each version was processed separately from all others, with isolated storage. Each had its own "replication stream": the process that replicates from the source database, owning a replication slot, change stream, or replication progress tracking for the source database. The concept of a replication stream was never explicit, since it overlapped with sync rules versions.
This is now split into two separate entities: the replication stream and the sync config.
There can now be multiple sync configs per replication stream. In that case, they can share storage, avoiding the need to re-replicate all data from scratch. Separate replication streams are still used if the sync configs cannot share data, for example due to incompatible replication config.
Within a replication stream, storage is further split by bucket source definition and parameter index. This makes it easy to drop all storage for specific definitions if they are removed.
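As a rough illustration of why partitioning storage by definition makes removal cheap, here is a minimal in-memory sketch. The `PartitionedStorage` class, its method names, and the definition ids are hypothetical stand-ins, not the actual storage implementation or schema.

```typescript
// Hypothetical in-memory stand-in for storage that is partitioned by
// definition id (a bucket source definition or a parameter index).
class PartitionedStorage {
  private partitions = new Map<string, Map<string, unknown>>();

  // Store a row under the partition owned by one definition.
  put(definitionId: string, rowKey: string, row: unknown): void {
    let partition = this.partitions.get(definitionId);
    if (!partition) {
      partition = new Map<string, unknown>();
      this.partitions.set(definitionId, partition);
    }
    partition.set(rowKey, row);
  }

  // Dropping a definition removes its whole partition in one step,
  // without scanning or touching data owned by other definitions.
  dropDefinition(definitionId: string): boolean {
    return this.partitions.delete(definitionId);
  }

  rowCount(definitionId: string): number {
    const partition = this.partitions.get(definitionId);
    return partition ? partition.size : 0;
  }
}

const storage = new PartitionedStorage();
storage.put("projects_by_user", "p1", { id: "p1" });
storage.put("tasks_by_project", "t1", { id: "t1" });

// Removing one definition leaves the other untouched.
storage.dropDefinition("projects_by_user");
```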
When a new sync config is added to an existing replication stream, we compute all definitions that are not covered by existing sync configs in that replication stream. For any new definitions, we re-snapshot all the relevant source tables. These create new `SourceTable` entities, allowing any existing definitions from other sync configs to be unaffected.
The snapshots above, as well as the initial replication snapshots, now happen concurrently with streaming replication. This means data will continue being streamed for the existing sync config while we snapshot. It also means that the streaming replication position will not get behind while we do an initial snapshot, avoiding issues with for example Postgres WAL slots exceeding `max_slot_wal_keep_size`.
All the above storage changes are implemented in a new storage version, since it includes many breaking changes. During development, storage version 3 is used for this. Once the structure is fixed, it will become the stable storage version 4.
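The step of computing which definitions are not covered by existing sync configs, and which source tables therefore need a snapshot, can be sketched as a set difference. The `Definition`, `SyncConfig`, and `SnapshotPlan` shapes and the `planSnapshot` helper are assumptions made for illustration only; in particular, identifying a definition by a single string id is a simplification.

```typescript
// Hypothetical shapes; names are illustrative only.
interface Definition {
  id: string; // identity of a bucket data source or parameter index
  sourceTables: string[]; // source tables the definition reads from
}

interface SyncConfig {
  definitions: Definition[];
}

interface SnapshotPlan {
  newDefinitionIds: string[];
  tablesToSnapshot: string[];
}

function planSnapshot(existing: SyncConfig[], added: SyncConfig): SnapshotPlan {
  // Collect every definition already covered within the replication stream.
  const covered = new Set<string>();
  for (const config of existing) {
    for (const def of config.definitions) covered.add(def.id);
  }

  // Only definitions that are not covered yet need new data.
  const newDefinitions = added.definitions.filter((d) => !covered.has(d.id));

  // All source tables feeding a new definition are re-snapshotted,
  // even if another definition already replicates the same table.
  const tables = new Set<string>();
  for (const def of newDefinitions) {
    for (const table of def.sourceTables) tables.add(table);
  }

  return {
    newDefinitionIds: newDefinitions.map((d) => d.id),
    tablesToSnapshot: Array.from(tables).sort(),
  };
}

const projectsByUser: Definition = { id: "projects_by_user", sourceTables: ["projects"] };
const tasksByProject: Definition = { id: "tasks_by_project", sourceTables: ["tasks", "projects"] };

const plan = planSnapshot(
  [{ definitions: [projectsByUser] }],
  { definitions: [projectsByUser, tasksByProject] },
);

console.log(plan.newDefinitionIds); // [ 'tasks_by_project' ]
console.log(plan.tablesToSnapshot); // [ 'projects', 'tasks' ]
```

Here `projects` is snapshotted again for the new definition even though `projects_by_user` already covers it, mirroring the idea that a re-snapshot produces separate source table entities and leaves existing definitions alone.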
Implementation progress
Other considerations
Defragmenting
The "Defragment" (reprocess) API creates a new replication stream, re-replicating all data. In the future we can investigate more granular approaches to defragmentation.