Customisable table reorganising using shadow tables and minimal locks.
Psycopack can be used to perform the following operations without downtime:
- Schema changes that would be dangerous (locking the table for too long) if performed with regular SQL DDL statements (e.g. adding exclusion constraints).
- Converting a primary key from int to bigint.
- Clustering (or repacking) the table to address table and index bloat (similar to pg_repack).
- Partitioning an existing table (only by date range at the moment).
Psycopack uses a shadow-table approach.
This means creating a copy of the table (shadow table), synchronising its data with the original table data, and swapping it with the original. It is also possible to reverse the swap, so that you can verify the results before deciding to commit to them.
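The mechanics of the shadow-table approach can be illustrated with a toy in-memory model (illustration only, not Psycopack code or its actual internals): copy the existing rows, keep writing to both sides, swap, and optionally swap back.

```python
# Toy model of the shadow-table approach: backfill, dual writes, swap,
# and reverse swap. All names here are illustrative, not Psycopack's API.

class ToyShadowRepack:
    """Simulates: copy table -> keep in sync -> swap -> optionally revert."""

    def __init__(self, original: dict[int, str]) -> None:
        self.original = original
        self.shadow: dict[int, str] = {}
        self.swapped = False

    def backfill(self) -> None:
        # Copy existing rows into the shadow table.
        self.shadow.update(self.original)

    def write(self, pk: int, value: str) -> None:
        # After setup, writes land on both tables (the role of the trigger).
        self.original[pk] = value
        self.shadow[pk] = value

    def swap(self) -> None:
        # Swap the tables; the old one is kept so the swap can be reversed.
        self.original, self.shadow = self.shadow, self.original
        self.swapped = not self.swapped

    # Reversing the swap is just swapping back.
    reverse_swap = swap


repack = ToyShadowRepack({1: "a"})
repack.backfill()
repack.write(2, "b")   # Arrives during the process; kept in sync on both sides.
repack.swap()
assert repack.original == {1: "a", 2: "b"}
repack.reverse_swap()  # Verify the results, then commit or revert.
```

Because writes are mirrored until clean-up, either table can serve as the live one, which is what makes the swap reversible.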
The Psycopack process achieves the above in 6 major steps:
- Pre-validation: Checks if the table can be processed by Psycopack at all. See Known Limitations below.
- Setup: Creates the copy table, a trigger to keep the shadow table in sync with the original, and a backfill log table to keep track of rows from the original table that need to be copied over.
- Backfill: Copies data from the original table to the copy one.
- Schema synchronisation: Recreates all indexes and constraints in the copy table. Also recreates foreign keys on referring tables. This step is performed after the backfill is complete to speed up writes on the copy table.
- Swap: Effectively replaces the original table with the copy table. The original (now old) table is kept in sync via triggers, in case the swap needs to be reverted. The swapping is done in a way that allows reverting the process if something goes wrong, and without locking the table for more than a few milliseconds. Reverting is achieved via an operation called reverse swap.
- Clean up: Drops the old table. After this point, the process is irreversible, so it's recommended to verify the results before running this step.
There are also two strategies for performing the process:
- Change log (Recommended): The backfill is performed in batches, and changes after the process kicks off are tracked via a change log table. This allows the copy table to synchronise its schema with the original table without using concurrent index creation, which can take a long time and can be blocked by long-running transactions.
- Triggers: The backfill is performed in batches, and changes after the process kicks off are immediately pushed to the copy table via triggers. Because the copy table is up to date as soon as the backfill completes, the schema can be synchronised afterwards using operations that only acquire weak locks.
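The difference between the two strategies can be sketched with a toy in-memory model (not Psycopack code): with triggers, each write reaches the copy table immediately; with a change log, writes are recorded and replayed in batches later.

```python
# Toy contrast of the two sync strategies; names are illustrative only.

copy_table: dict[int, str] = {}

# Triggers strategy: every write is pushed to the copy table immediately.
def trigger_write(pk: int, value: str) -> None:
    copy_table[pk] = value

# Change-log strategy: writes are recorded, then replayed in batches,
# leaving a window where the copy table's schema can be changed cheaply.
change_log: list[tuple[int, str]] = []

def log_write(pk: int, value: str) -> None:
    change_log.append((pk, value))

def replay_change_log() -> None:
    for pk, value in change_log:
        copy_table[pk] = value
    change_log.clear()

trigger_write(1, "a")      # Visible on the copy table at once.
assert copy_table[1] == "a"

log_write(1, "b")          # Deferred until replay; later writes win.
log_write(2, "c")
replay_change_log()
assert copy_table == {1: "b", 2: "c"}
```

The replay step is why the change-log strategy can rebuild indexes with plain (non-concurrent) creation before catching up on writes.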
Psycopack is driven by a Python class that can be used to configure the outcome of the process. Below are some examples of how Psycopack can be used.
For example, to perform a custom schema change step by step:

```python
from psycopack import Psycopack, get_db_connection

with get_db_connection("postgresql://user:password@host:port/db") as conn:
    with conn.cursor() as cur:
        psycopack = Psycopack(
            batch_size=50_000,
            conn=conn,
            cur=cur,
            schema="public",
            table="to_psycopack",
        )
        psycopack.pre_validate()
        psycopack.setup_repacking()
        # Now that the shadow table is set up, change the schema accordingly.
        cur.execute(f"ALTER TABLE {psycopack.copy_table} ... ")
        # Finish the process.
        psycopack.sync_schemas()
        psycopack.post_sync_update()
        psycopack.swap()
        psycopack.clean_up()
```

To convert a primary key from int to bigint:

```python
from psycopack import Psycopack, get_db_connection

with get_db_connection("postgresql://user:password@host:port/db") as conn:
    with conn.cursor() as cur:
        psycopack = Psycopack(
            batch_size=50_000,
            conn=conn,
            cur=cur,
            schema="public",
            table="to_psycopack",
            # Set this argument and Psycopack will do the conversion
            # automatically.
            convert_pk_to_bigint=True,
        )
        psycopack.full()
```

To repack a whole table to address bloat:

```python
from psycopack import Psycopack, get_db_connection

with get_db_connection("postgresql://user:password@host:port/db") as conn:
    with conn.cursor() as cur:
        psycopack = Psycopack(
            batch_size=50_000,
            conn=conn,
            cur=cur,
            schema="public",
            table="to_psycopack",
        )
        # Simply run `.full()` to repack the whole table without further
        # customisation.
        psycopack.full()
```

To partition an existing table by date range:

```python
from psycopack import Psycopack, get_db_connection
# Note: the import location of SyncStrategy and _partition is not shown
# in the original example.

with get_db_connection("postgresql://user:password@host:port/db") as conn:
    with conn.cursor() as cur:
        psycopack = Psycopack(
            table="to_repack",
            batch_size=1,
            conn=conn,
            cur=cur,
            change_log_batch_size=10,
            sync_strategy=SyncStrategy.CHANGE_LOG,
            partition_config=_partition.PartitionConfig(
                column="datetime_field",
                num_of_extra_partitions_ahead=10,
                strategy=_partition.DateRangeStrategy(
                    partition_by=_partition.PartitionInterval.DAY
                ),
            ),
        )
```

The following types of tables aren't currently supported:
- Inherited tables.
- Tables without primary keys.
- Tables with composite primary keys.
- Tables whose primary key field type is not one of: bigint, bigserial, integer, serial, smallint, smallserial.
- Tables with triggers.
- Tables with invalid indexes (the user should drop or re-index them first).
- Tables with deferrable unique constraints.
- Referring foreign keys on a different schema than the original table.
- When partitioning, the CHANGE_LOG strategy must be used.
- When partitioning, unique constraints are skipped, as they are not supported on partitioned tables. The user should add them manually after the process is complete or after the setup step.
- When partitioning, referring foreign keys are not supported yet.
Unless the user is a superuser, they may lack certain privileges to run Psycopack.
If such privileges are not assigned, Psycopack will raise an error informing
the user which tables are lacking a particular privilege or ownership and which
statements are required to fix it.
In summary, the user must have at least:
- CREATE and USAGE privileges on the schema:
  `GRANT CREATE, USAGE ON SCHEMA <schema> TO <user>;`
- Ownership of the original table and of tables with foreign keys pointing to the original table:
  `ALTER TABLE <table> OWNER TO <user>;`
- REFERENCES privilege on tables the original table has foreign keys to:
  `GRANT REFERENCES ON TABLE <table> TO <user>;`
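The statements above can be rendered for a given user with a small helper (not part of Psycopack; the function and the example names below are hypothetical):

```python
# Hypothetical helper that renders the privilege statements listed above
# for a given user, schema, and set of tables.

def required_grants(
    user: str,
    schema: str,
    owned_tables: list[str],
    referenced_tables: list[str],
) -> list[str]:
    stmts = [f"GRANT CREATE, USAGE ON SCHEMA {schema} TO {user};"]
    # Ownership of the original table and of referring tables.
    stmts += [f"ALTER TABLE {t} OWNER TO {user};" for t in owned_tables]
    # REFERENCES on tables the original table points to.
    stmts += [f"GRANT REFERENCES ON TABLE {t} TO {user};" for t in referenced_tables]
    return stmts


# "app_user" and "other_table" are made-up example names.
for stmt in required_grants("app_user", "public", ["to_psycopack"], ["other_table"]):
    print(stmt)
```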