pgmold Architecture

Overview

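pgmold is a PostgreSQL schema-as-code tool written in Rust. It follows a pipeline architecture. Schemas are either parsed from SQL files or introspected from a live database, and then normalized into a canonical model. From there they are filtered, diffed, planned, linted, rendered to SQL, and executed.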

Core Principles

  1. Canonical Model is Truth: All operations use the normalized model::Schema IR. No module compares SQL to DB directly.
  2. Deterministic Output: BTreeMap everywhere. Sorted collections. Predictable diffs.
  3. Strict Module Boundaries: No SQL outside pg/sqlgen.rs. No DB access outside pg/.
  4. Fail Fast: No panics. Clear errors via anyhow::Result.

Module Structure

pgmold/
├── src/
│   ├── cli/           # CLI argument parsing, command routing
│   ├── parser/        # PostgreSQL DDL parser → canonical model
│   │   ├── mod.rs         # SQL parsing with sqlparser
│   │   └── loader.rs      # Multi-file schema loading
│   ├── model/         # Canonical schema IR (the core)
│   ├── pg/
│   │   ├── connection.rs   # Database connection pool
│   │   ├── introspect.rs   # DB → canonical model
│   │   └── sqlgen.rs       # Migration ops → SQL
│   ├── diff/
│   │   ├── mod.rs          # Schema comparison
│   │   └── planner.rs      # Operation ordering
│   ├── filter/        # Object filtering by name patterns and types
│   ├── lint/          # Safety rules
│   │   ├── mod.rs          # Lint rules and severity
│   │   └── locks.rs        # Lock hazard detection
│   ├── drift/         # Drift detection via fingerprinting
│   ├── baseline/      # Schema export with round-trip validation
│   ├── dump.rs        # Schema → SQL DDL generation
│   ├── migrate.rs     # Migration file numbering utilities
│   ├── apply/         # Transactional execution
│   ├── util/          # Shared types, errors
│   └── main.rs
└── tests/
    ├── integration.rs      # testcontainers tests
    ├── baseline.rs         # Baseline command tests
    └── semantic_equivalence.rs  # Normalization tests

Data Flow

┌─────────────┐     ┌──────────────┐
│  SQL File   │     │  PostgreSQL  │
└──────┬──────┘     └──────┬───────┘
       │                   │
       ▼                   ▼
┌─────────────┐     ┌──────────────┐
│parser::parse│     │pg::introspect│
└──────┬──────┘     └──────┬───────┘
       │                   │
       └────────┬──────────┘
                │
                ▼
        ┌───────────────┐
        │ model::Schema │  ← Canonical IR
        └───────┬───────┘
                │
                ▼
        ┌───────────────┐
        │filter::filter │  ← Apply include/exclude patterns
        └───────┬───────┘
                │
                ▼
        ┌───────────────┐
        │ diff::compute │
        └───────┬───────┘
                │
                ▼
        ┌───────────────┐
        │  MigrationOp  │  ← Operations list
        └───────┬───────┘
                │
                ▼
        ┌───────────────┐
        │ diff::planner │  ← Order operations
        └───────┬───────┘
                │
                ▼
        ┌───────────────┐
        │  lint::check  │  ← Safety validation
        └───────┬───────┘
                │
                ▼
        ┌───────────────┐
        │ pg::sqlgen    │  ← Generate SQL
        └───────┬───────┘
                │
                ▼
        ┌───────────────┐
        │ apply::exec   │  ← Execute in transaction
        └───────────────┘

Canonical Model (model/)

The canonical IR represents all schema objects in a normalized form:

pub struct Schema {
    pub tables: BTreeMap<String, Table>,
    pub enums: BTreeMap<String, EnumType>,
    pub domains: BTreeMap<String, Domain>,
    pub extensions: BTreeMap<String, Extension>,
    pub functions: BTreeMap<String, Function>,
    pub views: BTreeMap<String, View>,
    pub triggers: BTreeMap<String, Trigger>,
    pub sequences: BTreeMap<String, Sequence>,
    pub partitions: BTreeMap<String, Partition>,
}

pub struct Table {
    pub name: String,
    pub schema: String,
    pub columns: BTreeMap<String, Column>,
    pub indexes: BTreeMap<String, Index>,
    pub primary_key: Option<PrimaryKey>,
    pub foreign_keys: BTreeMap<String, ForeignKey>,
    pub check_constraints: BTreeMap<String, CheckConstraint>,
    pub policies: BTreeMap<String, Policy>,
    pub rls_enabled: bool,
    pub rls_force: bool,
    pub partition_key: Option<PartitionKey>,
}

pub struct Column {
    pub name: String,
    pub data_type: PgType,
    pub nullable: bool,
    pub default: Option<String>,
    pub identity: Option<String>,
}

Key Design Decisions:

  • BTreeMap for deterministic iteration order
  • Map keys use qualified names: schema.name
  • All objects have a schema field (default: "public")
  • Fingerprinting via SHA256 of JSON serialization
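The design decisions above combine as follows: sorted maps and qualified keys make serialization byte-for-byte repeatable, and that repeatability is what makes a hash usable as a fingerprint. The sketch below is a minimal illustration under stated assumptions. The `Table` type is a toy stand-in reduced to column names and types. `qualified_key`, `canonical_json`, and `fingerprint` are hypothetical helpers, and the JSON is hand-written rather than produced by serde. SHA-256 is not in Rust's standard library, so the sketch substitutes 64-bit FNV-1a over the serialized bytes. A real implementation would hash the same bytes with SHA-256, for example via the `sha2` crate.

```rust
use std::collections::BTreeMap;

// Hypothetical, heavily simplified stand-in for model::Table.
struct Table {
    schema: String,
    name: String,
    // column name -> type; BTreeMap keeps columns sorted by name.
    columns: BTreeMap<String, String>,
}

// Build the `schema.name` map key, defaulting the schema to "public".
fn qualified_key(schema: Option<&str>, name: &str) -> String {
    format!("{}.{}", schema.unwrap_or("public"), name)
}

// Minimal JSON string escaping (quotes and backslashes only; sketch).
fn json_str(s: &str) -> String {
    format!("\"{}\"", s.replace('\\', "\\\\").replace('"', "\\\""))
}

// Serialize in BTreeMap order, so equal schemas always yield identical bytes.
fn canonical_json(tables: &BTreeMap<String, Table>) -> String {
    let entries: Vec<String> = tables
        .iter()
        .map(|(key, t)| {
            let cols: Vec<String> = t
                .columns
                .iter()
                .map(|(c, ty)| format!("{}:{}", json_str(c), json_str(ty)))
                .collect();
            format!(
                "{}:{{\"schema\":{},\"name\":{},\"columns\":{{{}}}}}",
                json_str(key),
                json_str(&t.schema),
                json_str(&t.name),
                cols.join(",")
            )
        })
        .collect();
    format!("{{{}}}", entries.join(","))
}

// 64-bit FNV-1a, standing in for SHA-256 so the sketch needs only std.
fn fingerprint(tables: &BTreeMap<String, Table>) -> String {
    let mut hash: u64 = 0xcbf29ce484222325;
    for byte in canonical_json(tables).bytes() {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x100000001b3);
    }
    format!("{:016x}", hash)
}
```

Columns inserted in different orders produce the same fingerprint. A `HashMap` would not give that guarantee, because its iteration order is unspecified.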

Migration Operations

Operations represent atomic schema changes:

pub enum MigrationOp {
    CreateExtension(Extension),
    DropExtension(String),
    CreateEnum(EnumType),
    DropEnum(String, String),
    AddEnumValue { ... },
    CreateDomain(Domain),
    DropDomain(String, String),
    AlterDomain { ... },
    CreateTable(Table),
    DropTable(String, String),
    CreatePartition(Partition),
    DropPartition(String, String),
    AddColumn { ... },
    DropColumn { ... },
    AlterColumn { ... },
    AddPrimaryKey { ... },
    DropPrimaryKey { ... },
    AddIndex { ... },
    DropIndex { ... },
    AddForeignKey { ... },
    DropForeignKey { ... },
    AddCheckConstraint { ... },
    DropCheckConstraint { ... },
    EnableRls { ... },
    DisableRls { ... },
    ForceRls { ... },
    NoForceRls { ... },
    CreatePolicy(Policy),
    AlterPolicy { ... },
    DropPolicy { ... },
    CreateFunction(Function),
    DropFunction { ... },
    ReplaceFunction(Function),
    CreateView(View),
    DropView { ... },
    ReplaceView(View),
    CreateTrigger(Trigger),
    DropTrigger { ... },
    AlterTriggerEnabled { ... },
    CreateSequence(Sequence),
    DropSequence { ... },
    AlterSequence { ... },
}
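The sketch below shows how walking two canonical models produces operations like the ones above. It is not pgmold's `diff::compute`. `Op`, `Tables`, and `diff_tables` are hypothetical simplifications, with each table reduced to a set of column names and only four operation kinds modeled. Because both sides are sorted maps and sets, the output order is deterministic.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical, trimmed-down subset of MigrationOp.
#[derive(Debug, PartialEq)]
enum Op {
    CreateTable(String),
    DropTable(String),
    AddColumn { table: String, column: String },
    DropColumn { table: String, column: String },
}

// Qualified table name -> set of column names (toy model).
type Tables = BTreeMap<String, BTreeSet<String>>;

// Compare `current` (e.g. introspected DB) against `desired` (e.g. parsed SQL).
fn diff_tables(current: &Tables, desired: &Tables) -> Vec<Op> {
    let mut ops = Vec::new();
    for (name, desired_cols) in desired {
        match current.get(name) {
            // Table missing from the database: create it.
            None => ops.push(Op::CreateTable(name.clone())),
            Some(current_cols) => {
                // Columns only in the desired schema are added...
                for col in desired_cols.difference(current_cols) {
                    ops.push(Op::AddColumn { table: name.clone(), column: col.clone() });
                }
                // ...and columns only in the database are dropped.
                for col in current_cols.difference(desired_cols) {
                    ops.push(Op::DropColumn { table: name.clone(), column: col.clone() });
                }
            }
        }
    }
    // Tables that exist in the database but not in the desired schema.
    for name in current.keys() {
        if !desired.contains_key(name) {
            ops.push(Op::DropTable(name.clone()));
        }
    }
    ops
}
```

The result is an unordered change list. Putting it into a safe execution order is the planner's job, described next.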

Operation Ordering

The planner orders operations to satisfy dependencies:

  1. Create phase (safe to add):

    • CreateExtension
    • CreateEnum, AddEnumValue
    • CreateDomain
    • CreateSequence
    • CreateTable (topologically sorted by FK dependencies)
    • CreatePartition
    • AddColumn, AlterColumn
    • AddPrimaryKey
    • AddIndex
    • AddForeignKey
    • AddCheckConstraint
    • EnableRls, ForceRls
    • CreatePolicy, AlterPolicy
    • CreateFunction, ReplaceFunction
    • CreateView, ReplaceView
    • CreateTrigger
  2. Drop phase (reverse dependency order, so dependents are dropped before the objects they depend on):

    • DropTrigger
    • DropView
    • DropFunction
    • DropPolicy
    • DisableRls, NoForceRls
    • DropCheckConstraint
    • DropForeignKey
    • DropIndex
    • DropPrimaryKey
    • DropColumn
    • DropPartition
    • DropTable
    • DropSequence
    • DropDomain
    • DropEnum
    • DropExtension
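The two ordering mechanisms above can be sketched together: a phase rank per operation kind, and a topological sort of new tables by FK references. This is an illustration under assumptions, not the planner's code. The `Kind` enum, the numeric ranks, and `plan` are hypothetical, and only six operation kinds are modeled. The choice of Kahn's algorithm for `sort_tables` is also an assumption.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

// Hypothetical subset of operation kinds (not the real MigrationOp).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    CreateEnum,
    CreateTable,
    AddForeignKey,
    DropForeignKey,
    DropTable,
    DropEnum,
}

// Phase rank: creates first (dependencies before dependents), then drops.
fn rank(kind: Kind) -> u8 {
    match kind {
        Kind::CreateEnum => 0,
        Kind::CreateTable => 1,
        Kind::AddForeignKey => 2,
        Kind::DropForeignKey => 3,
        Kind::DropTable => 4,
        Kind::DropEnum => 5,
    }
}

// Stable sort by rank: ops within one phase keep their relative order,
// so a run of already topologically sorted CreateTable ops stays sorted.
fn plan<T>(ops: &mut [(Kind, T)]) {
    ops.sort_by_key(|op| rank(op.0));
}

// Kahn's algorithm. `refs` maps each new table to the tables its FKs
// reference. Returns None if the references form a cycle.
fn sort_tables(refs: &BTreeMap<String, BTreeSet<String>>) -> Option<Vec<String>> {
    let mut pending: BTreeMap<&str, usize> = BTreeMap::new();
    let mut dependents: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for (table, targets) in refs {
        // Ignore self-references and tables that already exist (not in `refs`).
        let deps: Vec<&str> = targets
            .iter()
            .map(String::as_str)
            .filter(|t| *t != table.as_str() && refs.contains_key(*t))
            .collect();
        pending.insert(table.as_str(), deps.len());
        for dep in deps {
            dependents.entry(dep).or_default().push(table.as_str());
        }
    }
    // Start with tables that reference no other new table.
    let mut ready: VecDeque<&str> = pending
        .iter()
        .filter(|(_, n)| **n == 0)
        .map(|(t, _)| *t)
        .collect();
    let mut order = Vec::new();
    while let Some(table) = ready.pop_front() {
        order.push(table.to_string());
        for &next in dependents.get(table).map(Vec::as_slice).unwrap_or(&[]) {
            let n = pending.get_mut(next).unwrap();
            *n -= 1;
            if *n == 0 {
                ready.push_back(next);
            }
        }
    }
    if order.len() == refs.len() { Some(order) } else { None }
}
```

Iterating `BTreeMap`s keeps the sort deterministic: when several tables are ready at once, they come out in name order.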

Object Filtering

The filter module supports filtering by:

  • Name patterns (glob syntax: *, ?)
  • Object types (tables, indexes, policies, etc.)

Filters apply to both source and target schemas before diffing.
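A minimal glob matcher shows what the two wildcards mean: `*` matches any run of characters (including none), and `?` matches exactly one. `glob_match` and `keep` are hypothetical, not pgmold's filter API. Three further points are assumptions of this sketch. A name is kept if it matches any include pattern (or none are given) and no exclude pattern. `*` may cross the `.` in a qualified name. Matching is byte-wise, so it assumes ASCII identifiers.

```rust
// Match `name` against `pattern`, where `*` matches any run of bytes
// and `?` matches exactly one byte (ASCII names assumed).
fn glob_match(pattern: &str, name: &str) -> bool {
    let (p, n) = (pattern.as_bytes(), name.as_bytes());
    let (mut pi, mut ni) = (0, 0);
    // Most recent `*` position in the pattern and the name index it resumes from.
    let mut star: Option<(usize, usize)> = None;
    while ni < n.len() {
        if pi < p.len() && p[pi] == b'*' {
            // Tentatively let `*` match nothing.
            star = Some((pi, ni));
            pi += 1;
        } else if pi < p.len() && (p[pi] == b'?' || p[pi] == n[ni]) {
            pi += 1;
            ni += 1;
        } else if let Some((sp, sn)) = star {
            // Backtrack: let the last `*` absorb one more byte.
            pi = sp + 1;
            ni = sn + 1;
            star = Some((sp, sn + 1));
        } else {
            return false;
        }
    }
    // Trailing `*`s can match the empty remainder.
    while pi < p.len() && p[pi] == b'*' {
        pi += 1;
    }
    pi == p.len()
}

// Keep a name if it matches some include pattern (or there are none)
// and matches no exclude pattern.
fn keep(name: &str, include: &[&str], exclude: &[&str]) -> bool {
    let included = include.is_empty() || include.iter().any(|p| glob_match(p, name));
    included && !exclude.iter().any(|p| glob_match(p, name))
}
```

Applying the same `keep` predicate to both the source and the target model ensures that an excluded object never shows up as a spurious create or drop.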

Lint Rules

Rule                      Severity  Condition
deny_drop_column          Error     Without --allow-destructive
deny_drop_table           Error     Without --allow-destructive
deny_drop_enum            Error     Without --allow-destructive
deny_drop_table_in_prod   Error     When PGMOLD_PROD=1
warn_type_narrowing       Warning   Type change may lose data
warn_set_not_null         Warning   May fail on existing NULLs

Lock hazard detection (lint/locks.rs) warns about operations that acquire exclusive locks.
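The deny rules in the table can be read as one pass over the planned operations. The sketch below encodes the three `--allow-destructive` rules and the production table-drop rule; the `warn_*` rules are omitted. `Op`, `Finding`, `LintOptions`, `check`, and `blocks_apply` are hypothetical names. The sketch makes three assumptions:

- The caller reads `PGMOLD_PROD` and passes the result in.
- The prod rule fires even when `--allow-destructive` is given.
- Any Error-level finding blocks the apply step.

```rust
// Hypothetical, trimmed-down operations relevant to the deny rules.
enum Op {
    DropColumn { table: String, column: String },
    DropTable(String),
    DropEnum(String),
    Other,
}

#[derive(Debug, PartialEq)]
enum Severity {
    Error,
    Warning, // used by the warn_* rules, which this sketch omits
}

#[derive(Debug, PartialEq)]
struct Finding {
    rule: &'static str,
    severity: Severity,
    object: String,
}

struct LintOptions {
    allow_destructive: bool, // mirrors --allow-destructive
    prod: bool,              // assumed to be set by the caller when PGMOLD_PROD=1
}

fn error(rule: &'static str, object: &str) -> Finding {
    Finding { rule, severity: Severity::Error, object: object.to_string() }
}

fn check(ops: &[Op], opts: &LintOptions) -> Vec<Finding> {
    let mut findings = Vec::new();
    for op in ops {
        match op {
            Op::DropColumn { table, column } if !opts.allow_destructive => {
                findings.push(error("deny_drop_column", &format!("{}.{}", table, column)));
            }
            Op::DropTable(name) => {
                if !opts.allow_destructive {
                    findings.push(error("deny_drop_table", name));
                }
                // Assumption: the prod rule applies even with --allow-destructive.
                if opts.prod {
                    findings.push(error("deny_drop_table_in_prod", name));
                }
            }
            Op::DropEnum(name) if !opts.allow_destructive => {
                findings.push(error("deny_drop_enum", name));
            }
            _ => {}
        }
    }
    findings
}

// Assumption (fail fast): any Error-level finding blocks the apply step.
fn blocks_apply(findings: &[Finding]) -> bool {
    findings.iter().any(|f| f.severity == Severity::Error)
}
```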

Module Dependencies

cli → parser, pg, diff, filter, lint, drift, baseline, dump, migrate, apply
parser → model
pg/introspect → model
pg/sqlgen → model, diff
diff → model
filter → model
lint → diff
drift → model
baseline → parser, pg, diff, dump
dump → model, pg/sqlgen
apply → pg

No circular dependencies. model is the leaf dependency.
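The "no circular dependencies" property can be checked mechanically. The sketch below encodes the edges listed above and runs a depth-first search with three-state marking, reporting a module when it hits a back edge. Some edges refer to `pg` as a whole, so the sketch treats `pg` as an umbrella node depending on `pg/introspect` and `pg/sqlgen`. That choice is an assumption made only so those edges resolve. `deps` and `find_cycle` are hypothetical helpers.

```rust
use std::collections::BTreeMap;

// Module -> modules it depends on, as listed above.
fn deps() -> BTreeMap<&'static str, Vec<&'static str>> {
    let mut g = BTreeMap::new();
    g.insert("cli", vec!["parser", "pg", "diff", "filter", "lint", "drift", "baseline", "dump", "migrate", "apply"]);
    g.insert("parser", vec!["model"]);
    g.insert("pg", vec!["pg/introspect", "pg/sqlgen"]); // assumption: umbrella node
    g.insert("pg/introspect", vec!["model"]);
    g.insert("pg/sqlgen", vec!["model", "diff"]);
    g.insert("diff", vec!["model"]);
    g.insert("filter", vec!["model"]);
    g.insert("lint", vec!["diff"]);
    g.insert("drift", vec!["model"]);
    g.insert("baseline", vec!["parser", "pg", "diff", "dump"]);
    g.insert("dump", vec!["model", "pg/sqlgen"]);
    g.insert("apply", vec!["pg"]);
    g
}

#[derive(Clone, Copy)]
enum Mark {
    Unvisited,
    InProgress,
    Done,
}

// Returns a module found on a cycle, or None if the graph is acyclic.
fn find_cycle(g: &BTreeMap<&'static str, Vec<&'static str>>) -> Option<&'static str> {
    fn visit(
        node: &'static str,
        g: &BTreeMap<&'static str, Vec<&'static str>>,
        marks: &mut BTreeMap<&'static str, Mark>,
    ) -> Option<&'static str> {
        match marks.get(node).copied().unwrap_or(Mark::Unvisited) {
            Mark::Done => return None,
            Mark::InProgress => return Some(node), // back edge: cycle
            Mark::Unvisited => {}
        }
        marks.insert(node, Mark::InProgress);
        for &dep in g.get(node).map(Vec::as_slice).unwrap_or(&[]) {
            if let Some(hit) = visit(dep, g, marks) {
                return Some(hit);
            }
        }
        marks.insert(node, Mark::Done);
        None
    }
    let mut marks = BTreeMap::new();
    g.keys().find_map(|&node| visit(node, g, &mut marks))
}
```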

Testing Strategy

  • Unit tests: Each module has inline #[cfg(test)] modules
  • Integration tests: Full pipeline with testcontainers PostgreSQL
  • Semantic equivalence tests: Verify normalization produces identical results
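The example below illustrates the inline unit-test convention together with the kind of property a semantic-equivalence test asserts. It is a hypothetical type-name normalizer with a `#[cfg(test)]` module; the function name and its scope are assumptions, not pgmold's normalization code. The alias pairs are standard PostgreSQL spellings: `int`/`int4`/`integer`, `int8`/`bigint`, `int2`/`smallint`, `bool`/`boolean`, `varchar`/`character varying`, and `timestamptz`/`timestamp with time zone`.

```rust
// Map some common PostgreSQL type aliases to one canonical spelling.
fn normalize_type(raw: &str) -> String {
    let lower = raw.trim().to_ascii_lowercase();
    // Split off a modifier such as "(255)" so "varchar(255)" normalizes too.
    let (base, modifier) = match lower.find('(') {
        Some(i) => (lower[..i].trim_end(), &lower[i..]),
        None => (lower.as_str(), ""),
    };
    let canonical = match base {
        "int" | "int4" | "integer" => "integer",
        "int8" | "bigint" => "bigint",
        "int2" | "smallint" => "smallint",
        "bool" | "boolean" => "boolean",
        "varchar" | "character varying" => "character varying",
        "timestamptz" | "timestamp with time zone" => "timestamp with time zone",
        other => other,
    };
    format!("{}{}", canonical, modifier)
}

#[cfg(test)]
mod tests {
    use super::normalize_type;

    #[test]
    fn aliases_normalize_to_same_type() {
        assert_eq!(normalize_type("INT4"), normalize_type("integer"));
        assert_eq!(normalize_type("varchar(255)"), "character varying(255)");
    }
}
```

A normalizer like this is what lets a column parsed from `varchar(255)` compare equal to the same column introspected as `character varying(255)`, so the diff reports no change.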

Supported PostgreSQL Features

  • Tables, columns, partitioned tables
  • Primary keys, foreign keys, check constraints
  • Indexes (btree, hash, gin, gist, brin)
  • Enums, domains
  • Functions (with volatility, SECURITY DEFINER/INVOKER, and SET parameters)
  • Views
  • Triggers (with WHEN clauses, transition tables)
  • Sequences (with SERIAL/BIGSERIAL support)
  • Row-Level Security (RLS) policies
  • Extensions
  • Multi-schema support