
feat: add Concrete Table Inheritance as alternative inheritance strategy#26

Open
joamag wants to merge 14 commits into master from feat/concrete-table-inheritance

Conversation

Contributor

@joamag joamag commented Mar 30, 2026

Summary

  • Introduces Concrete Table Inheritance (concrete_table) as an alternative to the existing Class Table Inheritance (class_table) strategy
  • Every class in the hierarchy stores all attributes (own + inherited) in its own table — enabling polymorphic queries at any hierarchy level without joins
  • Writing an entity inserts/updates/deletes rows in all ancestor tables (data duplication), while reading is always a single-table scan
  • Strategy is selectable per hierarchy via inheritance = "concrete_table" class attribute, fully backward compatible
  • Includes a standalone migration script (scripts/migrate_inheritance.py) for converting existing hierarchies between strategies
  • Design document: COP-002
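The per-hierarchy opt-in can be sketched as follows. This is a minimal model, not the actual structures.py implementation; the class names mirror the PR's test mocks and the stub base class only models the inherited class-attribute lookup:

```python
# Minimal sketch; the real EntityClass lives in
# data/src/entity_manager/structures.py.
class EntityClass(object):
    inheritance = "class_table"  # existing default (joined tables)

    @classmethod
    def get_inheritance_strategy(cls):
        # the attribute is inherited, so setting it once on the
        # root of a hierarchy opts every descendant in
        return cls.inheritance

    @classmethod
    def is_concrete_table(cls):
        return cls.get_inheritance_strategy() == "concrete_table"


class ConcretePerson(EntityClass):
    inheritance = "concrete_table"  # opts the whole hierarchy in


class ConcreteEmployee(ConcretePerson):
    pass  # inherits the strategy from ConcretePerson
```

Because only the root of a hierarchy needs to set the attribute, existing hierarchies that never set it keep the default class_table behavior, which is what makes the change backward compatible.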

Schema design

For ConcreteRootEntity -> ConcretePerson -> ConcreteEmployee:

_concrete_root_entity: object_id, status, _class, _mtime         (ALL entities)
_concrete_person:      object_id, status, name, age, _class, _mtime  (persons + employees)
_concrete_employee:    object_id, status, name, age, salary, _class, _mtime (employees only)
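A runnable sketch of this layout using sqlite3 (the column types are assumptions; the real DDL is generated by the entity manager's create() path):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE _concrete_root_entity (
        object_id INTEGER PRIMARY KEY, status INTEGER,
        _class TEXT, _mtime REAL);
    CREATE TABLE _concrete_person (
        object_id INTEGER PRIMARY KEY, status INTEGER,
        name TEXT, age INTEGER, _class TEXT, _mtime REAL);
    CREATE TABLE _concrete_employee (
        object_id INTEGER PRIMARY KEY, status INTEGER,
        name TEXT, age INTEGER, salary INTEGER, _class TEXT, _mtime REAL);
    """
)

# saving one employee fans out a row to every ancestor table
conn.execute(
    "INSERT INTO _concrete_root_entity VALUES (1, 0, 'ConcreteEmployee', 0.0)")
conn.execute(
    "INSERT INTO _concrete_person VALUES (1, 0, 'Ada', 30, 'ConcreteEmployee', 0.0)")
conn.execute(
    "INSERT INTO _concrete_employee VALUES (1, 0, 'Ada', 30, 1000, 'ConcreteEmployee', 0.0)")

# polymorphic read at the person level: single-table scan, no joins
row = conn.execute(
    "SELECT name, _class FROM _concrete_person WHERE object_id = 1").fetchone()
```

The _class discriminator recorded in every row is what lets a person-level query recognize that this row is actually an employee.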

Core changes

  • structures.py: Added inheritance attribute, get_inheritance_strategy(), is_concrete_table(), and get_all_items() to EntityClass
  • system.py: Save/update/delete write to all ancestor tables; select reads from single table without joins; create builds flattened tables at every hierarchy level
  • mocks.py: Added ConcreteRootEntity, ConcretePerson, ConcreteEmployee, ConcreteAddress test entities
  • test.py: 77 tests covering base ORM, concrete table (including polymorphic queries), migration tooling, and backup
  • scripts/migrate_inheritance.py: Migration script with backup, validation, dry-run, and transactional execution
  • scripts/benchmark_inheritance.py: Performance benchmark comparing both strategies
  • doc/design/002-concrete_table_inheritance.md: Design document (COP-002)

Test plan

  • All existing class table inheritance (CTI) tests pass (zero regressions)
  • Strategy selector correctly resolves concrete_table through hierarchy
  • get_all_items() returns flattened items including inherited fields
  • Concrete table CREATE TABLE includes all inherited columns at every hierarchy level
  • Concrete table INSERT writes to all ancestor tables
  • Concrete table UPDATE writes to all ancestor tables
  • Concrete table DELETE removes from all ancestor tables
  • Concrete table SELECT reads from single table without joins
  • Polymorphic queries work (querying parent class returns descendant entities)
  • Relation handling (1:1, 1:N) works with concrete table entities
  • Migration script validates, dry-runs, backs up, and rolls back correctly

Closes #25

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added Concrete Table Inheritance mode with runtime strategy selection, migration utility, and benchmarking tool
    • Improved REST request connection address resolution (proxy headers, IPv6-mapped IPv4 cleanup)
  • Changed

    • Model logging methods now accept extra args/kwargs for lazy evaluation
  • Tests

    • Added extensive unit and end-to-end tests for inheritance, migration, validation, backups, and rollback
  • Documentation

    • New design doc and changelog/index updates
  • Chores

    • CI workflow extended to run new tests

Introduce a new concrete_table inheritance strategy that coexists with
the existing class_table (joined table) approach. When an entity hierarchy
opts in via `inheritance = "concrete_table"`, each concrete class produces
a single table containing all attributes (own + inherited), eliminating
joins on read.

Changes:
- Add `inheritance` class attribute, `get_inheritance_strategy()`, and
  `is_concrete_table()` to EntityClass in structures.py
- Add `get_all_items()` method for flattened attribute retrieval
- Branch _create_definition_query, _save_query, _update_query,
  _remove_query, and _find_query on inheritance strategy
- Skip parent table creation/joins for concrete_table entities
- Add migration script (scripts/migrate_inheritance.py) with backup,
  validation, dry-run, and transactional execution
- Add test entities and unit tests for the new strategy

Closes #25

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai bot commented Mar 30, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7dcb16ab-ece8-41fb-8cc9-0cc5fa99b44f

📥 Commits

Reviewing files that changed from the base of the PR and between f8e030e and 5c3626e.

📒 Files selected for processing (2)
  • CHANGELOG.md
  • mvc/src/mvc_utils/model.py
✅ Files skipped from review due to trivial changes (1)
  • CHANGELOG.md

📝 Walkthrough


Adds a selectable Concrete Table Inheritance strategy (per-class concrete tables): strategy selection on EntityClass, flattened item resolution, and query/schema generation updates for concrete-mode CRUD and reads. Also adds migration and benchmark CLI scripts for converting and comparing strategies, expanded concrete-table mocks with extensive tests, plus design docs and changelog updates.

Changes

  • Core Inheritance Infrastructure (data/src/entity_manager/structures.py): Added DATA_INHERITANCE override, inheritance = "class_table" class attribute, cached internals, and classmethods get_inheritance_strategy(), is_concrete_table(), and get_all_items() for flattened attribute maps and strategy resolution.
  • SQL Query & Schema Generation (data/src/entity_manager/system.py): Branched DataEntityManager internals to support concrete-table mode: CREATE/INSERT/UPDATE/DELETE and SELECT now use flattened item maps for concrete classes, skip parent joins for reads, adjust discriminator (_class) emission and table-name resolution, and change per-operation query composition accordingly.
  • Test Mocks (data/src/entity_manager/mocks.py): Added concrete-table mock entities ConcreteRootEntity, ConcretePerson, ConcreteEmployee, and ConcreteAddress with initializers and relation mappers to exercise concrete inheritance in tests.
  • Test Suite Expansion (data/src/entity_manager/test.py): Implemented and expanded many EntityManager tests, added EntityManagerConcreteTableTestCase, and added coverage for CRUD, relations, polymorphism, pagination, validation, and concrete-table behaviors.
  • Migration & Benchmark Tools (scripts/migrate_inheritance.py, scripts/benchmark_inheritance.py): Added a migration CLI to convert hierarchies between class_table and concrete_table (validation, dry-run, backup, transactional execution) and a benchmark script comparing CTI vs concrete-table strategies with narrow/wide schemas.
  • Migration Tests (scripts/test/test_migrate_inheritance.py): Added unit tests for migration logic: hierarchy validation, generated SQL inspection (both directions), engine-specific SQL helpers, backup behavior, dry-run, rollback handling, and end-to-end schema/data migration checks.
  • Design & Docs (doc/design/002-concrete_table_inheritance.md, doc/design/README.md, CHANGELOG.md): Added the COP-002 design doc describing concrete-table semantics and the global override, updated the design index, and documented feature/migration additions in the CHANGELOG.
  • CI Update (.github/workflows/main.yml): Added a pytest step to run pytest scripts/test/ -v in multiple jobs and adjusted Python-version gating for black checks.
  • Logging API Update (mvc/src/mvc_utils/model.py): Extended model logging methods (debug/info/warning/error/critical and class-level variants) to accept *args, **kwargs and forward them to the plugin logger for lazy evaluation support.
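The lazy-evaluation pattern behind the logging change can be sketched like this (the Model class and method names here are illustrative, not the actual mvc_utils API):

```python
import logging

class Model(object):
    """Illustrative model exposing a debug() helper that forwards
    extra args/kwargs to the underlying logger."""

    def __init__(self):
        self.logger = logging.getLogger("mvc_sketch")

    def debug(self, message, *args, **kwargs):
        # forwarding args defers %-formatting to the logging module,
        # which skips the formatting entirely when the level is off
        self.logger.debug(message, *args, **kwargs)

model = Model()
model.logger.setLevel(logging.DEBUG)
model.debug("saved entity %s in %.2f ms", "ConcretePerson", 1.5)
```

The payoff is that an expensive repr() of an entity is never computed when debug logging is disabled, because the logger only interpolates the arguments for records it actually emits.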

Sequence Diagram(s)

sequenceDiagram
    participant App as Application
    participant EM as EntityManager
    participant DB as Database

    rect rgba(100, 150, 255, 0.5)
    Note over App,EM: Class Table Inheritance (existing)
    App->>EM: save(Employee)
    EM->>EM: split attrs across ancestor tables
    EM->>DB: INSERT parent table(s)
    DB-->>EM: OK
    EM->>DB: INSERT leaf table
    DB-->>EM: OK
    App->>EM: find(Employee, id)
    EM->>DB: SELECT leaf JOIN parent ... WHERE id=...
    DB-->>EM: joined row set
    end

    rect rgba(100, 255, 150, 0.5)
    Note over App,EM: Concrete Table Inheritance (new)
    App->>EM: save(EmployeeConcrete)
    EM->>EM: flatten inherited + own attrs
    EM->>DB: INSERT INTO employee (all_cols..., _class)
    DB-->>EM: OK
    App->>EM: find(EmployeeConcrete, id)
    EM->>DB: SELECT * FROM employee WHERE id=...
    DB-->>EM: single row
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 I nibble rows and hop through schemes,

Concrete tables cradle all my dreams.
Fewer joins to bog my feet—
Each class a meadow, tidy and neat.
Backups hum; migrations keep me warm.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 25.63%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.

✅ Passed checks (4 passed)
  • Description Check: ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Title check: ✅ Passed. The title clearly and concisely summarizes the main change: adding Concrete Table Inheritance as a new inheritance strategy option for entity hierarchies.
  • Linked Issues check: ✅ Passed. All linked issue objectives are addressed: inheritance strategy selector added [#25-1], flattened item API implemented [#25-2], query generation branches by strategy [#25-3-8], relations/engines handled [#25-9-10], migration script provided [#25-11], test entities added [#25-12], comprehensive unit/migration/regression tests [#25-13-15], and CHANGELOG documented [#25-16].
  • Out of Scope Changes check: ✅ Passed. All changes are directly scoped to the concrete-table inheritance feature: structures (strategy/helpers), system (SQL branching), mocks (test entities), test coverage, migration tooling, benchmark/design docs, logging signature updates (supporting lazy eval), CI workflow updates (pytest), and CHANGELOG entry. No unrelated modifications detected.



joamag and others added 8 commits March 30, 2026 11:10
Fill in previously stubbed tests (test_self_relation, test_database_integrity,
test_invalid_type, test_map, test_range) and add 15 new edge case tests:

- test_get_nonexistent: retrieving non-existent entity returns None
- test_find_empty: find on empty table returns empty list
- test_find_filters: equals, greater, lesser, and combined filters
- test_count_with_filters: count with filter conditions
- test_update_nullify_field: setting a field to None
- test_update_relation_change: changing a FK relation owner
- test_update_nullify_relation: removing a FK relation
- test_multiple_inheritance: Employee with Person+Loggable+Taxable
- test_polymorphism_employee: polymorphic find across hierarchy
- test_save_update_remove_cycle: full CRUD lifecycle
- test_file_data_type: File entity with data type field
- test_unicode_fields: unicode characters in text fields
- test_find_with_set: result set mode retrieval
- test_breeder_subclass: Breeder/BreedDog subclass relations
- test_remove_with_relations: removing entities with FK relations

Also fixes concrete table _class column qualifier in SELECT queries
to use the entity's own table name instead of the top parent table,
and fixes _process_table_name to resolve all attributes to the
entity's own table for concrete table inheritance.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add EntityManagerMigrationTestCase with 11 tests covering the migration
script functionality:

- test_validate_same_strategy: rejects migration to current strategy
- test_validate_concrete_same_strategy: same for concrete table
- test_validate_cti_to_concrete: validates CTI->Concrete migration
- test_validate_concrete_to_cti: validates Concrete->CTI migration
- test_dry_run_cti_to_concrete: dry-run generates SQL without changes
- test_dry_run_concrete_to_cti: same for reverse direction
- test_backup_sqlite: creates backup file before migration
- test_migrate_cti_to_concrete_schema: verifies CTI table structure
- test_migrate_rollback_on_failure: simulated failure triggers rollback
- test_generate_queries_cti_to_concrete: verifies CREATE/INSERT/DROP/ALTER
- test_generate_queries_concrete_to_cti: same for reverse direction

Also moves inline imports to top-level per project conventions and uses
importlib for robust migration module loading across Colony's plugin
cache directory.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ests

Expand migration backup test coverage with three new tests:

- test_backup_data_integrity: opens the backup as a separate SQLite
  connection and verifies table structure and row data match the
  original database (tables exist, column values preserved)
- test_migrate_creates_backup: verifies that migrate() creates a
  backup file when skip_backup=False, using a failing connection
  to isolate the backup step from the actual migration SQL
- test_migrate_aborts_on_backup_failure: verifies that migrate()
  returns failure with appropriate message when backup fails

Also adds backup naming convention assertion (.backup.timestamp)
to the existing test_backup_sqlite test.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Guard `import importlib.util` with try/except for Python 2.7
  compatibility (PyPy 2.7 and CPython 2.7 do not have importlib.util)
- Add `import sqlite3` at module level instead of inline
- Fix test_backup_data_integrity for Windows: close backup_conn in
  the finally block before os.remove() to release file handle
- Add WAL checkpoint before backup to ensure data is flushed to the
  main database file on Windows (WAL mode may leave data in the WAL
  file which shutil.copy2 does not capture)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ilable

The migration tests use importlib.util for dynamic module loading which
is not available in Python 2.7. Instead of erroring, the tests now
gracefully skip via unittest.skipTest when running on Python 2.7.

- Add HAS_IMPORTLIB_UTIL flag set during import
- Add setUp override in EntityManagerMigrationTestCase that calls
  skipTest when importlib.util is not available
- Guard _load_migration_module to return None on Python 2.7

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add scripts/benchmark_inheritance.py that compares Class Table
Inheritance vs Concrete Table Inheritance performance across all
CRUD operations using SQLite.

The script measures: schema creation, bulk insert, select by ID,
select all, count, bulk update, and bulk delete - printing a
formatted report with speedup ratios.

Usage: python scripts/benchmark_inheritance.py [iterations]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add a second benchmark section with 20 columns per level (60 total)
to demonstrate that concrete table's write advantage narrows with
wider rows and its read advantage can reverse entirely when the
single-table row becomes too wide for SQLite's page format.

Narrow (8 cols): concrete table ~2x faster on writes, ~same on reads
Wide (60 cols): concrete table ~1.1x on writes, can be slower on reads

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Revise the concrete table inheritance strategy so that every
concrete class in the hierarchy gets its own table containing all
fields flattened down to that hierarchy level. Writing an entity
inserts/updates/deletes rows in every ancestor table, enabling
polymorphic queries at any hierarchy level without joins.

Schema example (root -> person -> employee):
- _concrete_root_entity: object_id, status, _class, _mtime
- _concrete_person: object_id, status, name, age, _class, _mtime
- _concrete_employee: object_id, status, name, age, salary, _class, _mtime

Changes:
- create(): remove parent table skip, create all ancestor tables
  with flattened columns via get_all_items()
- _save_query: build items_map with one entry per ancestor class,
  each using that class's get_all_items()
- _update_query: same ancestor-level items_map approach
- _remove_query: delete from all ancestor tables (same as CTI)
- Add test_polymorphic_query verifying queries at root, person,
  and employee levels return correct entity counts
- Add design document COP-002 describing the schema, query
  behavior, implementation details, and trade-offs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
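The ancestor-level items_map approach described in this commit can be sketched as below. Here get_all_items() is modeled by a plain all_items set and table names follow the schema example above; the helper function itself is hypothetical, not the real _save_query code:

```python
class Root(object):
    table = "_concrete_root_entity"
    all_items = {"object_id", "status", "_class", "_mtime"}

class Person(Root):
    table = "_concrete_person"
    all_items = Root.all_items | {"name", "age"}

class Employee(Person):
    table = "_concrete_employee"
    all_items = Person.all_items | {"salary"}

def ancestor_insert_queries(cls, values):
    """Builds one insert per hierarchy level, each carrying that
    level's flattened (own + inherited) subset of the values."""
    queries = []
    for level in reversed(cls.__mro__[:-1]):  # root first, object excluded
        columns = sorted(name for name in values if name in level.all_items)
        queries.append(
            "INSERT INTO %s(%s) VALUES (%s)"
            % (level.table, ", ".join(columns),
               ", ".join("?" for _ in columns))
        )
    return queries
```

Walking the MRO root-first means parent rows exist before the leaf row is written, which keeps any foreign-key style references between the levels consistent during the fan-out.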
joamag and others added 3 commits March 30, 2026 12:57
Complete rewrite of scripts/migrate_inheritance.py:

- Uses ORM's own entity class methods (get_name, get_all_items,
  get_names_map, get_id) instead of reimplementing name resolution
- Handles multiple inheritance (Employee -> Person + Loggable +
  Taxable) with duplicate detection in hierarchy traversal
- Filters hierarchy by table existence when connection is available,
  so only tables present in the data source are migrated
- Creates flat tables at every hierarchy level for CTI->Concrete
  (matching the new ancestor-population design)
- Uses INSERT OR IGNORE for Concrete->CTI to handle duplicates
  when multiple concrete tables map to the same CTI level
- Recreates all indexes: PK (hash+btree), _mtime (hash+btree),
  and FK indexes on mapped relation columns
- Adds validate_data() for pre-migration referential integrity
  checks (orphaned row detection)

End-to-end migration tests:
- test_end_to_end_cti_to_concrete: creates CTI tables via ORM,
  inserts data, migrates, verifies schema changes and data
  integrity at every hierarchy level including polymorphic rows
- test_end_to_end_concrete_to_cti: creates concrete tables via
  ORM, inserts data, migrates, verifies CTI table structure
  and column distribution
- test_migration_preserves_indexes: verifies index count on
  migrated tables

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add a DATA_INHERITANCE environment variable that overrides the
class-level inheritance attribute for all entity hierarchies,
enabling system-wide strategy switching for migration debugging
without modifying entity code.

Resolution order in get_inheritance_strategy():
1. Cached _inheritance_strategy (per-class, avoids repeated lookups)
2. colony.conf("DATA_INHERITANCE") with module-level fallback
3. Class-level inheritance attribute
4. Parent hierarchy traversal
5. Default "class_table"

Usage:
  export DATA_INHERITANCE=concrete_table  # force all entities
  unset DATA_INHERITANCE                  # use class attributes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
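The resolution order above can be modeled roughly as follows, using os.environ in place of colony.conf and a simplified per-class cache; this is a sketch of the described behavior, not the structures.py code:

```python
import os

class EntityClass(object):
    inheritance = "class_table"

    @classmethod
    def get_inheritance_strategy(cls):
        # 1. per-class cache, avoids repeated lookups
        cached = cls.__dict__.get("_inheritance_strategy")
        if cached:
            return cached
        # 2. global override (real code reads colony.conf("DATA_INHERITANCE"))
        strategy = os.environ.get("DATA_INHERITANCE")
        if not strategy:
            # 3./4. class attribute, walking the parent hierarchy
            for base in cls.__mro__:
                if "inheritance" in vars(base):
                    strategy = vars(base)["inheritance"]
                    break
        # 5. fall back to the default strategy
        strategy = strategy or "class_table"
        cls._inheritance_strategy = strategy
        return strategy
```

Note the caching consequence: once a class has resolved its strategy, later changes to DATA_INHERITANCE in the same process are not picked up until the cache is cleared or the process restarts.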
@joamag joamag self-assigned this Mar 30, 2026
@joamag joamag added the enhancement New feature or request label Mar 30, 2026
@joamag joamag marked this pull request as ready for review March 30, 2026 14:47
Copilot AI review requested due to automatic review settings March 30, 2026 14:47

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 47dbdbb128


# are skipped as all columns reside in the entity's own
# table, for class table inheritance the joins are required
# to reconstruct the full entity from multiple tables
if not is_concrete:
The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve parent aliases for inherited concrete relations

Skipping parent-table joins for every concrete-table query leaves inherited relation joins pointing at aliases that were never introduced. In _join_query_f, relations declared on a parent class still build conditions like _parent_table.<relation> via _prefix, but with this guard those parent aliases are absent, so eager-loading a relation inherited by a concrete subclass (e.g., querying ConcreteEmployee with a relation from ConcretePerson) produces invalid SQL at runtime.


Comment on lines +343 to +344
"select count(*) from sqlite_master where type='table' and name=?",
(cls.get_name(),),

P2 Badge Use non-SQLite metadata queries for non-SQLite engines

The migration query generators always probe table existence with sqlite_master when a connection is provided. migrate() passes the live connection through, so using MySQL/PostgreSQL connections will fail before migration starts because those engines do not expose sqlite_master. This breaks the advertised non-SQLite migration path in migrate().


Comment on lines +609 to +610
"insert or ignore into %s(%s) select %s from %s "
"where _class = '%s'"

P2 Badge Avoid SQLite-only INSERT syntax in cross-engine migration

The concrete→CTI path emits insert or ignore, which is SQLite-specific SQL. When migrate() is used with MySQL/PostgreSQL (the function exposes those engines), this statement is invalid and aborts the migration. Use an engine-specific conflict strategy (or generic inserts plus dedup logic) instead of hardcoding SQLite syntax.
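An engine-aware variant could look like this (a hypothetical helper, not part of migrate_inheritance.py; the three dialect forms are standard SQLite, MySQL, and PostgreSQL syntax):

```python
def conflict_insert(engine, table, columns, select_sql):
    """Builds a duplicate-tolerant INSERT ... SELECT for the given
    engine instead of hardcoding SQLite's 'insert or ignore'."""
    base = "INSERT INTO %s(%s) %s" % (table, ", ".join(columns), select_sql)
    if engine == "sqlite":
        return base.replace("INSERT INTO", "INSERT OR IGNORE INTO", 1)
    if engine == "mysql":
        return base.replace("INSERT INTO", "INSERT IGNORE INTO", 1)
    if engine == "pgsql":
        # PostgreSQL expresses the conflict policy as a suffix clause
        return base + " ON CONFLICT DO NOTHING"
    raise ValueError("unsupported engine: %s" % engine)
```

Failing fast on an unknown engine is deliberate: silently emitting SQLite syntax against another engine is exactly the bug the review flags.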


Contributor

Copilot AI left a comment


Pull request overview

Adds a new ORM inheritance strategy (“concrete table inheritance”) to complement the existing class-table inheritance, plus tooling/docs/tests to support strategy selection and migration.

Changes:

  • Introduces inheritance = "concrete_table" on entity hierarchies, with strategy resolution (get_inheritance_strategy(), is_concrete_table()) and flattened field resolution (get_all_items()).
  • Updates entity manager SQL generation to create flattened tables and to read from a single table (no parent joins) while writing to all ancestor tables.
  • Adds migration + benchmark scripts, design documentation (COP-002), changelog entry, and expanded test coverage (including end-to-end migration tests).

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 9 comments.

Summary per file:
  • data/src/entity_manager/structures.py: Adds strategy selector + cache, concrete-table helpers, and get_all_items() for flattened field maps.
  • data/src/entity_manager/system.py: Branches table creation / save / update / find query-building behavior based on inheritance strategy.
  • data/src/entity_manager/mocks.py: Adds concrete-table mock entities mirroring existing inheritance test hierarchies.
  • data/src/entity_manager/test.py: Adds concrete-table tests plus migration/backup tests and fills in previously-stubbed test cases.
  • scripts/migrate_inheritance.py: New standalone migration tool for converting hierarchies between strategies (backup/dry-run/transactional).
  • scripts/benchmark_inheritance.py: New benchmark script comparing CTI vs concrete-table operations on SQLite.
  • doc/design/README.md: Adds COP-002 to the design doc index.
  • doc/design/002-concrete_table_inheritance.md: Design document describing schema/query behavior, tradeoffs, and configuration.
  • CHANGELOG.md: Documents the new inheritance strategy, migration tooling, and tests under Added.


Comment on lines +3278 to +3279
# table inheritance strategy, in which case a single
# insert query is generated for all attributes

Copilot AI Mar 30, 2026


The comment says concrete-table mode generates “a single insert query”, but the implementation builds an items_map containing all ancestor tables and then iterates it, producing multiple INSERTs (one per ancestor table). Please update the comment to reflect the actual behavior so future maintainers don’t get misled.

Suggested change
# table inheritance strategy, in which case a single
# insert query is generated for all attributes
# table inheritance strategy, in which case an insert
# query is generated for each concrete/ancestor table

Copilot uses AI. Check for mistakes.
unset DATA_INHERITANCE

The override is resolved via `colony.conf("DATA_INHERITANCE", None)` at module load time in `structures.py` and checked as the first step in `get_inheritance_strategy()`.

Copilot AI Mar 30, 2026


The design doc says the global override is “checked at module load time”, but structures.py resolves the override inside get_inheritance_strategy() via colony.conf(...) and caches the result per class. Please update this section to reflect the actual runtime + caching behavior (and how to clear the cache when switching overrides).

Suggested change
The override is resolved via `colony.conf("DATA_INHERITANCE", None)` at module load time in `structures.py` and checked as the first step in `get_inheritance_strategy()`.
The override is resolved via `colony.conf("DATA_INHERITANCE", None)` inside `get_inheritance_strategy()` at runtime. The effective inheritance strategy is cached per entity class, so changes to `DATA_INHERITANCE` in a long-running process only take effect after clearing the per-class inheritance-strategy cache (or restarting the process) and then reloading/creating the relevant entities.

Comment on lines +41 to +54
SQL_TYPES_MAP = dict(
text="text",
string="varchar(255)",
integer="integer",
long="bigint",
float="double precision",
decimal="double precision",
date="double precision",
data="text",
metadata="text",
)
""" The default SQL types map for column type resolution """



Copilot AI Mar 30, 2026


SQL_TYPES_MAP is defined but never used in this benchmark script. Consider removing it (or wiring it into schema creation) to avoid dead code and keep the benchmark focused on the operations being measured.

Suggested change
SQL_TYPES_MAP = dict(
text="text",
string="varchar(255)",
integer="integer",
long="bigint",
float="double precision",
decimal="double precision",
date="double precision",
data="text",
metadata="text",
)
""" The default SQL types map for column type resolution """

Comment on lines +503 to +512
if connection:
_hierarchy = []
for cls in hierarchy:
cursor = connection.cursor()
cursor.execute(
"select count(*) from sqlite_master where type='table' and name=?",
(cls.get_name(),),
)
exists = cursor.fetchone()[0] > 0
cursor.close()

Copilot AI Mar 30, 2026


migrate()/query generation is written with SQLite-specific assumptions (e.g., querying sqlite_master for table existence) even when engine is set to mysql/pgsql. This makes the advertised non-SQLite support unreliable; consider either (1) explicitly restricting the migration module to SQLite-only, or (2) branching table-existence detection and SQL generation per engine (and using the correct INSERT/DDL dialect).
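One way to branch the table-existence probe per engine (a hypothetical helper, not the current script; the placeholder styles assume the usual DB-API drivers, '?' for sqlite3 and '%s' for MySQL/PostgreSQL):

```python
import sqlite3

def table_exists(connection, engine, name):
    """Probes the engine-appropriate catalog instead of assuming
    SQLite's sqlite_master is always available."""
    if engine == "sqlite":
        sql = ("SELECT count(*) FROM sqlite_master "
               "WHERE type = 'table' AND name = ?")
    else:
        # information_schema is shared by MySQL and PostgreSQL
        sql = ("SELECT count(*) FROM information_schema.tables "
               "WHERE table_name = %s")
    cursor = connection.cursor()
    try:
        cursor.execute(sql, (name,))
        return cursor.fetchone()[0] > 0
    finally:
        cursor.close()

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE _concrete_person (object_id INTEGER)")
```

Closing the cursor in a finally block also addresses the separate resource-leak comment on the execution path.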

if not is_valid:
return False, messages

if validate_only:

Copilot AI Mar 30, 2026


validate_data() implements the orphan-row / referential consistency checks described in the validation docstrings, but it is never called by migrate() (including --validate). As a result, validation currently only checks strategy configuration and ID presence, not data integrity; either invoke validate_data() during validation, or adjust the validation messaging/docs to match actual behavior.

Suggested change
if validate_only:
if validate_only:
# performs data-level validation (eg orphan rows, referential integrity)
# without executing any migration queries
is_data_valid, data_messages = validate_data(
entity_class,
connection,
connection_params,
)
messages.extend(data_messages)
if not is_data_valid:
return False, messages

Comment on lines +749 to +756
try:
cursor = connection.cursor()

for query in queries:
messages.append("executing: %s" % query)
cursor.execute(query)

connection.commit()

Copilot AI Mar 30, 2026


The migration executes via a cursor created at the start of the transaction but never closes it. Please ensure the cursor is closed in a finally block (success and failure paths) to avoid leaking resources across repeated runs/tests.

__license__ = "Apache License, Version 2.0"
""" The license for the module """

import os

Copilot AI Mar 30, 2026


Unused import: os is imported but not used anywhere in this script. Removing it avoids lint noise and keeps the migration module minimal.

Suggested change
import os

Comment on lines +3495 to +3496
# table inheritance strategy, in which case a single
# update query is generated for all attributes

Copilot AI Mar 30, 2026


Similarly, this comment says concrete-table mode generates “a single update query”, but the code constructs an items_map across ancestors and generates multiple UPDATE statements. Please align the comment with the real multi-table update behavior in concrete-table mode.

Suggested change
# table inheritance strategy, in which case a single
# update query is generated for all attributes
# table inheritance strategy, in which case coordinated
# update statements are generated across the concrete tables

Comment on lines +608 to +619
insert_query = (
"insert or ignore into %s(%s) select %s from %s "
"where _class = '%s'"
% (
new_table_name,
", ".join(select_cols),
", ".join(select_cols),
concrete_table,
cls.__name__,
)
)
queries.append(insert_query)
Copilot AI Mar 30, 2026

generate_concrete_to_cti_queries() uses insert or ignore, which is SQLite syntax. If migrate() is intended to be usable with MySQL/PostgreSQL connections (as implied by the engine parameter and backup support), this should be generated per-engine (e.g., INSERT IGNORE for MySQL, ON CONFLICT DO NOTHING for PostgreSQL) or the script should fail fast when engine != sqlite.
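A hedged sketch of the per-engine branching this comment suggests; `build_copy_query` and the exact clause choices are illustrative, not taken from migrate_inheritance.py:

```python
def build_copy_query(engine, target, columns, source, class_name):
    cols = ", ".join(columns)
    base = "insert into %s(%s) select %s from %s where _class = '%s'" % (
        target, cols, cols, source, class_name)
    if engine == "sqlite":
        # SQLite spelling of the duplicate-tolerant insert
        return base.replace("insert into", "insert or ignore into", 1)
    if engine == "mysql":
        return base.replace("insert into", "insert ignore into", 1)
    if engine == "pgsql":
        # PostgreSQL expresses the same intent with an ON CONFLICT clause
        return base + " on conflict do nothing"
    # fail fast instead of silently emitting SQLite-only syntax
    raise ValueError("unsupported engine: %s" % engine)

sqlite_q = build_copy_query("sqlite", "new_t", ["object_id"], "old_t", "Person")
mysql_q = build_copy_query("mysql", "new_t", ["object_id"], "old_t", "Person")
pgsql_q = build_copy_query("pgsql", "new_t", ["object_id"], "old_t", "Person")
```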

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 13

🧹 Nitpick comments (2)
data/src/entity_manager/test.py (2)

3308-3549: Exercise the ORM after each migration, not just raw SQLite reads.

These end-to-end cases stop at schema and row-level assertions. That won't catch a migration that produces the right tables but still breaks EntityManager.get/find/save against the migrated strategy. Reopening the entity manager and doing one ORM round-trip here would cover the integration boundary this PR is changing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@data/src/entity_manager/test.py` around lines 3308 - 3549, Add ORM round-trip
checks after each migration in test_end_to_end_cti_to_concrete and
test_end_to_end_concrete_to_cti: after calling migration.migrate and asserting
success, reinitialize or reopen the EntityManager (or at least clear its
session), then use EntityManager.get/find/save (and entity_manager.begin/commit
where needed) to load an existing migrated entity and to persist a new one to
verify full ORM operations work against the migrated schema; reference the
existing helpers used in the test (self.entity_manager, self._get_connection(),
and migration.migrate) to locate where to insert these checks.

2642-2713: Move the inheritance override state management into setUp/tearDown.

This test mutates structures.DATA_INHERITANCE and per-class caches inline. The try/finally helps, but the suite guideline for class-level test state is to capture it in setUp and restore it in tearDown so isolation is uniform across the module.

As per coding guidelines, "For class-level state in tests, save the original state in setUp and restore it in tearDown to ensure test isolation".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@data/src/entity_manager/test.py` around lines 2642 - 2713, The test mutates
module-global structures.DATA_INHERITANCE and per-class _inheritance_strategy
caches inline; move that state management into the test class's setUp and
tearDown: in setUp capture structures.DATA_INHERITANCE into
self._original_data_inheritance, build the test_classes list and save any
existing cls._inheritance_strategy into self._saved_caches, and clear caches
(del cls._inheritance_strategy) so each test starts fresh; in tearDown restore
structures.DATA_INHERITANCE from self._original_data_inheritance, clear any
temporary _inheritance_strategy entries, and reassign saved values from
self._saved_caches; then simplify test_global_inheritance_override by removing
the try/finally and relying on setUp/tearDown to manage state.
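A minimal sketch of the setUp/tearDown pattern this nitpick asks for: module-level state is captured before each test and restored after it, so a failing test cannot leak the override into its neighbours. The `config` namespace below stands in for structures.DATA_INHERITANCE; names are illustrative.

```python
import types
import unittest

config = types.SimpleNamespace(DATA_INHERITANCE=None)

class InheritanceOverrideTest(unittest.TestCase):
    def setUp(self):
        # save the original global so tearDown can restore it unconditionally
        self._original_data_inheritance = config.DATA_INHERITANCE

    def tearDown(self):
        config.DATA_INHERITANCE = self._original_data_inheritance

    def test_global_override(self):
        # the test body mutates freely; no try/finally needed
        config.DATA_INHERITANCE = "concrete_table"
        self.assertEqual(config.DATA_INHERITANCE, "concrete_table")

suite = unittest.defaultTestLoader.loadTestsFromTestCase(InheritanceOverrideTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```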
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@data/src/entity_manager/mocks.py`:
- Around line 539-543: _relation_boss currently sets reverse="children", which
pairs the boss relation to the parent/child metadata; change the reverse to a
dedicated reports collection (e.g., reverse="reports") so boss lookups and lazy
loading traverse the correct relation on ConcretePerson, and ensure
ConcretePerson defines a matching reports relation name; update the dict
returned by _relation_boss to use reverse="reports" (or the actual reports
relation name on ConcretePerson).

In `@data/src/entity_manager/structures.py`:
- Around line 147-151: The code currently reserves the public attribute name
"inheritance" (listed alongside "_inheritance_strategy" in the
INVALID_NAMES-like list), which breaks backwards compatibility; change the
behavior so the framework either uses a private sentinel name (e.g., keep and
use "_inheritance_strategy" as the config key and remove "inheritance" from the
invalid/reserved names) or make the reservation conditional by only treating
"inheritance" as the framework flag when its value equals one of the supported
strategy strings (check membership against your strategy constants) — update the
INVALID_NAMES array and any validation/filtering logic that references
"inheritance" accordingly (also apply the same change in the other occurrences
referenced in the review).
- Around line 1813-1875: get_all_items() currently merges parent and own items
by assigning into all_items such that later entries overwrite earlier ones
(last-wins), which can conflict with the ORM's owner precedence used by
get_names_map() that preserves the first owner; change the merge logic in
get_all_items() (the loop over cls.get_all_parents() and the subsequent
own_items loop that writes into all_items and sets setattr(cls, cache_key,
all_items)) so that on key collisions you preserve the existing owner/value
(first-wins) rather than overwriting, and ensure own_items still take precedence
over parents by only writing a key from own_items if it does not already exist
or by writing parents first then own items using a non-overwriting assign policy
consistent with get_names_map().

In `@data/src/entity_manager/system.py`:
- Around line 4673-4697: The joins for inherited relations still use the
ancestor class name as the SQL prefix, causing predicates to reference Parent.*
even after Parent is removed from FROM; update join_tables() so that when a
relation is declared on an ancestor it computes the SQL prefix from the
concrete/table-owning class (not _entity_class.get_name())—e.g. resolve the
concrete subclass/table name (or the concrete table alias mapping used when
building the FROM joins) and use that for _prefix and any predicate generation;
apply the same fix to the equivalent logic referenced around lines 4771-4780 so
inherited relations always target the concrete table alias rather than the
ancestor name.
- Around line 3063-3065: The concrete-table path flattens parent columns via
entity_class.get_all_items() but schema setup still only calls
index_fields(entity_class) which in turn uses entity_class.get_indexed(), so
inherited indexed fields on the concrete table are not registered; update the
indexing logic so that for concrete tables (when is_concrete and table_items
comes from entity_class.get_all_items()) you call the inherited-aware accessor
(e.g. entity_class.get_all_indexed() or otherwise merge parent indexed fields
into entity_class.get_indexed()) before calling index_fields(entity_class) so
the child's table carries secondary indexes defined on parents.
- Around line 5143-5148: The issue is that ordering resolution still uses
_resolve_name() which consults get_names_map() and returns parent-qualified
names for inherited concrete-table fields; update _resolve_name() (or add the
same early-return logic used in the filter branch) to check
entity_class.is_concrete_table() and return table_name or the entity's own name
(entity_class.get_name() / _own_table_name) for any resolved attribute path so
that _order_query_f() and eager relation paths do not produce Parent.field
aliases when the child uses concrete table inheritance; ensure this same change
covers any code paths where order_by or relation-paths call _resolve_name().

In `@doc/design/002-concrete_table_inheritance.md`:
- Around line 144-145: The doc line incorrectly states that DATA_INHERITANCE is
read at module load time; update the sentence to say that
`colony.conf("DATA_INHERITANCE", None)` is read inside
`get_inheritance_strategy()` at call time (not at import/module load), and
mention `get_inheritance_strategy()` in `structures.py` as the place the
override is checked so the documented lifecycle matches the code.

In `@scripts/benchmark_inheritance.py`:
- Around line 162-176: The benchmark helpers (e.g., insert_concrete) currently
only operate on the leaf concrete table, which underestimates real-world cost
because the real implementation duplicates writes into ancestor tables and
serves parent-level reads via UNION ALL; update insert_concrete and the
corresponding read helpers (the other functions mentioned around lines 239-252,
278-315, 429-510) to perform the same behavior as the real concrete-table
strategy: when inserting a leaf row, also insert into each ancestor table with
the appropriate columns and _class marker, and when running parent-level queries
use a UNION ALL over the relevant ancestor+leaf tables (or simulate that by
querying the ancestor tables combined) so the benchmark reflects the extra write
and read costs of duplicated ancestor rows and UNION ALL scans; locate and
modify functions named insert_concrete and the sibling insert/read helpers to
implement these changes.
- Around line 575-755: The cleanup currently deletes temporary DB files in the
outer finally without guaranteeing the SQLite connections (cti_conn,
concrete_conn, wide_cti_conn, wide_concrete_conn) are closed, which can fail on
Windows and leak wide DB files; modify the function to track these connection
objects and paths (cti_conn, concrete_conn, wide_cti_conn, wide_concrete_conn
and cti_path, concrete_path, wide_cti_path, wide_concrete_path), and in the
finally block first safely close any non-None/open connections (catch and ignore
errors on close) and then remove their corresponding files if they exist; ensure
wide_* variables are defined (init to None) before use so early exceptions still
trigger proper closure and deletion.
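A sketch of the cleanup ordering this comment describes: every connection/path variable is initialised up front, connections are closed first (ignoring close-time errors), and only then are the files removed. Variable names are illustrative stand-ins for the benchmark's.

```python
import os
import sqlite3
import tempfile

cti_conn = concrete_conn = None
cti_path = concrete_path = None
try:
    cti_fd, cti_path = tempfile.mkstemp(suffix=".db")
    concrete_fd, concrete_path = tempfile.mkstemp(suffix=".db")
    os.close(cti_fd)
    os.close(concrete_fd)
    cti_conn = sqlite3.connect(cti_path)
    concrete_conn = sqlite3.connect(concrete_path)
    cti_conn.execute("create table t(x integer)")
finally:
    # close connections first so file deletion cannot fail on Windows
    for conn in (cti_conn, concrete_conn):
        if conn is None:
            continue
        try:
            conn.close()
        except Exception:
            pass  # a failed close must not mask the original error
    for path in (cti_path, concrete_path):
        if path and os.path.exists(path):
            os.remove(path)

removed = not os.path.exists(cti_path) and not os.path.exists(concrete_path)
```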

In `@scripts/migrate_inheritance.py`:
- Around line 707-725: The migrate() flow currently stops after
validate_hierarchy() and skips validate_data(), so run validate_data() after a
successful hierarchy validation and before proceeding to build queries (i.e.,
after the existing is_valid check and before the current_strategy logic); if
validate_data() returns failures, append its messages to messages and return
False, messages, and if validate_only is true return True, messages only when
both validate_hierarchy() and validate_data() succeed; ensure you call
validate_data(entity_class, connection, types_map) (or the appropriate
signature) and propagate its results before calling
generate_cti_to_concrete_queries() or generate_concrete_to_cti_queries().
- Around line 159-163: The backup path code for SQLite does not flush WAL to the
main DB before copying, so modify the sqlite branch (the block that checks if
engine == "sqlite" and uses connection_params/file_path) to open a sqlite3
connection to file_path (or use the existing connection if available) and
execute "PRAGMA wal_checkpoint(FULL)" (via cursor.execute) and commit/close
before calling shutil.copy2; then perform the existing backup_path creation and
copy as before so the backup includes WAL changes.
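A runnable sketch of the WAL flush this comment describes: checkpoint the write-ahead log into the main database file before copying it, so the backup does not miss recent writes. Paths and the toy schema are illustrative.

```python
import os
import shutil
import sqlite3
import tempfile

file_path = os.path.join(tempfile.mkdtemp(), "data.db")
backup_path = file_path + ".backup"

conn = sqlite3.connect(file_path)
conn.execute("pragma journal_mode=wal")
conn.execute("create table t(x integer)")
conn.execute("insert into t values (42)")
conn.commit()

# flush pending WAL frames into the main database file before copying
conn.execute("pragma wal_checkpoint(FULL)")
conn.close()

shutil.copy2(file_path, backup_path)
backup_value = sqlite3.connect(backup_path).execute(
    "select x from t").fetchone()[0]
```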
- Around line 338-345: The code uses SQLite-only SQL (sqlite_master and "insert
or ignore") inside generate_cti_to_concrete_queries and
generate_concrete_to_cti_queries and so will break for mysql/pgsql; update both
functions to accept an engine parameter and branch SQL generation by engine:
keep the sqlite_master query when engine=="sqlite", but use
information_schema.tables or pg_catalog (e.g. query information_schema.tables
with table_schema/name) for mysql and pgsql table-existence checks in place of
sqlite_master, and replace "insert or ignore" with engine-specific upsert syntax
("INSERT IGNORE" for mysql, "INSERT ... ON CONFLICT DO NOTHING" for pgsql) while
preserving behavior; ensure migrate() callers pass the engine through to these
functions and update places that use connection.cursor()/execute to use the
engine-specific SQL strings from generate_cti_to_concrete_queries /
generate_concrete_to_cti_queries.
- Around line 70-80: The current _collect recursion relies on __subclasses__()
which only reflects already-imported modules; before calling
_collect(entity_class) ensure all modules that could define subclasses of
entity_class are imported (e.g., use
pkgutil.walk_packages()/importlib.import_module over the package containing
entity_class.__module__ or otherwise import relevant modules), then run the
existing _collect function that appends to classes; keep the same symbols
(_collect, classes, entity_class, __subclasses__()) but add the module
discovery/import step prior to invoking _collect so the full class hierarchy is
loaded.
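A sketch of the recursive `__subclasses__()` walk plus the import step the comment calls out: `__subclasses__()` only sees classes whose defining modules are already imported, so real usage would first import the candidate modules (e.g. discovered via pkgutil.walk_packages). The classes below are local stand-ins.

```python
import importlib

class RootEntity(object):
    pass

class Person(RootEntity):
    pass

class Employee(Person):
    pass

def collect_subclasses(entity_class, module_names=()):
    # import candidate modules first so the hierarchy is fully loaded
    for name in module_names:
        importlib.import_module(name)
    classes = []
    def _collect(cls):
        classes.append(cls)
        for subclass in cls.__subclasses__():
            _collect(subclass)
    _collect(entity_class)
    return classes

collected = collect_subclasses(RootEntity)
```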

---

Nitpick comments:
In `@data/src/entity_manager/test.py`:
- Around line 3308-3549: Add ORM round-trip checks after each migration in
test_end_to_end_cti_to_concrete and test_end_to_end_concrete_to_cti: after
calling migration.migrate and asserting success, reinitialize or reopen the
EntityManager (or at least clear its session), then use
EntityManager.get/find/save (and entity_manager.begin/commit where needed) to
load an existing migrated entity and to persist a new one to verify full ORM
operations work against the migrated schema; reference the existing helpers used
in the test (self.entity_manager, self._get_connection(), and migration.migrate)
to locate where to insert these checks.
- Around line 2642-2713: The test mutates module-global
structures.DATA_INHERITANCE and per-class _inheritance_strategy caches inline;
move that state management into the test class's setUp and tearDown: in setUp
capture structures.DATA_INHERITANCE into self._original_data_inheritance, build
the test_classes list and save any existing cls._inheritance_strategy into
self._saved_caches, and clear caches (del cls._inheritance_strategy) so each
test starts fresh; in tearDown restore structures.DATA_INHERITANCE from
self._original_data_inheritance, clear any temporary _inheritance_strategy
entries, and reassign saved values from self._saved_caches; then simplify
test_global_inheritance_override by removing the try/finally and relying on
setUp/tearDown to manage state.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c8caefd7-f2a1-47cf-91b3-d1b30aeb53db

📥 Commits

Reviewing files that changed from the base of the PR and between a34f2b6 and 47dbdbb.

📒 Files selected for processing (9)
  • CHANGELOG.md
  • data/src/entity_manager/mocks.py
  • data/src/entity_manager/structures.py
  • data/src/entity_manager/system.py
  • data/src/entity_manager/test.py
  • doc/design/002-concrete_table_inheritance.md
  • doc/design/README.md
  • scripts/benchmark_inheritance.py
  • scripts/migrate_inheritance.py

Comment on lines +539 to +543
@staticmethod
def _relation_boss():
return dict(
type="to-one", target=ConcretePerson, reverse="children", is_mapper=True
)

⚠️ Potential issue | 🟠 Major

boss is paired with the wrong reverse relation.

Using reverse="children" ties manager lookups to the parent/child relation metadata. ConcretePerson needs a dedicated report collection here; otherwise reverse traversal and lazy loading for boss relationships will resolve the wrong side.

🐛 Suggested fix
 class ConcretePerson(ConcreteRootEntity):
@@
     children = dict(type="relation")
     """ The children of the current person """
 
+    employees = dict(type="relation")
+    """ The employees associated with the person """
+
@@
     def _relation_children():
         return dict(type="to-many", target=ConcretePerson, reverse="parent")
 
+    @staticmethod
+    def _relation_employees():
+        return dict(type="to-many", target=ConcreteEmployee, reverse="boss")
+
@@
 class ConcreteEmployee(ConcretePerson):
@@
     @staticmethod
     def _relation_boss():
         return dict(
-            type="to-one", target=ConcretePerson, reverse="children", is_mapper=True
+            type="to-one", target=ConcretePerson, reverse="employees", is_mapper=True
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@data/src/entity_manager/mocks.py` around lines 539 - 543, _relation_boss
currently sets reverse="children", which pairs the boss relation to the
parent/child metadata; change the reverse to a dedicated reports collection
(e.g., reverse="reports") so boss lookups and lazy loading traverse the correct
relation on ConcretePerson, and ensure ConcretePerson defines a matching reports
relation name; update the dict returned by _relation_boss to use
reverse="reports" (or the actual reports relation name on ConcretePerson).
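An illustrative checker for the pairing problem in this comment: a relation's `reverse` should name a relation that exists on the target class and points back. The metadata shape mimics the mocks' `_relation_*` dicts; `check_reverse` itself is hypothetical.

```python
relations = {
    "ConcretePerson": {
        "children": dict(type="to-many", target="ConcretePerson", reverse="parent"),
        "parent": dict(type="to-one", target="ConcretePerson", reverse="children"),
        "reports": dict(type="to-many", target="ConcreteEmployee", reverse="boss"),
    },
    "ConcreteEmployee": {
        "boss": dict(type="to-one", target="ConcretePerson", reverse="reports"),
    },
}

def check_reverse(relations, owner, name):
    relation = relations[owner][name]
    target_relations = relations.get(relation["target"], {})
    counterpart = target_relations.get(relation["reverse"])
    # the reverse must exist on the target and point back at this relation
    return counterpart is not None and counterpart["reverse"] == name

boss_ok = check_reverse(relations, "ConcreteEmployee", "boss")

# reverse="children" pairs boss with the parent/child relation: a mismatch
broken = check_reverse(
    {
        "E": {"boss": dict(type="to-one", target="P", reverse="children")},
        "P": {"children": dict(type="to-many", target="P", reverse="parent")},
    },
    "E", "boss",
)
```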

Comment on lines +147 to +151
"_inheritance_strategy",
"_all_items",
"_all_non_foreign_items",
"abstract",
"inheritance",

⚠️ Potential issue | 🟠 Major

inheritance is now a breaking reserved attribute.

Any model that already persisted an inheritance field will silently lose it after this change: the name is filtered out by INVALID_NAMES, and the same attribute is now interpreted as framework strategy config. For a public ORM surface, that is a backward-compatibility break, not just an additive feature. Please move the strategy flag to a private/sentinel name, or only reserve inheritance when its value is one of the supported strategy strings.

Also applies to: 429-433, 2638-2642

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@data/src/entity_manager/structures.py` around lines 147 - 151, The code
currently reserves the public attribute name "inheritance" (listed alongside
"_inheritance_strategy" in the INVALID_NAMES-like list), which breaks backwards
compatibility; change the behavior so the framework either uses a private
sentinel name (e.g., keep and use "_inheritance_strategy" as the config key and
remove "inheritance" from the invalid/reserved names) or make the reservation
conditional by only treating "inheritance" as the framework flag when its value
equals one of the supported strategy strings (check membership against your
strategy constants) — update the INVALID_NAMES array and any
validation/filtering logic that references "inheritance" accordingly (also apply
the same change in the other occurrences referenced in the review).
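A sketch of the conditional-reservation option from this comment: treat `inheritance` as framework configuration only when its value is one of the supported strategy strings, and leave it alone as a plain persisted attribute otherwise. Constant and function names are illustrative.

```python
STRATEGIES = ("class_table", "concrete_table")

def is_strategy_config(name, value):
    # only the known strategy strings activate the framework flag
    return name == "inheritance" and value in STRATEGIES

config_hit = is_strategy_config("inheritance", "concrete_table")
plain_attr = is_strategy_config("inheritance", "maternal line")
other_attr = is_strategy_config("status", "concrete_table")
```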

Comment on lines +1813 to +1875
# retrieves all the parent classes and iterates over
# them to collect their items, this includes abstract
# parents whose items would normally be inherited into
# the first concrete parent
all_parents = cls.get_all_parents()
for parent in all_parents:
# retrieves the items directly defined in the parent
# class using the internal items method and filters
# them using the same criteria as get_items
parent_items = parent._items()
for key, value in colony.legacy.iteritems(parent_items):
if key in INVALID_NAMES:
continue
if key.isupper():
continue
if type(value) in (
types.FunctionType,
types.MethodType,
staticmethod,
classmethod,
property,
):
continue
if (
not foreign_relations
and cls.is_relation(key)
and not cls.is_mapped(key)
):
continue
if not hasattr(value, "get"):
continue
all_items[key] = value

# retrieves the items directly defined in the current
# class and merges them into the all items map (these
# take precedence over parent items)
own_items = cls._items()
for key, value in colony.legacy.iteritems(own_items):
if key in INVALID_NAMES:
continue
if key.isupper():
continue
if type(value) in (
types.FunctionType,
types.MethodType,
staticmethod,
classmethod,
property,
):
continue
if (
not foreign_relations
and cls.is_relation(key)
and not cls.is_mapped(key)
):
continue
if not hasattr(value, "get"):
continue
all_items[key] = value

# caches the all items in the class to provide
# fast access on later accesses
setattr(cls, cache_key, all_items)

⚠️ Potential issue | 🟠 Major

get_all_items() can disagree with the ORM's existing owner precedence.

This merge overwrites on every collision, so the last parent wins. get_names_map() keeps the first owner instead, which means flattened CTI metadata can diverge from the owner/validation path for duplicate fields in multi-parent hierarchies.

♻️ Suggested fix
-        # retrieves all the parent classes and iterates over
-        # them to collect their items, this includes abstract
-        # parents whose items would normally be inherited into
-        # the first concrete parent
-        all_parents = cls.get_all_parents()
-        for parent in all_parents:
-            # retrieves the items directly defined in the parent
-            # class using the internal items method and filters
-            # them using the same criteria as get_items
-            parent_items = parent._items()
-            for key, value in colony.legacy.iteritems(parent_items):
-                if key in INVALID_NAMES:
-                    continue
-                if key.isupper():
-                    continue
-                if type(value) in (
-                    types.FunctionType,
-                    types.MethodType,
-                    staticmethod,
-                    classmethod,
-                    property,
-                ):
-                    continue
-                if (
-                    not foreign_relations
-                    and cls.is_relation(key)
-                    and not cls.is_mapped(key)
-                ):
-                    continue
-                if not hasattr(value, "get"):
-                    continue
-                all_items[key] = value
-
-        # retrieves the items directly defined in the current
-        # class and merges them into the all items map (these
-        # take precedence over parent items)
-        own_items = cls._items()
-        for key, value in colony.legacy.iteritems(own_items):
-            if key in INVALID_NAMES:
-                continue
-            if key.isupper():
-                continue
-            if type(value) in (
-                types.FunctionType,
-                types.MethodType,
-                staticmethod,
-                classmethod,
-                property,
-            ):
-                continue
-            if (
-                not foreign_relations
-                and cls.is_relation(key)
-                and not cls.is_mapped(key)
-            ):
-                continue
-            if not hasattr(value, "get"):
-                continue
-            all_items[key] = value
+        # reuse the same ownership order used by get_names_map()
+        # so flattened descriptors match the rest of the ORM
+        names_map = cls.get_names_map()
+        for key, owner in colony.legacy.iteritems(names_map):
+            owner_items = owner.get_items(foreign_relations=foreign_relations)
+            if key in owner_items:
+                all_items[key] = owner_items[key]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@data/src/entity_manager/structures.py` around lines 1813 - 1875,
get_all_items() currently merges parent and own items by assigning into
all_items such that later entries overwrite earlier ones (last-wins), which can
conflict with the ORM's owner precedence used by get_names_map() that preserves
the first owner; change the merge logic in get_all_items() (the loop over
cls.get_all_parents() and the subsequent own_items loop that writes into
all_items and sets setattr(cls, cache_key, all_items)) so that on key collisions
you preserve the existing owner/value (first-wins) rather than overwriting, and
ensure own_items still take precedence over parents by only writing a key from
own_items if it does not already exist or by writing parents first then own
items using a non-overwriting assign policy consistent with get_names_map().
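A minimal sketch of the non-overwriting merge policy this prompt describes: item maps are visited in owner order (own class first, then parents, matching get_names_map() precedence), and a key is only written if no earlier owner has claimed it. Names are illustrative.

```python
def merge_first_wins(*item_maps):
    # earlier maps claim keys first; later maps never overwrite them
    merged = {}
    for items in item_maps:
        for key, value in items.items():
            if key not in merged:
                merged[key] = value
    return merged

own = {"name": "own.name", "salary": "own.salary"}
parent_a = {"name": "parent_a.name", "age": "parent_a.age"}
parent_b = {"name": "parent_b.name"}

# own items win over parents; on parent collisions the first owner wins
all_items = merge_first_wins(own, parent_a, parent_b)
```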

Comment on lines +3063 to +3065
table_items = (
entity_class.get_all_items() if is_concrete else entity_class.get_items()
)

⚠️ Potential issue | 🟠 Major

Carry inherited secondary indexes onto concrete tables.

get_all_items() flattens parent columns into the concrete table here, but schema setup still calls index_fields(entity_class), and that method only indexes entity_class.get_indexed() for the current table. Under class-table that was table-scoped by necessity; with concrete-table, indexed parent fields on the child table are left unindexed, so inherited-field filters/orderings regress to table scans.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@data/src/entity_manager/system.py` around lines 3063 - 3065, The
concrete-table path flattens parent columns via entity_class.get_all_items() but
schema setup still only calls index_fields(entity_class) which in turn uses
entity_class.get_indexed(), so inherited indexed fields on the concrete table
are not registered; update the indexing logic so that for concrete tables (when
is_concrete and table_items comes from entity_class.get_all_items()) you call
the inherited-aware accessor (e.g. entity_class.get_all_indexed() or otherwise
merge parent indexed fields into entity_class.get_indexed()) before calling
index_fields(entity_class) so the child's table carries secondary indexes
defined on parents.
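A sqlite sketch of the gap this comment describes: the concrete child table is created with flattened parent columns, so its secondary indexes must also cover fields marked indexed on the parents. The indexed-field lists below are illustrative stand-ins for get_indexed()/get_all_indexed().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table _concrete_employee("
    "object_id integer primary key, name text, age integer, salary real)")

parent_indexed = ["name", "age"]   # declared on the parent (ConcretePerson)
own_indexed = ["salary"]           # declared on the child (ConcreteEmployee)

# index the union of inherited and own indexed fields on the child table
for field in parent_indexed + own_indexed:
    conn.execute(
        "create index _concrete_employee_%s_idx on _concrete_employee(%s)"
        % (field, field))

index_names = sorted(
    row[0] for row in conn.execute(
        "select name from sqlite_master where type = 'index'"))
```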

Comment on lines +4673 to 4697
if not is_concrete:
# iterates over all the parents to provide
# the necessary (inner) join of them into
# the current query context, this is a main step
# in achieving inheritance compliance in the query
for parent in all_parents:
# in case the parent class is abstract no need to join
# it into the current query
if parent.is_abstract():
continue

# retrieves the parent name, assumes the
# associated table has the same value
parent_name = parent.get_name()

# writes the table inheritance inner join
# part of the query, ensuring data coherence
# in the complete inheritance chain
query_buffer.write(" inner join ")
query_buffer.write(parent_name)
query_buffer.write(" on ")
query_buffer.write(table_name + "." + table_id)
query_buffer.write(" = ")
query_buffer.write(parent_name + "." + table_id)


⚠️ Potential issue | 🔴 Critical

Inherited relation joins still point at parent aliases.

For concrete-table hierarchies, Lines 4673-4697 remove the parent tables from FROM, but join_tables() still builds _prefix from _entity_class.get_name() when the relation is declared on an ancestor. Eager/default loading of an inherited relation will emit predicates against Parent.* even though Parent is no longer joined.

🐛 Minimal fix
-                    if entity_class == _entity_class:
-                        _prefix = prefix or _table_name
-                    else:
-                        _prefix = (
-                            (prefix + "___" + _table_name) if prefix else _table_name
-                        )
+                    if entity_class.is_concrete_table():
+                        _prefix = prefix or entity_class.get_name()
+                    elif entity_class == _entity_class:
+                        _prefix = prefix or _table_name
+                    else:
+                        _prefix = (
+                            (prefix + "___" + _table_name) if prefix else _table_name
+                        )

Also applies to: 4771-4780

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@data/src/entity_manager/system.py` around lines 4673 - 4697, The joins for
inherited relations still use the ancestor class name as the SQL prefix, causing
predicates to reference Parent.* even after Parent is removed from FROM; update
join_tables() so that when a relation is declared on an ancestor it computes the
SQL prefix from the concrete/table-owning class (not
_entity_class.get_name())—e.g. resolve the concrete subclass/table name (or the
concrete table alias mapping used when building the FROM joins) and use that for
_prefix and any predicate generation; apply the same fix to the equivalent logic
referenced around lines 4771-4780 so inherited relations always target the
concrete table alias rather than the ancestor name.

Comment on lines +575 to +755
cti_fd, cti_path = tempfile.mkstemp(suffix=".db")
concrete_fd, concrete_path = tempfile.mkstemp(suffix=".db")
os.close(cti_fd)
os.close(concrete_fd)

try:
    cti_conn = sqlite3.connect(cti_path)
    concrete_conn = sqlite3.connect(concrete_path)

    # benchmarks the schema creation
    cti_ms, _ = benchmark("create_schema", create_cti_schema, cti_conn)
    concrete_ms, _ = benchmark(
        "create_schema", create_concrete_schema, concrete_conn
    )
    results.append(("Schema Creation", cti_ms, concrete_ms))

    # benchmarks the bulk insert operation
    def bulk_insert_cti():
        for i in range(1, iterations + 1):
            insert_cti(cti_conn, i, "person_%d" % i, 20 + (i % 50), 1000 + i)
        cti_conn.commit()

    def bulk_insert_concrete():
        for i in range(1, iterations + 1):
            insert_concrete(
                concrete_conn, i, "person_%d" % i, 20 + (i % 50), 1000 + i
            )
        concrete_conn.commit()

    cti_ms, _ = benchmark("bulk_insert", bulk_insert_cti)
    concrete_ms, _ = benchmark("bulk_insert", bulk_insert_concrete)
    results.append(("Bulk INSERT (%d rows)" % iterations, cti_ms, concrete_ms))

    # benchmarks the single select by ID (warm cache, middle of range)
    target_id = iterations // 2

    def repeat_select_cti():
        for _i in range(iterations):
            select_by_id_cti(cti_conn, target_id)

    def repeat_select_concrete():
        for _i in range(iterations):
            select_by_id_concrete(concrete_conn, target_id)

    cti_ms, _ = benchmark("select_by_id", repeat_select_cti)
    concrete_ms, _ = benchmark("select_by_id", repeat_select_concrete)
    results.append(("SELECT by ID (%d lookups)" % iterations, cti_ms, concrete_ms))

    # benchmarks the find all (full table scan)
    def repeat_find_all_cti():
        for _i in range(100):
            select_all_cti(cti_conn)

    def repeat_find_all_concrete():
        for _i in range(100):
            select_all_concrete(concrete_conn)

    cti_ms, _ = benchmark("find_all", repeat_find_all_cti)
    concrete_ms, _ = benchmark("find_all", repeat_find_all_concrete)
    results.append(("SELECT all (100 scans)", cti_ms, concrete_ms))

    # benchmarks the count operation
    def repeat_count_cti():
        for _i in range(iterations):
            count_cti(cti_conn)

    def repeat_count_concrete():
        for _i in range(iterations):
            count_concrete(concrete_conn)

    cti_ms, _ = benchmark("count", repeat_count_cti)
    concrete_ms, _ = benchmark("count", repeat_count_concrete)
    results.append(("COUNT (%d counts)" % iterations, cti_ms, concrete_ms))

    # benchmarks the bulk update operation
    def bulk_update_cti():
        for i in range(1, iterations + 1):
            update_cti(cti_conn, i, "updated_%d" % i, 2000 + i)
        cti_conn.commit()

    def bulk_update_concrete():
        for i in range(1, iterations + 1):
            update_concrete(concrete_conn, i, "updated_%d" % i, 2000 + i)
        concrete_conn.commit()

    cti_ms, _ = benchmark("bulk_update", bulk_update_cti)
    concrete_ms, _ = benchmark("bulk_update", bulk_update_concrete)
    results.append(("Bulk UPDATE (%d rows)" % iterations, cti_ms, concrete_ms))

    # benchmarks the bulk delete operation
    def bulk_delete_cti():
        for i in range(1, iterations + 1):
            delete_cti(cti_conn, i)
        cti_conn.commit()

    def bulk_delete_concrete():
        for i in range(1, iterations + 1):
            delete_concrete(concrete_conn, i)
        concrete_conn.commit()

    cti_ms, _ = benchmark("bulk_delete", bulk_delete_cti)
    concrete_ms, _ = benchmark("bulk_delete", bulk_delete_concrete)
    results.append(("Bulk DELETE (%d rows)" % iterations, cti_ms, concrete_ms))

    # closes the narrow benchmark connections
    cti_conn.close()
    concrete_conn.close()

    # adds a separator for the wide hierarchy benchmarks
    results.append(None)

    # creates the temporary database files for the wide
    # hierarchy benchmarks (20 columns per level, 60 total)
    wide_cti_fd, wide_cti_path = tempfile.mkstemp(suffix=".db")
    wide_concrete_fd, wide_concrete_path = tempfile.mkstemp(suffix=".db")
    os.close(wide_cti_fd)
    os.close(wide_concrete_fd)

    wide_cti_conn = sqlite3.connect(wide_cti_path)
    wide_concrete_conn = sqlite3.connect(wide_concrete_path)

    # benchmarks the wide schema creation
    cti_ms, _ = benchmark(
        "wide_create_schema", create_wide_cti_schema, wide_cti_conn
    )
    concrete_ms, _ = benchmark(
        "wide_create_schema", create_wide_concrete_schema, wide_concrete_conn
    )
    results.append(("Wide Schema Creation (60 cols)", cti_ms, concrete_ms))

    # benchmarks the wide bulk insert
    def wide_bulk_insert_cti():
        for i in range(1, iterations + 1):
            insert_wide_cti(wide_cti_conn, i)
        wide_cti_conn.commit()

    def wide_bulk_insert_concrete():
        for i in range(1, iterations + 1):
            insert_wide_concrete(wide_concrete_conn, i)
        wide_concrete_conn.commit()

    cti_ms, _ = benchmark("wide_bulk_insert", wide_bulk_insert_cti)
    concrete_ms, _ = benchmark("wide_bulk_insert", wide_bulk_insert_concrete)
    results.append(
        ("Wide INSERT (%d rows, 60 cols)" % iterations, cti_ms, concrete_ms)
    )

    # benchmarks the wide select by ID
    target_id = iterations // 2

    def wide_repeat_select_cti():
        for _i in range(iterations):
            select_by_id_wide_cti(wide_cti_conn, target_id)

    def wide_repeat_select_concrete():
        for _i in range(iterations):
            select_by_id_wide_concrete(wide_concrete_conn, target_id)

    cti_ms, _ = benchmark("wide_select_by_id", wide_repeat_select_cti)
    concrete_ms, _ = benchmark("wide_select_by_id", wide_repeat_select_concrete)
    results.append(
        ("Wide SELECT by ID (%d, 60 cols)" % iterations, cti_ms, concrete_ms)
    )

    # closes the wide benchmark connections
    wide_cti_conn.close()
    wide_concrete_conn.close()

    if os.path.exists(wide_cti_path):
        os.remove(wide_cti_path)
    if os.path.exists(wide_concrete_path):
        os.remove(wide_concrete_path)

finally:
    # removes the temporary database files
    if os.path.exists(cti_path):
        os.remove(cti_path)
    if os.path.exists(concrete_path):
        os.remove(concrete_path)

return results
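The calls above assume a `benchmark(label, func, *args)` helper returning an `(elapsed_ms, result)` tuple. A minimal sketch of such a helper, under the assumption that the script times a single invocation (the actual implementation may differ):

```python
import time

def benchmark(label, func, *args):
    # runs the callable once and returns (elapsed milliseconds, result);
    # the label is kept so callers can attach it to a report line
    start = time.perf_counter()
    result = func(*args)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return elapsed_ms, result

# usage mirroring the benchmark loops above
ms, value = benchmark("square", lambda x: x * x, 7)
```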
⚠️ Potential issue | 🟠 Major

Make temp database cleanup exception-safe.

If any benchmark step raises before the explicit close() calls, the finally block tries to delete still-open SQLite files. That can raise on Windows and mask the real failure, and the wide temp databases leak on early exits.

♻️ Suggested cleanup structure
 def run_benchmarks(iterations=None):
@@
-    try:
-        cti_conn = sqlite3.connect(cti_path)
-        concrete_conn = sqlite3.connect(concrete_path)
+    cti_conn = None
+    concrete_conn = None
+    wide_cti_conn = None
+    wide_concrete_conn = None
+    wide_cti_path = None
+    wide_concrete_path = None
+
+    try:
+        cti_conn = sqlite3.connect(cti_path)
+        concrete_conn = sqlite3.connect(concrete_path)
@@
-        # closes the wide benchmark connections
-        wide_cti_conn.close()
-        wide_concrete_conn.close()
-
-        if os.path.exists(wide_cti_path):
-            os.remove(wide_cti_path)
-        if os.path.exists(wide_concrete_path):
-            os.remove(wide_concrete_path)
-
     finally:
-        # removes the temporary database files
-        if os.path.exists(cti_path):
-            os.remove(cti_path)
-        if os.path.exists(concrete_path):
-            os.remove(concrete_path)
+        for connection in (
+            wide_cti_conn,
+            wide_concrete_conn,
+            cti_conn,
+            concrete_conn,
+        ):
+            if connection:
+                connection.close()
+        for path in (
+            wide_cti_path,
+            wide_concrete_path,
+            cti_path,
+            concrete_path,
+        ):
+            if path and os.path.exists(path):
+                os.remove(path)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/benchmark_inheritance.py` around lines 575 - 755, The cleanup
currently deletes temporary DB files in the outer finally without guaranteeing
the SQLite connections (cti_conn, concrete_conn, wide_cti_conn,
wide_concrete_conn) are closed, which can fail on Windows and leak wide DB
files; modify the function to track these connection objects and paths
(cti_conn, concrete_conn, wide_cti_conn, wide_concrete_conn and cti_path,
concrete_path, wide_cti_path, wide_concrete_path), and in the finally block
first safely close any non-None/open connections (catch and ignore errors on
close) and then remove their corresponding files if they exist; ensure wide_*
variables are defined (init to None) before use so early exceptions still
trigger proper closure and deletion.
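Independent of the exact diff, the underlying pattern is: pre-declare every resource before the `try` so the `finally` block can always reference it, and close connections before unlinking files (Windows refuses to delete open files). A small self-contained sketch of that pattern; the names here are illustrative, not the script's:

```python
import os
import sqlite3
import tempfile

def with_temp_db(work):
    # pre-declare the connection so the finally block can
    # reference it even if sqlite3.connect never runs
    conn = None
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        conn = sqlite3.connect(path)
        return work(conn)
    finally:
        # close before unlinking; ignore close errors so they
        # never mask the original exception
        if conn:
            try:
                conn.close()
            except Exception:
                pass
        if os.path.exists(path):
            os.remove(path)

result = with_temp_db(lambda conn: conn.execute("SELECT 1").fetchone()[0])
```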

Comment on lines +70 to +80
def _collect(cls):
    for subclass in cls.__subclasses__():
        # avoids duplicates from multiple inheritance paths
        # (e.g. Employee inherits from both Person and Taxable)
        if subclass in classes:
            continue
        if not subclass.is_abstract():
            classes.append(subclass)
        _collect(subclass)

_collect(entity_class)
⚠️ Potential issue | 🟠 Major

Don't rely on __subclasses__() alone to discover the complete hierarchy.

The CLI only imports the root class module via __import__(). If subclasses reside in separate modules and haven't been imported yet, __subclasses__() will silently discover only the imported portion of the hierarchy, resulting in an incomplete migration plan that skips tables from unimported classes.

Also applies to: 828-831

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/migrate_inheritance.py` around lines 70 - 80, The current _collect
recursion relies on __subclasses__() which only reflects already-imported
modules; before calling _collect(entity_class) ensure all modules that could
define subclasses of entity_class are imported (e.g., use
pkgutil.walk_packages()/importlib.import_module over the package containing
entity_class.__module__ or otherwise import relevant modules), then run the
existing _collect function that appends to classes; keep the same symbols
(_collect, classes, entity_class, __subclasses__()) but add the module
discovery/import step prior to invoking _collect so the full class hierarchy is
loaded.
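One way to realize the module-discovery step the prompt describes is `pkgutil.walk_packages`; a hedged sketch, demonstrated here on the stdlib `json` package rather than the actual entity modules:

```python
import importlib
import pkgutil

def import_package_modules(package_name):
    # imports every module under the given package so that a later
    # cls.__subclasses__() walk can see subclasses defined anywhere in it
    package = importlib.import_module(package_name)
    imported = [package_name]
    for _finder, name, _is_pkg in pkgutil.walk_packages(
        package.__path__, prefix=package.__name__ + "."
    ):
        importlib.import_module(name)
        imported.append(name)
    return imported

# after this call, subclass discovery over classes in the
# package is complete because every submodule has been loaded
modules = import_package_modules("json")
```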

Comment on lines +159 to +163
if engine == "sqlite":
    file_path = connection_params.get("file_path", "")
    backup_path = "%s.backup.%d" % (file_path, timestamp)
    shutil.copy2(file_path, backup_path)
    return backup_path
⚠️ Potential issue | 🟠 Major

Call PRAGMA wal_checkpoint(FULL) before copying the SQLite database file to ensure WAL data is flushed.

With WAL enabled, recent transactions can live in the -wal sidecar file, and shutil.copy2() on only the main database file will produce an incomplete backup. The tests already demonstrate awareness of this (they call wal_checkpoint(FULL) in multiple places), but the backup_database() function should apply the same safety measure before performing the file copy.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/migrate_inheritance.py` around lines 159 - 163, The backup path code
for SQLite does not flush WAL to the main DB before copying, so modify the
sqlite branch (the block that checks if engine == "sqlite" and uses
connection_params/file_path) to open a sqlite3 connection to file_path (or use
the existing connection if available) and execute "PRAGMA wal_checkpoint(FULL)"
(via cursor.execute) and commit/close before calling shutil.copy2; then perform
the existing backup_path creation and copy as before so the backup includes WAL
changes.

Comment on lines +707 to +725
# validates the hierarchy first
is_valid, validation_messages = validate_hierarchy(entity_class, target_strategy)
messages.extend(validation_messages)

if not is_valid:
    return False, messages

if validate_only:
    return True, messages

# determines the current strategy and generates the
# appropriate migration queries
current_strategy = entity_class.get_inheritance_strategy()

if current_strategy == "class_table" and target_strategy == "concrete_table":
    queries = generate_cti_to_concrete_queries(entity_class, types_map, connection)
elif current_strategy == "concrete_table" and target_strategy == "class_table":
    queries = generate_concrete_to_cti_queries(entity_class, types_map, connection)
else:
⚠️ Potential issue | 🔴 Critical

Run data validation before declaring the migration safe.

migrate() returns after validate_hierarchy() and never calls validate_data(). That means orphaned CTI rows will be reported as valid and can then disappear during the INNER JOIN copy in generate_cti_to_concrete_queries().

🛡️ Suggested validation hook
     if not is_valid:
         return False, messages
-
-    if validate_only:
-        return True, messages
 
     # determines the current strategy and generates the
     # appropriate migration queries
     current_strategy = entity_class.get_inheritance_strategy()
+
+    if current_strategy == "class_table":
+        is_valid = validate_data(connection, entity_class, messages)
+        if not is_valid:
+            return False, messages
+
+    if validate_only:
+        return True, messages
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/migrate_inheritance.py` around lines 707 - 725, The migrate() flow
currently stops after validate_hierarchy() and skips validate_data(), so run
validate_data() after a successful hierarchy validation and before proceeding to
build queries (i.e., after the existing is_valid check and before the
current_strategy logic); if validate_data() returns failures, append its
messages to messages and return False, messages, and if validate_only is true
return True, messages only when both validate_hierarchy() and validate_data()
succeed; ensure you call validate_data(entity_class, connection, types_map) (or
the appropriate signature) and propagate its results before calling
generate_cti_to_concrete_queries() or generate_concrete_to_cti_queries().
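The orphaned rows this comment warns about can be surfaced with a `LEFT JOIN` before any data is copied. A sketch of such a check, assuming hypothetical table and column names rather than the script's real `validate_data()` signature:

```python
import sqlite3

def find_orphan_rows(connection, child_table, parent_table, id_column="object_id"):
    # in class table inheritance every child row must have a matching
    # parent row; rows without one would be silently dropped by an
    # INNER JOIN based copy, so surface them before migrating
    query = (
        "SELECT c.%s FROM %s c LEFT JOIN %s p ON c.%s = p.%s "
        "WHERE p.%s IS NULL"
        % (id_column, child_table, parent_table, id_column, id_column, id_column)
    )
    return [row[0] for row in connection.execute(query)]
```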

…support

Move all migration tests out of the Colony entity manager test bundle
into scripts/test/test_migrate_inheritance.py as a self-contained
test suite with its own mock entity classes and raw SQLite setup.
Remove EntityManagerMigrationTestCase from entity_manager/test.py.

Make the migration script engine-agnostic with helper functions:
- has_table(): engine-specific catalog queries (sqlite_master,
  information_schema.tables, pg_tables)
- index_query(): engine-specific CREATE INDEX syntax (SQLite plain,
  MySQL USING after column, PostgreSQL USING before column) with
  name truncation (64 chars MySQL, 63 chars PostgreSQL)
- insert_ignore_query(): engine-specific duplicate handling
  (INSERT OR IGNORE for SQLite, INSERT IGNORE for MySQL,
  ON CONFLICT DO NOTHING for PostgreSQL)

CI now uses pytest to run all scripts/test/ files:
  pip install pytest
  pytest scripts/test/ -v

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
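The engine-specific duplicate handling described above could look roughly like this; a sketch under the assumption that engines are keyed `sqlite`/`mysql`/`pgsql`, and the script's real `insert_ignore_query()` may differ:

```python
def insert_ignore_query(table_name, columns, engine="sqlite"):
    # builds an INSERT that skips duplicate keys, using each
    # engine's native "insert or skip" syntax
    placeholders = ", ".join(["?"] * len(columns))
    column_list = ", ".join(columns)
    if engine == "sqlite":
        prefix, suffix = "INSERT OR IGNORE INTO", ""
    elif engine == "mysql":
        prefix, suffix = "INSERT IGNORE INTO", ""
    elif engine == "pgsql":
        prefix, suffix = "INSERT INTO", " ON CONFLICT DO NOTHING"
    else:
        raise ValueError("unsupported engine: %s" % engine)
    return "%s %s (%s) VALUES (%s)%s" % (
        prefix, table_name, column_list, placeholders, suffix
    )
```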
@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (6)
data/src/entity_manager/test.py (1)

2516-2518: Consider consistent save order for related entities.

The save order here (person saved before address when person.address = address) differs from the pattern established in the base test_one_to_one (lines 361-372), where the referenced entity is saved first. While this may work correctly, maintaining consistency across tests helps with readability and maintainability.

🔧 Suggested fix to match established pattern
 person.address = address
-self.entity_manager.save(person)
 self.entity_manager.save(address)
+self.entity_manager.save(person)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@data/src/entity_manager/test.py` around lines 2516 - 2518, The test saves
person before address after setting person.address, which is inconsistent with
the established pattern in test_one_to_one; update the save order so the
referenced entity is persisted first (call self.entity_manager.save(address)
before self.entity_manager.save(person)) after assigning person.address =
address to match the base test pattern and improve readability and
maintainability.
scripts/migrate_inheritance.py (3)

486-489: Prefix unused loop variables with underscore.

sql_type and is_pk are unpacked but not used. Prefix them with underscore to indicate intentionally unused.

♻️ Proposed fix
-        for col_name, sql_type, is_pk in columns:
+        for col_name, _sql_type, _is_pk in columns:
             source_table = get_source_table_for_column(cls, col_name)
             select_cols.append("%s.%s" % (source_table, col_name))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/migrate_inheritance.py` around lines 486 - 489, The loop unpacks
unused variables sql_type and is_pk which should be prefixed to signal
intentional non-use; modify the for-loop over columns to unpack as (col_name,
_sql_type, _is_pk) and leave the rest of the logic (calls to
get_source_table_for_column(cls, col_name) and appending to select_cols)
unchanged so the variables and intent are clear.

404-408: SQL built via string formatting - ensure table/column names are trusted.

The query construction uses string formatting with table_name, table_id, and parent_table. While these values derive from entity class metadata (not user input), this pattern doesn't use parameterized queries for identifiers. This is acceptable since SQL doesn't support parameterized table/column names, but ensure entity class names are always code-defined and never user-controlled.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/migrate_inheritance.py` around lines 404 - 408, The SQL is built via
Python string formatting in cursor.execute using table_name, table_id, and
parent_table; replace this with a safe identifier composition (e.g.,
psycopg2.sql.SQL/Identifier) or assert/validate that these values are strictly
code-defined entity names (no user input) before use. Locate the cursor.execute
call in migrate_inheritance.py and either (1) rebuild the statement using
psycopg2.sql.Identifier for table/column identifiers or (2) add explicit
validation (regex whitelist or membership check against known entity metadata)
for table_name, table_id and parent_table to guarantee they cannot be
user-controlled, then use them in the query.

847-853: Consider logging when rollback fails.

The try-except-pass pattern on the rollback silently swallows any rollback failure. While avoiding a cascade of exceptions is reasonable, knowing that rollback failed is critical for debugging and manual recovery.

♻️ Proposed fix
     except Exception as exception:
         # rolls back the transaction on failure
         try:
             connection.rollback()
         except Exception:
-            pass
+            messages.append("warning: rollback also failed")
         messages.append("migration failed: %s" % str(exception))
         messages.append("transaction rolled back")
         return False, messages
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/migrate_inheritance.py` around lines 847 - 853, The rollback
exception is being swallowed by a bare except in the block around
connection.rollback; catch the exception from connection.rollback (e.g., as
rollback_exc) and record/log it instead of pass so failures are visible—update
the try/except around connection.rollback to append a message like "rollback
failed: %s" % str(rollback_exc) to messages (or call the existing logger if
available) before continuing, and keep the existing "transaction rolled back"
and return behavior; reference the connection.rollback call and the messages
list and exception variable in your change.
scripts/test/test_migrate_inheritance.py (1)

102-105: Use is for class identity comparison.

When comparing class objects, is is more appropriate than == since it checks identity rather than equality.

♻️ Proposed fix
     def _get_entity_parents(cls):
         parents = []
         for base in cls.__bases__:
-            if base == MockEntityBase or base == object:
+            if base is MockEntityBase or base is object:
                 continue
             parents.append(base)
         return parents
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/test/test_migrate_inheritance.py` around lines 102 - 105, In the loop
that iterates over cls.__bases__ (variables cls and base), replace the equality
checks using "==" with identity checks using "is" when comparing against
MockEntityBase and object (i.e., use "base is MockEntityBase" and "base is
object"); update any similar class comparisons in the same function to use "is"
to ensure class identity semantics are used rather than equality.
.github/workflows/main.yml (1)

70-77: Consider adding pytest to PyPy builds (excluding 2.7).

The PyPy build jobs don't include the new pytest step. While PyPy 2.7 should be excluded, the migration tests should work with PyPy 3.x versions when they're uncommented in the matrix.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/main.yml around lines 70 - 77, Add a pytest invocation to
the PyPy job so migration tests run on PyPy 3.x: modify the job steps that
currently include the "run: pypy setup.py test" step to first install and run
pytest (e.g., "pip install pytest" then "pytest") but guard it with the matrix
condition (matrix.python-version != '2.7') so PyPy 2.7 is excluded; update the
existing "run: pypy setup.py test" step or add a preceding step that references
matrix.python-version and the existing "pypy setup.py test" entry to ensure
tests run under PyPy 3.x.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6ddc6489-533c-492d-82d5-f4ec1e4185ae

📥 Commits

Reviewing files that changed from the base of the PR and between 47dbdbb and f8e030e.

📒 Files selected for processing (4)
  • .github/workflows/main.yml
  • data/src/entity_manager/test.py
  • scripts/migrate_inheritance.py
  • scripts/test/test_migrate_inheritance.py

Comment on lines +116 to +121
index_name = "%s_%s_%s" % (table_name, attribute_name, index_type)

# truncates the index name to the engine-specific limit
limit = INDEX_NAME_LIMITS.get(engine, None)
if limit:
index_name = index_name[-limit:]
⚠️ Potential issue | 🟡 Minor

Index name truncation from the start may cause collisions.

Using index_name[-limit:] takes the last N characters, which could produce duplicate names when multiple tables share similar suffixes. For example, table_a_very_long_attr_hash and table_b_very_long_attr_hash truncated to 64 chars would both end with the same suffix.

Consider truncating from the middle or using a hash-based suffix to ensure uniqueness.

♻️ Proposed fix using hash for uniqueness
+import hashlib
+
 def index_query(table_name, attribute_name, index_type, engine="sqlite"):
     ...
     index_name = "%s_%s_%s" % (table_name, attribute_name, index_type)
 
     # truncates the index name to the engine-specific limit
     limit = INDEX_NAME_LIMITS.get(engine, None)
-    if limit:
-        index_name = index_name[-limit:]
+    if limit and len(index_name) > limit:
+        # use a hash suffix to ensure uniqueness when truncating
+        name_hash = hashlib.md5(index_name.encode()).hexdigest()[:8]
+        prefix_len = limit - 9  # leave room for underscore + 8-char hash
+        index_name = index_name[:prefix_len] + "_" + name_hash
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/migrate_inheritance.py` around lines 116 - 121, The current
truncation uses index_name = index_name[-limit:] which slices the tail and can
collide for similar suffixes; modify the logic around INDEX_NAME_LIMITS handling
to produce deterministic unique names by combining a preserved prefix with a
short hash of the full index_name: compute a stable hash (e.g., md5/sha1) of the
original "%s_%s_%s" % (table_name, attribute_name, index_type), then build
index_name as prefix + "_" + hash_suffix such that len(index_name) <= limit
(choose prefix length = limit - 1 - len(hash_suffix)); update the code that
reads/writes index_name and ensure engine, table_name, attribute_name and
index_type are used to compute the hash to guarantee uniqueness across
similar-suffixed names.

- Update debug, info, warning, error, critical methods to accept
  *args and **kwargs for deferred string formatting
- Update class-level logging methods (_class_debug, etc.) with
  the same signature change
- Enables logging calls like self.debug("Event %s", name) where
  formatting is skipped if the log level is not active

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
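The deferred-formatting pattern mirrors the stdlib `logging` module: arguments are interpolated only when the level is enabled. A minimal standalone sketch; the class and level constants are illustrative, not Colony's actual API:

```python
DEBUG, INFO = 10, 20

class Loggable(object):
    def __init__(self, level=INFO):
        self.level = level

    def debug(self, message, *args, **kwargs):
        # formatting is deferred: the "%" interpolation only runs
        # when the DEBUG level is actually enabled, so callers can
        # write self.debug("Event %s", name) with no format cost
        if self.level > DEBUG:
            return None
        return message % args if args else message

logger = Loggable(level=DEBUG)
line = logger.debug("Event %s", "startup")
```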

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: add Concrete Table Inheritance (CTI per-class) as an alternative inheritance strategy

2 participants