
feat: add Concrete Table Inheritance as alternative inheritance strategy#26

Open
joamag wants to merge 14 commits into master from feat/concrete-table-inheritance

Conversation

Contributor

@joamag joamag commented Mar 30, 2026

Summary

  • Introduces Concrete Table Inheritance (concrete_table) as an alternative to the existing Class Table Inheritance (class_table) strategy
  • Every class in the hierarchy stores all attributes (own + inherited) in its own table — enabling polymorphic queries at any hierarchy level without joins
  • Writing an entity inserts/updates/deletes rows in all ancestor tables (data duplication), while reading is always a single-table scan
  • Strategy is selectable per hierarchy via inheritance = "concrete_table" class attribute, fully backward compatible
  • Includes a standalone migration script (scripts/migrate_inheritance.py) for converting existing hierarchies between strategies
  • Design document: COP-002
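The per-hierarchy opt-in can be sketched as follows. This is a minimal model, not the actual structures.py implementation; the class names mirror the PR's test mocks and the stub base class only models the inherited class-attribute lookup:

```python
# Minimal sketch; the real EntityClass lives in
# data/src/entity_manager/structures.py.
class EntityClass(object):
    inheritance = "class_table"  # existing default (joined tables)

    @classmethod
    def get_inheritance_strategy(cls):
        # the attribute is inherited, so setting it once on the
        # root of a hierarchy opts every descendant in
        return cls.inheritance

    @classmethod
    def is_concrete_table(cls):
        return cls.get_inheritance_strategy() == "concrete_table"


class ConcretePerson(EntityClass):
    inheritance = "concrete_table"  # opts the whole hierarchy in


class ConcreteEmployee(ConcretePerson):
    pass  # inherits the strategy from ConcretePerson
```

Because only the root of a hierarchy needs to set the attribute, existing hierarchies that never set it keep the default class_table behavior, which is what makes the change backward compatible.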

Schema design

For ConcreteRootEntity -> ConcretePerson -> ConcreteEmployee:

_concrete_root_entity: object_id, status, _class, _mtime         (ALL entities)
_concrete_person:      object_id, status, name, age, _class, _mtime  (persons + employees)
_concrete_employee:    object_id, status, name, age, salary, _class, _mtime (employees only)
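A runnable sketch of this layout using sqlite3 (the column types are assumptions; the real DDL is generated by the entity manager's create() path):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE _concrete_root_entity (
        object_id INTEGER PRIMARY KEY, status INTEGER,
        _class TEXT, _mtime REAL);
    CREATE TABLE _concrete_person (
        object_id INTEGER PRIMARY KEY, status INTEGER,
        name TEXT, age INTEGER, _class TEXT, _mtime REAL);
    CREATE TABLE _concrete_employee (
        object_id INTEGER PRIMARY KEY, status INTEGER,
        name TEXT, age INTEGER, salary INTEGER, _class TEXT, _mtime REAL);
    """
)

# saving one employee fans out a row to every ancestor table
conn.execute(
    "INSERT INTO _concrete_root_entity VALUES (1, 0, 'ConcreteEmployee', 0.0)")
conn.execute(
    "INSERT INTO _concrete_person VALUES (1, 0, 'Ada', 30, 'ConcreteEmployee', 0.0)")
conn.execute(
    "INSERT INTO _concrete_employee VALUES (1, 0, 'Ada', 30, 1000, 'ConcreteEmployee', 0.0)")

# polymorphic read at the person level: single-table scan, no joins
row = conn.execute(
    "SELECT name, _class FROM _concrete_person WHERE object_id = 1").fetchone()
```

The _class discriminator recorded in every row is what lets a person-level query recognize that this row is actually an employee.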

Core changes

  • structures.py: Added inheritance attribute, get_inheritance_strategy(), is_concrete_table(), and get_all_items() to EntityClass
  • system.py: Save/update/delete write to all ancestor tables; select reads from single table without joins; create builds flattened tables at every hierarchy level
  • mocks.py: Added ConcreteRootEntity, ConcretePerson, ConcreteEmployee, ConcreteAddress test entities
  • test.py: 77 tests covering base ORM, concrete table (including polymorphic queries), migration tooling, and backup
  • scripts/migrate_inheritance.py: Migration script with backup, validation, dry-run, and transactional execution
  • scripts/benchmark_inheritance.py: Performance benchmark comparing both strategies
  • doc/design/002-concrete_table_inheritance.md: Design document (COP-002)

Test plan

  • All existing class table inheritance (CTI) tests pass (zero regressions)
  • Strategy selector correctly resolves concrete_table through hierarchy
  • get_all_items() returns flattened items including inherited fields
  • Concrete table CREATE TABLE includes all inherited columns at every hierarchy level
  • Concrete table INSERT writes to all ancestor tables
  • Concrete table UPDATE writes to all ancestor tables
  • Concrete table DELETE removes from all ancestor tables
  • Concrete table SELECT reads from single table without joins
  • Polymorphic queries work (querying parent class returns descendant entities)
  • Relation handling (1:1, 1:N) works with concrete table entities
  • Migration script validates, dry-runs, backs up, and rolls back correctly

Closes #25

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added Concrete Table Inheritance mode with runtime strategy selection, migration utility, and benchmarking tool
    • Improved REST request connection address resolution (proxy headers, IPv6-mapped IPv4 cleanup)
  • Changed

    • Model logging methods now accept extra args/kwargs for lazy evaluation
  • Tests

    • Added extensive unit and end-to-end tests for inheritance, migration, validation, backups, and rollback
  • Documentation

    • New design doc and changelog/index updates
  • Chores

    • CI workflow extended to run new tests

Introduce a new concrete_table inheritance strategy that coexists with
the existing class_table (joined table) approach. When an entity hierarchy
opts in via `inheritance = "concrete_table"`, each concrete class produces
a single table containing all attributes (own + inherited), eliminating
joins on read.

Changes:
- Add `inheritance` class attribute, `get_inheritance_strategy()`, and
  `is_concrete_table()` to EntityClass in structures.py
- Add `get_all_items()` method for flattened attribute retrieval
- Branch _create_definition_query, _save_query, _update_query,
  _remove_query, and _find_query on inheritance strategy
- Skip parent table creation/joins for concrete_table entities
- Add migration script (scripts/migrate_inheritance.py) with backup,
  validation, dry-run, and transactional execution
- Add test entities and unit tests for the new strategy

Closes #25

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai bot commented Mar 30, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7dcb16ab-ece8-41fb-8cc9-0cc5fa99b44f

📥 Commits

Reviewing files that changed from the base of the PR and between f8e030e and 5c3626e.

📒 Files selected for processing (2)
  • CHANGELOG.md
  • mvc/src/mvc_utils/model.py
✅ Files skipped from review due to trivial changes (1)
  • CHANGELOG.md

📝 Walkthrough


Adds a selectable Concrete Table Inheritance strategy (per-class concrete tables): strategy selection on EntityClass, flattened item resolution, and query/schema generation updates for concrete-mode CRUD and reads. Also adds migration and benchmark CLI scripts for converting and comparing strategies, expanded concrete-table mocks with extensive tests, plus design docs and changelog updates.

Changes

  • Core Inheritance Infrastructure (data/src/entity_manager/structures.py): Added DATA_INHERITANCE override, inheritance = "class_table" class attribute, cached internals, and classmethods get_inheritance_strategy(), is_concrete_table(), and get_all_items() for flattened attribute maps and strategy resolution.
  • SQL Query & Schema Generation (data/src/entity_manager/system.py): Branched DataEntityManager internals to support concrete-table mode: CREATE/INSERT/UPDATE/DELETE and SELECT now use flattened item maps for concrete classes, skip parent joins for reads, adjust discriminator (_class) emission and table-name resolution, and change per-operation query composition accordingly.
  • Test Mocks (data/src/entity_manager/mocks.py): Added concrete-table mock entities ConcreteRootEntity, ConcretePerson, ConcreteEmployee, and ConcreteAddress with initializers and relation mappers to exercise concrete inheritance in tests.
  • Test Suite Expansion (data/src/entity_manager/test.py): Implemented and expanded many EntityManager tests, added EntityManagerConcreteTableTestCase, and added coverage for CRUD, relations, polymorphism, pagination, validation, and concrete-table behaviors.
  • Migration & Benchmark Tools (scripts/migrate_inheritance.py, scripts/benchmark_inheritance.py): Added a migration CLI to convert hierarchies between class_table and concrete_table (validation, dry-run, backup, transactional execution) and a benchmark script comparing CTI vs concrete-table strategies with narrow/wide schemas.
  • Migration Tests (scripts/test/test_migrate_inheritance.py): Added unit tests for migration logic: hierarchy validation, generated SQL inspection (both directions), engine-specific SQL helpers, backup behavior, dry-run, rollback handling, and end-to-end schema/data migration checks.
  • Design & Docs (doc/design/002-concrete_table_inheritance.md, doc/design/README.md, CHANGELOG.md): Added the COP-002 design doc describing concrete-table semantics and the global override, updated the design index, and documented feature/migration additions in the CHANGELOG.
  • CI Update (.github/workflows/main.yml): Added a pytest step to run pytest scripts/test/ -v in multiple jobs and adjusted Python-version gating for black checks.
  • Logging API Update (mvc/src/mvc_utils/model.py): Extended model logging methods (debug/info/warning/error/critical and class-level variants) to accept *args, **kwargs and forward them to the plugin logger for lazy evaluation support.
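The lazy-evaluation pattern behind the logging change can be sketched like this (the Model class and method names here are illustrative, not the actual mvc_utils API):

```python
import logging

class Model(object):
    """Illustrative model exposing a debug() helper that forwards
    extra args/kwargs to the underlying logger."""

    def __init__(self):
        self.logger = logging.getLogger("mvc_sketch")

    def debug(self, message, *args, **kwargs):
        # forwarding args defers %-formatting to the logging module,
        # which skips the formatting entirely when the level is off
        self.logger.debug(message, *args, **kwargs)

model = Model()
model.logger.setLevel(logging.DEBUG)
model.debug("saved entity %s in %.2f ms", "ConcretePerson", 1.5)
```

The payoff is that an expensive repr() of an entity is never computed when debug logging is disabled, because the logger only interpolates the arguments for records it actually emits.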

Sequence Diagram(s)

sequenceDiagram
    participant App as Application
    participant EM as EntityManager
    participant DB as Database

    rect rgba(100, 150, 255, 0.5)
    Note over App,EM: Class Table Inheritance (existing)
    App->>EM: save(Employee)
    EM->>EM: split attrs across ancestor tables
    EM->>DB: INSERT parent table(s)
    DB-->>EM: OK
    EM->>DB: INSERT leaf table
    DB-->>EM: OK
    App->>EM: find(Employee, id)
    EM->>DB: SELECT leaf JOIN parent ... WHERE id=...
    DB-->>EM: joined row set
    end

    rect rgba(100, 255, 150, 0.5)
    Note over App,EM: Concrete Table Inheritance (new)
    App->>EM: save(EmployeeConcrete)
    EM->>EM: flatten inherited + own attrs
    EM->>DB: INSERT INTO employee (all_cols..., _class)
    DB-->>EM: OK
    App->>EM: find(EmployeeConcrete, id)
    EM->>DB: SELECT * FROM employee WHERE id=...
    DB-->>EM: single row
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 I nibble rows and hop through schemes,

Concrete tables cradle all my dreams.
Fewer joins to bog my feet—
Each class a meadow, tidy and neat.
Backups hum; migrations keep me warm.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 25.63%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.

✅ Passed checks (4 passed)
  • Description Check: ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Title check: ✅ Passed. The title clearly and concisely summarizes the main change: adding Concrete Table Inheritance as a new inheritance strategy option for entity hierarchies.
  • Linked Issues check: ✅ Passed. All linked issue objectives are addressed: inheritance strategy selector added [#25-1], flattened item API implemented [#25-2], query generation branches by strategy [#25-3-8], relations/engines handled [#25-9-10], migration script provided [#25-11], test entities added [#25-12], comprehensive unit/migration/regression tests [#25-13-15], and CHANGELOG documented [#25-16].
  • Out of Scope Changes check: ✅ Passed. All changes are directly scoped to the concrete-table inheritance feature: structures (strategy/helpers), system (SQL branching), mocks (test entities), test coverage, migration tooling, benchmark/design docs, logging signature updates (supporting lazy eval), CI workflow updates (pytest), and CHANGELOG entry. No unrelated modifications detected.



joamag and others added 8 commits March 30, 2026 11:10
Fill in previously stubbed tests (test_self_relation, test_database_integrity,
test_invalid_type, test_map, test_range) and add 15 new edge case tests:

- test_get_nonexistent: retrieving non-existent entity returns None
- test_find_empty: find on empty table returns empty list
- test_find_filters: equals, greater, lesser, and combined filters
- test_count_with_filters: count with filter conditions
- test_update_nullify_field: setting a field to None
- test_update_relation_change: changing a FK relation owner
- test_update_nullify_relation: removing a FK relation
- test_multiple_inheritance: Employee with Person+Loggable+Taxable
- test_polymorphism_employee: polymorphic find across hierarchy
- test_save_update_remove_cycle: full CRUD lifecycle
- test_file_data_type: File entity with data type field
- test_unicode_fields: unicode characters in text fields
- test_find_with_set: result set mode retrieval
- test_breeder_subclass: Breeder/BreedDog subclass relations
- test_remove_with_relations: removing entities with FK relations

Also fixes concrete table _class column qualifier in SELECT queries
to use the entity's own table name instead of the top parent table,
and fixes _process_table_name to resolve all attributes to the
entity's own table for concrete table inheritance.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add EntityManagerMigrationTestCase with 11 tests covering the migration
script functionality:

- test_validate_same_strategy: rejects migration to current strategy
- test_validate_concrete_same_strategy: same for concrete table
- test_validate_cti_to_concrete: validates CTI->Concrete migration
- test_validate_concrete_to_cti: validates Concrete->CTI migration
- test_dry_run_cti_to_concrete: dry-run generates SQL without changes
- test_dry_run_concrete_to_cti: same for reverse direction
- test_backup_sqlite: creates backup file before migration
- test_migrate_cti_to_concrete_schema: verifies CTI table structure
- test_migrate_rollback_on_failure: simulated failure triggers rollback
- test_generate_queries_cti_to_concrete: verifies CREATE/INSERT/DROP/ALTER
- test_generate_queries_concrete_to_cti: same for reverse direction

Also moves inline imports to top-level per project conventions and uses
importlib for robust migration module loading across Colony's plugin
cache directory.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ests

Expand migration backup test coverage with three new tests:

- test_backup_data_integrity: opens the backup as a separate SQLite
  connection and verifies table structure and row data match the
  original database (tables exist, column values preserved)
- test_migrate_creates_backup: verifies that migrate() creates a
  backup file when skip_backup=False, using a failing connection
  to isolate the backup step from the actual migration SQL
- test_migrate_aborts_on_backup_failure: verifies that migrate()
  returns failure with appropriate message when backup fails

Also adds backup naming convention assertion (.backup.timestamp)
to the existing test_backup_sqlite test.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Guard `import importlib.util` with try/except for Python 2.7
  compatibility (PyPy 2.7 and CPython 2.7 do not have importlib.util)
- Add `import sqlite3` at module level instead of inline
- Fix test_backup_data_integrity for Windows: close backup_conn in
  the finally block before os.remove() to release file handle
- Add WAL checkpoint before backup to ensure data is flushed to the
  main database file on Windows (WAL mode may leave data in the WAL
  file which shutil.copy2 does not capture)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ilable

The migration tests use importlib.util for dynamic module loading which
is not available in Python 2.7. Instead of erroring, the tests now
gracefully skip via unittest.skipTest when running on Python 2.7.

- Add HAS_IMPORTLIB_UTIL flag set during import
- Add setUp override in EntityManagerMigrationTestCase that calls
  skipTest when importlib.util is not available
- Guard _load_migration_module to return None on Python 2.7

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add scripts/benchmark_inheritance.py that compares Class Table
Inheritance vs Concrete Table Inheritance performance across all
CRUD operations using SQLite.

The script measures: schema creation, bulk insert, select by ID,
select all, count, bulk update, and bulk delete - printing a
formatted report with speedup ratios.

Usage: python scripts/benchmark_inheritance.py [iterations]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add a second benchmark section with 20 columns per level (60 total)
to demonstrate that concrete table's write advantage narrows with
wider rows and its read advantage can reverse entirely when the
single-table row becomes too wide for SQLite's page format.

Narrow (8 cols): concrete table ~2x faster on writes, ~same on reads
Wide (60 cols): concrete table ~1.1x on writes, can be slower on reads

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Revise the concrete table inheritance strategy so that every
concrete class in the hierarchy gets its own table containing all
fields flattened down to that hierarchy level. Writing an entity
inserts/updates/deletes rows in every ancestor table, enabling
polymorphic queries at any hierarchy level without joins.

Schema example (root -> person -> employee):
- _concrete_root_entity: object_id, status, _class, _mtime
- _concrete_person: object_id, status, name, age, _class, _mtime
- _concrete_employee: object_id, status, name, age, salary, _class, _mtime

Changes:
- create(): remove parent table skip, create all ancestor tables
  with flattened columns via get_all_items()
- _save_query: build items_map with one entry per ancestor class,
  each using that class's get_all_items()
- _update_query: same ancestor-level items_map approach
- _remove_query: delete from all ancestor tables (same as CTI)
- Add test_polymorphic_query verifying queries at root, person,
  and employee levels return correct entity counts
- Add design document COP-002 describing the schema, query
  behavior, implementation details, and trade-offs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
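The ancestor-level items_map approach described in this commit can be sketched as below. Here get_all_items() is modeled by a plain all_items set and table names follow the schema example above; the helper function itself is hypothetical, not the real _save_query code:

```python
class Root(object):
    table = "_concrete_root_entity"
    all_items = {"object_id", "status", "_class", "_mtime"}

class Person(Root):
    table = "_concrete_person"
    all_items = Root.all_items | {"name", "age"}

class Employee(Person):
    table = "_concrete_employee"
    all_items = Person.all_items | {"salary"}

def ancestor_insert_queries(cls, values):
    """Builds one insert per hierarchy level, each carrying that
    level's flattened (own + inherited) subset of the values."""
    queries = []
    for level in reversed(cls.__mro__[:-1]):  # root first, object excluded
        columns = sorted(name for name in values if name in level.all_items)
        queries.append(
            "INSERT INTO %s(%s) VALUES (%s)"
            % (level.table, ", ".join(columns),
               ", ".join("?" for _ in columns))
        )
    return queries
```

Walking the MRO root-first means parent rows exist before the leaf row is written, which keeps any foreign-key style references between the levels consistent during the fan-out.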
joamag and others added 3 commits March 30, 2026 12:57
Complete rewrite of scripts/migrate_inheritance.py:

- Uses ORM's own entity class methods (get_name, get_all_items,
  get_names_map, get_id) instead of reimplementing name resolution
- Handles multiple inheritance (Employee -> Person + Loggable +
  Taxable) with duplicate detection in hierarchy traversal
- Filters hierarchy by table existence when connection is available,
  so only tables present in the data source are migrated
- Creates flat tables at every hierarchy level for CTI->Concrete
  (matching the new ancestor-population design)
- Uses INSERT OR IGNORE for Concrete->CTI to handle duplicates
  when multiple concrete tables map to the same CTI level
- Recreates all indexes: PK (hash+btree), _mtime (hash+btree),
  and FK indexes on mapped relation columns
- Adds validate_data() for pre-migration referential integrity
  checks (orphaned row detection)

End-to-end migration tests:
- test_end_to_end_cti_to_concrete: creates CTI tables via ORM,
  inserts data, migrates, verifies schema changes and data
  integrity at every hierarchy level including polymorphic rows
- test_end_to_end_concrete_to_cti: creates concrete tables via
  ORM, inserts data, migrates, verifies CTI table structure
  and column distribution
- test_migration_preserves_indexes: verifies index count on
  migrated tables

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add a DATA_INHERITANCE environment variable that overrides the
class-level inheritance attribute for all entity hierarchies,
enabling system-wide strategy switching for migration debugging
without modifying entity code.

Resolution order in get_inheritance_strategy():
1. Cached _inheritance_strategy (per-class, avoids repeated lookups)
2. colony.conf("DATA_INHERITANCE") with module-level fallback
3. Class-level inheritance attribute
4. Parent hierarchy traversal
5. Default "class_table"

Usage:
  export DATA_INHERITANCE=concrete_table  # force all entities
  unset DATA_INHERITANCE                  # use class attributes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
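The resolution order above can be modeled roughly as follows, using os.environ in place of colony.conf and a simplified per-class cache; this is a sketch of the described behavior, not the structures.py code:

```python
import os

class EntityClass(object):
    inheritance = "class_table"

    @classmethod
    def get_inheritance_strategy(cls):
        # 1. per-class cache, avoids repeated lookups
        cached = cls.__dict__.get("_inheritance_strategy")
        if cached:
            return cached
        # 2. global override (real code reads colony.conf("DATA_INHERITANCE"))
        strategy = os.environ.get("DATA_INHERITANCE")
        if not strategy:
            # 3./4. class attribute, walking the parent hierarchy
            for base in cls.__mro__:
                if "inheritance" in vars(base):
                    strategy = vars(base)["inheritance"]
                    break
        # 5. fall back to the default strategy
        strategy = strategy or "class_table"
        cls._inheritance_strategy = strategy
        return strategy
```

Note the caching consequence: once a class has resolved its strategy, later changes to DATA_INHERITANCE in the same process are not picked up until the cache is cleared or the process restarts.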
@joamag joamag self-assigned this Mar 30, 2026
@joamag joamag added the enhancement New feature or request label Mar 30, 2026
@joamag joamag marked this pull request as ready for review March 30, 2026 14:47
Copilot AI review requested due to automatic review settings March 30, 2026 14:47

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 47dbdbb128


# are skipped as all columns reside in the entity's own
# table, for class table inheritance the joins are required
# to reconstruct the full entity from multiple tables
if not is_concrete:
The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve parent aliases for inherited concrete relations

Skipping parent-table joins for every concrete-table query leaves inherited relation joins pointing at aliases that were never introduced. In _join_query_f, relations declared on a parent class still build conditions like _parent_table.<relation> via _prefix, but with this guard those parent aliases are absent, so eager-loading a relation inherited by a concrete subclass (e.g., querying ConcreteEmployee with a relation from ConcretePerson) produces invalid SQL at runtime.


Comment on lines +343 to +344
"select count(*) from sqlite_master where type='table' and name=?",
(cls.get_name(),),

P2 Badge Use non-SQLite metadata queries for non-SQLite engines

The migration query generators always probe table existence with sqlite_master when a connection is provided. migrate() passes the live connection through, so using MySQL/PostgreSQL connections will fail before migration starts because those engines do not expose sqlite_master. This breaks the advertised non-SQLite migration path in migrate().


Comment on lines +609 to +610
"insert or ignore into %s(%s) select %s from %s "
"where _class = '%s'"

P2 Badge Avoid SQLite-only INSERT syntax in cross-engine migration

The concrete→CTI path emits insert or ignore, which is SQLite-specific SQL. When migrate() is used with MySQL/PostgreSQL (the function exposes those engines), this statement is invalid and aborts the migration. Use an engine-specific conflict strategy (or generic inserts plus dedup logic) instead of hardcoding SQLite syntax.
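An engine-aware variant could look like this (a hypothetical helper, not part of migrate_inheritance.py; the three dialect forms are standard SQLite, MySQL, and PostgreSQL syntax):

```python
def conflict_insert(engine, table, columns, select_sql):
    """Builds a duplicate-tolerant INSERT ... SELECT for the given
    engine instead of hardcoding SQLite's 'insert or ignore'."""
    base = "INSERT INTO %s(%s) %s" % (table, ", ".join(columns), select_sql)
    if engine == "sqlite":
        return base.replace("INSERT INTO", "INSERT OR IGNORE INTO", 1)
    if engine == "mysql":
        return base.replace("INSERT INTO", "INSERT IGNORE INTO", 1)
    if engine == "pgsql":
        # PostgreSQL expresses the conflict policy as a suffix clause
        return base + " ON CONFLICT DO NOTHING"
    raise ValueError("unsupported engine: %s" % engine)
```

Failing fast on an unknown engine is deliberate: silently emitting SQLite syntax against another engine is exactly the bug the review flags.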


Contributor

Copilot AI left a comment


Pull request overview

Adds a new ORM inheritance strategy (“concrete table inheritance”) to complement the existing class-table inheritance, plus tooling/docs/tests to support strategy selection and migration.

Changes:

  • Introduces inheritance = "concrete_table" on entity hierarchies, with strategy resolution (get_inheritance_strategy(), is_concrete_table()) and flattened field resolution (get_all_items()).
  • Updates entity manager SQL generation to create flattened tables and to read from a single table (no parent joins) while writing to all ancestor tables.
  • Adds migration + benchmark scripts, design documentation (COP-002), changelog entry, and expanded test coverage (including end-to-end migration tests).

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 9 comments.

Summary per file:
  • data/src/entity_manager/structures.py: Adds strategy selector + cache, concrete-table helpers, and get_all_items() for flattened field maps.
  • data/src/entity_manager/system.py: Branches table creation / save / update / find query-building behavior based on inheritance strategy.
  • data/src/entity_manager/mocks.py: Adds concrete-table mock entities mirroring existing inheritance test hierarchies.
  • data/src/entity_manager/test.py: Adds concrete-table tests plus migration/backup tests and fills in previously-stubbed test cases.
  • scripts/migrate_inheritance.py: New standalone migration tool for converting hierarchies between strategies (backup/dry-run/transactional).
  • scripts/benchmark_inheritance.py: New benchmark script comparing CTI vs concrete-table operations on SQLite.
  • doc/design/README.md: Adds COP-002 to the design doc index.
  • doc/design/002-concrete_table_inheritance.md: Design document describing schema/query behavior, tradeoffs, and configuration.
  • CHANGELOG.md: Documents the new inheritance strategy, migration tooling, and tests under Added.


Comment on lines +3278 to +3279
# table inheritance strategy, in which case a single
# insert query is generated for all attributes

Copilot AI Mar 30, 2026


The comment says concrete-table mode generates “a single insert query”, but the implementation builds an items_map containing all ancestor tables and then iterates it, producing multiple INSERTs (one per ancestor table). Please update the comment to reflect the actual behavior so future maintainers don’t get misled.

Suggested change
# table inheritance strategy, in which case a single
# insert query is generated for all attributes
# table inheritance strategy, in which case an insert
# query is generated for each concrete/ancestor table

Copilot uses AI. Check for mistakes.
unset DATA_INHERITANCE

The override is resolved via `colony.conf("DATA_INHERITANCE", None)` at module load time in `structures.py` and checked as the first step in `get_inheritance_strategy()`.

Copilot AI Mar 30, 2026


The design doc says the global override is “checked at module load time”, but structures.py resolves the override inside get_inheritance_strategy() via colony.conf(...) and caches the result per class. Please update this section to reflect the actual runtime + caching behavior (and how to clear the cache when switching overrides).

Suggested change
The override is resolved via `colony.conf("DATA_INHERITANCE", None)` at module load time in `structures.py` and checked as the first step in `get_inheritance_strategy()`.
The override is resolved via `colony.conf("DATA_INHERITANCE", None)` inside `get_inheritance_strategy()` at runtime. The effective inheritance strategy is cached per entity class, so changes to `DATA_INHERITANCE` in a long-running process only take effect after clearing the per-class inheritance-strategy cache (or restarting the process) and then reloading/creating the relevant entities.

Comment on lines +41 to +54
SQL_TYPES_MAP = dict(
text="text",
string="varchar(255)",
integer="integer",
long="bigint",
float="double precision",
decimal="double precision",
date="double precision",
data="text",
metadata="text",
)
""" The default SQL types map for column type resolution """



Copilot AI Mar 30, 2026


SQL_TYPES_MAP is defined but never used in this benchmark script. Consider removing it (or wiring it into schema creation) to avoid dead code and keep the benchmark focused on the operations being measured.

Suggested change
SQL_TYPES_MAP = dict(
text="text",
string="varchar(255)",
integer="integer",
long="bigint",
float="double precision",
decimal="double precision",
date="double precision",
data="text",
metadata="text",
)
""" The default SQL types map for column type resolution """

Comment on lines +503 to +512
if connection:
_hierarchy = []
for cls in hierarchy:
cursor = connection.cursor()
cursor.execute(
"select count(*) from sqlite_master where type='table' and name=?",
(cls.get_name(),),
)
exists = cursor.fetchone()[0] > 0
cursor.close()

Copilot AI Mar 30, 2026


migrate()/query generation is written with SQLite-specific assumptions (e.g., querying sqlite_master for table existence) even when engine is set to mysql/pgsql. This makes the advertised non-SQLite support unreliable; consider either (1) explicitly restricting the migration module to SQLite-only, or (2) branching table-existence detection and SQL generation per engine (and using the correct INSERT/DDL dialect).
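One way to branch the table-existence probe per engine (a hypothetical helper, not the current script; the placeholder styles assume the usual DB-API drivers, '?' for sqlite3 and '%s' for MySQL/PostgreSQL):

```python
import sqlite3

def table_exists(connection, engine, name):
    """Probes the engine-appropriate catalog instead of assuming
    SQLite's sqlite_master is always available."""
    if engine == "sqlite":
        sql = ("SELECT count(*) FROM sqlite_master "
               "WHERE type = 'table' AND name = ?")
    else:
        # information_schema is shared by MySQL and PostgreSQL
        sql = ("SELECT count(*) FROM information_schema.tables "
               "WHERE table_name = %s")
    cursor = connection.cursor()
    try:
        cursor.execute(sql, (name,))
        return cursor.fetchone()[0] > 0
    finally:
        cursor.close()

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE _concrete_person (object_id INTEGER)")
```

Closing the cursor in a finally block also addresses the separate resource-leak comment on the execution path.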

if not is_valid:
return False, messages

if validate_only:

Copilot AI Mar 30, 2026


validate_data() implements the orphan-row / referential consistency checks described in the validation docstrings, but it is never called by migrate() (including --validate). As a result, validation currently only checks strategy configuration and ID presence, not data integrity; either invoke validate_data() during validation, or adjust the validation messaging/docs to match actual behavior.

Suggested change
if validate_only:
if validate_only:
# performs data-level validation (eg orphan rows, referential integrity)
# without executing any migration queries
is_data_valid, data_messages = validate_data(
entity_class,
connection,
connection_params,
)
messages.extend(data_messages)
if not is_data_valid:
return False, messages

Comment on lines +749 to +756
try:
cursor = connection.cursor()

for query in queries:
messages.append("executing: %s" % query)
cursor.execute(query)

connection.commit()

Copilot AI Mar 30, 2026


The migration executes via a cursor created at the start of the transaction but never closes it. Please ensure the cursor is closed in a finally block (success and failure paths) to avoid leaking resources across repeated runs/tests.

__license__ = "Apache License, Version 2.0"
""" The license for the module """

import os

Copilot AI Mar 30, 2026


Unused import: os is imported but not used anywhere in this script. Removing it avoids lint noise and keeps the migration module minimal.

Suggested change
import os

Comment on lines +3495 to +3496
# table inheritance strategy, in which case a single
# update query is generated for all attributes

Copilot AI Mar 30, 2026


Similarly, this comment says concrete-table mode generates “a single update query”, but the code constructs an items_map across ancestors and generates multiple UPDATE statements. Please align the comment with the real multi-table update behavior in concrete-table mode.

Suggested change
# table inheritance strategy, in which case a single
# update query is generated for all attributes
# table inheritance strategy, in which case coordinated
# update statements are generated across the concrete tables

Comment on lines +608 to +619
insert_query = (
"insert or ignore into %s(%s) select %s from %s "
"where _class = '%s'"
% (
new_table_name,
", ".join(select_cols),
", ".join(select_cols),
concrete_table,
cls.__name__,
)
)
queries.append(insert_query)
Copilot AI Mar 30, 2026

generate_concrete_to_cti_queries() uses insert or ignore, which is SQLite syntax. If migrate() is intended to be usable with MySQL/PostgreSQL connections (as implied by the engine parameter and backup support), this should be generated per-engine (e.g., INSERT IGNORE for MySQL, ON CONFLICT DO NOTHING for PostgreSQL) or the script should fail fast when engine != sqlite.
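A hedged sketch of the per-engine branching this comment suggests; `build_copy_query` and the exact clause choices are illustrative, not taken from migrate_inheritance.py:

```python
def build_copy_query(engine, target, columns, source, class_name):
    cols = ", ".join(columns)
    base = "insert into %s(%s) select %s from %s where _class = '%s'" % (
        target, cols, cols, source, class_name)
    if engine == "sqlite":
        # SQLite spelling of the duplicate-tolerant insert
        return base.replace("insert into", "insert or ignore into", 1)
    if engine == "mysql":
        return base.replace("insert into", "insert ignore into", 1)
    if engine == "pgsql":
        # PostgreSQL expresses the same intent with an ON CONFLICT clause
        return base + " on conflict do nothing"
    # fail fast instead of silently emitting SQLite-only syntax
    raise ValueError("unsupported engine: %s" % engine)

sqlite_q = build_copy_query("sqlite", "new_t", ["object_id"], "old_t", "Person")
mysql_q = build_copy_query("mysql", "new_t", ["object_id"], "old_t", "Person")
pgsql_q = build_copy_query("pgsql", "new_t", ["object_id"], "old_t", "Person")
```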

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 13

🧹 Nitpick comments (2)
data/src/entity_manager/test.py (2)

3308-3549: Exercise the ORM after each migration, not just raw SQLite reads.

These end-to-end cases stop at schema and row-level assertions. That won't catch a migration that produces the right tables but still breaks EntityManager.get/find/save against the migrated strategy. Reopening the entity manager and doing one ORM round-trip here would cover the integration boundary this PR is changing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@data/src/entity_manager/test.py` around lines 3308 - 3549, Add ORM round-trip
checks after each migration in test_end_to_end_cti_to_concrete and
test_end_to_end_concrete_to_cti: after calling migration.migrate and asserting
success, reinitialize or reopen the EntityManager (or at least clear its
session), then use EntityManager.get/find/save (and entity_manager.begin/commit
where needed) to load an existing migrated entity and to persist a new one to
verify full ORM operations work against the migrated schema; reference the
existing helpers used in the test (self.entity_manager, self._get_connection(),
and migration.migrate) to locate where to insert these checks.

2642-2713: Move the inheritance override state management into setUp/tearDown.

This test mutates structures.DATA_INHERITANCE and per-class caches inline. The try/finally helps, but the suite guideline for class-level test state is to capture it in setUp and restore it in tearDown so isolation is uniform across the module.

As per coding guidelines, "For class-level state in tests, save the original state in setUp and restore it in tearDown to ensure test isolation".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@data/src/entity_manager/test.py` around lines 2642 - 2713, The test mutates
module-global structures.DATA_INHERITANCE and per-class _inheritance_strategy
caches inline; move that state management into the test class's setUp and
tearDown: in setUp capture structures.DATA_INHERITANCE into
self._original_data_inheritance, build the test_classes list and save any
existing cls._inheritance_strategy into self._saved_caches, and clear caches
(del cls._inheritance_strategy) so each test starts fresh; in tearDown restore
structures.DATA_INHERITANCE from self._original_data_inheritance, clear any
temporary _inheritance_strategy entries, and reassign saved values from
self._saved_caches; then simplify test_global_inheritance_override by removing
the try/finally and relying on setUp/tearDown to manage state.
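A minimal sketch of the setUp/tearDown pattern this nitpick asks for: module-level state is captured before each test and restored after it, so a failing test cannot leak the override into its neighbours. The `config` namespace below stands in for structures.DATA_INHERITANCE; names are illustrative.

```python
import types
import unittest

config = types.SimpleNamespace(DATA_INHERITANCE=None)

class InheritanceOverrideTest(unittest.TestCase):
    def setUp(self):
        # save the original global so tearDown can restore it unconditionally
        self._original_data_inheritance = config.DATA_INHERITANCE

    def tearDown(self):
        config.DATA_INHERITANCE = self._original_data_inheritance

    def test_global_override(self):
        # the test body mutates freely; no try/finally needed
        config.DATA_INHERITANCE = "concrete_table"
        self.assertEqual(config.DATA_INHERITANCE, "concrete_table")

suite = unittest.defaultTestLoader.loadTestsFromTestCase(InheritanceOverrideTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```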
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@data/src/entity_manager/mocks.py`:
- Around line 539-543: _relation_boss currently sets reverse="children", which
pairs the boss relation to the parent/child metadata; change the reverse to a
dedicated reports collection (e.g., reverse="reports") so boss lookups and lazy
loading traverse the correct relation on ConcretePerson, and ensure
ConcretePerson defines a matching reports relation name; update the dict
returned by _relation_boss to use reverse="reports" (or the actual reports
relation name on ConcretePerson).

In `@data/src/entity_manager/structures.py`:
- Around line 147-151: The code currently reserves the public attribute name
"inheritance" (listed alongside "_inheritance_strategy" in the
INVALID_NAMES-like list), which breaks backwards compatibility; change the
behavior so the framework either uses a private sentinel name (e.g., keep and
use "_inheritance_strategy" as the config key and remove "inheritance" from the
invalid/reserved names) or make the reservation conditional by only treating
"inheritance" as the framework flag when its value equals one of the supported
strategy strings (check membership against your strategy constants) — update the
INVALID_NAMES array and any validation/filtering logic that references
"inheritance" accordingly (also apply the same change in the other occurrences
referenced in the review).
- Around line 1813-1875: get_all_items() currently merges parent and own items
by assigning into all_items such that later entries overwrite earlier ones
(last-wins), which can conflict with the ORM's owner precedence used by
get_names_map() that preserves the first owner; change the merge logic in
get_all_items() (the loop over cls.get_all_parents() and the subsequent
own_items loop that writes into all_items and sets setattr(cls, cache_key,
all_items)) so that on key collisions you preserve the existing owner/value
(first-wins) rather than overwriting, and ensure own_items still take precedence
over parents by only writing a key from own_items if it does not already exist
or by writing parents first then own items using a non-overwriting assign policy
consistent with get_names_map().

In `@data/src/entity_manager/system.py`:
- Around line 4673-4697: The joins for inherited relations still use the
ancestor class name as the SQL prefix, causing predicates to reference Parent.*
even after Parent is removed from FROM; update join_tables() so that when a
relation is declared on an ancestor it computes the SQL prefix from the
concrete/table-owning class (not _entity_class.get_name())—e.g. resolve the
concrete subclass/table name (or the concrete table alias mapping used when
building the FROM joins) and use that for _prefix and any predicate generation;
apply the same fix to the equivalent logic referenced around lines 4771-4780 so
inherited relations always target the concrete table alias rather than the
ancestor name.
- Around line 3063-3065: The concrete-table path flattens parent columns via
entity_class.get_all_items() but schema setup still only calls
index_fields(entity_class) which in turn uses entity_class.get_indexed(), so
inherited indexed fields on the concrete table are not registered; update the
indexing logic so that for concrete tables (when is_concrete and table_items
comes from entity_class.get_all_items()) you call the inherited-aware accessor
(e.g. entity_class.get_all_indexed() or otherwise merge parent indexed fields
into entity_class.get_indexed()) before calling index_fields(entity_class) so
the child's table carries secondary indexes defined on parents.
- Around line 5143-5148: The issue is that ordering resolution still uses
_resolve_name() which consults get_names_map() and returns parent-qualified
names for inherited concrete-table fields; update _resolve_name() (or add the
same early-return logic used in the filter branch) to check
entity_class.is_concrete_table() and return table_name or the entity's own name
(entity_class.get_name() / _own_table_name) for any resolved attribute path so
that _order_query_f() and eager relation paths do not produce Parent.field
aliases when the child uses concrete table inheritance; ensure this same change
covers any code paths where order_by or relation-paths call _resolve_name().

In `@doc/design/002-concrete_table_inheritance.md`:
- Around line 144-145: The doc line incorrectly states that DATA_INHERITANCE is
read at module load time; update the sentence to say that
`colony.conf("DATA_INHERITANCE", None)` is read inside
`get_inheritance_strategy()` at call time (not at import/module load), and
mention `get_inheritance_strategy()` in `structures.py` as the place the
override is checked so the documented lifecycle matches the code.

In `@scripts/benchmark_inheritance.py`:
- Around line 162-176: The benchmark helpers (e.g., insert_concrete) currently
only operate on the leaf concrete table, which underestimates real-world cost
because the real implementation duplicates writes into ancestor tables and
serves parent-level reads via UNION ALL; update insert_concrete and the
corresponding read helpers (the other functions mentioned around lines 239-252,
278-315, 429-510) to perform the same behavior as the real concrete-table
strategy: when inserting a leaf row, also insert into each ancestor table with
the appropriate columns and _class marker, and when running parent-level queries
use a UNION ALL over the relevant ancestor+leaf tables (or simulate that by
querying the ancestor tables combined) so the benchmark reflects the extra write
and read costs of duplicated ancestor rows and UNION ALL scans; locate and
modify functions named insert_concrete and the sibling insert/read helpers to
implement these changes.
- Around line 575-755: The cleanup currently deletes temporary DB files in the
outer finally without guaranteeing the SQLite connections (cti_conn,
concrete_conn, wide_cti_conn, wide_concrete_conn) are closed, which can fail on
Windows and leak wide DB files; modify the function to track these connection
objects and paths (cti_conn, concrete_conn, wide_cti_conn, wide_concrete_conn
and cti_path, concrete_path, wide_cti_path, wide_concrete_path), and in the
finally block first safely close any non-None/open connections (catch and ignore
errors on close) and then remove their corresponding files if they exist; ensure
wide_* variables are defined (init to None) before use so early exceptions still
trigger proper closure and deletion.
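A sketch of the cleanup ordering this comment describes: every connection/path variable is initialised up front, connections are closed first (ignoring close-time errors), and only then are the files removed. Variable names are illustrative stand-ins for the benchmark's.

```python
import os
import sqlite3
import tempfile

cti_conn = concrete_conn = None
cti_path = concrete_path = None
try:
    cti_fd, cti_path = tempfile.mkstemp(suffix=".db")
    concrete_fd, concrete_path = tempfile.mkstemp(suffix=".db")
    os.close(cti_fd)
    os.close(concrete_fd)
    cti_conn = sqlite3.connect(cti_path)
    concrete_conn = sqlite3.connect(concrete_path)
    cti_conn.execute("create table t(x integer)")
finally:
    # close connections first so file deletion cannot fail on Windows
    for conn in (cti_conn, concrete_conn):
        if conn is None:
            continue
        try:
            conn.close()
        except Exception:
            pass  # a failed close must not mask the original error
    for path in (cti_path, concrete_path):
        if path and os.path.exists(path):
            os.remove(path)

removed = not os.path.exists(cti_path) and not os.path.exists(concrete_path)
```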

In `@scripts/migrate_inheritance.py`:
- Around line 707-725: The migrate() flow currently stops after
validate_hierarchy() and skips validate_data(), so run validate_data() after a
successful hierarchy validation and before proceeding to build queries (i.e.,
after the existing is_valid check and before the current_strategy logic); if
validate_data() returns failures, append its messages to messages and return
False, messages, and if validate_only is true return True, messages only when
both validate_hierarchy() and validate_data() succeed; ensure you call
validate_data(entity_class, connection, types_map) (or the appropriate
signature) and propagate its results before calling
generate_cti_to_concrete_queries() or generate_concrete_to_cti_queries().
- Around line 159-163: The backup path code for SQLite does not flush WAL to the
main DB before copying, so modify the sqlite branch (the block that checks if
engine == "sqlite" and uses connection_params/file_path) to open a sqlite3
connection to file_path (or use the existing connection if available) and
execute "PRAGMA wal_checkpoint(FULL)" (via cursor.execute) and commit/close
before calling shutil.copy2; then perform the existing backup_path creation and
copy as before so the backup includes WAL changes.
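A runnable sketch of the WAL flush this comment describes: checkpoint the write-ahead log into the main database file before copying it, so the backup does not miss recent writes. Paths and the toy schema are illustrative.

```python
import os
import shutil
import sqlite3
import tempfile

file_path = os.path.join(tempfile.mkdtemp(), "data.db")
backup_path = file_path + ".backup"

conn = sqlite3.connect(file_path)
conn.execute("pragma journal_mode=wal")
conn.execute("create table t(x integer)")
conn.execute("insert into t values (42)")
conn.commit()

# flush pending WAL frames into the main database file before copying
conn.execute("pragma wal_checkpoint(FULL)")
conn.close()

shutil.copy2(file_path, backup_path)
backup_value = sqlite3.connect(backup_path).execute(
    "select x from t").fetchone()[0]
```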
- Around line 338-345: The code uses SQLite-only SQL (sqlite_master and "insert
or ignore") inside generate_cti_to_concrete_queries and
generate_concrete_to_cti_queries and so will break for mysql/pgsql; update both
functions to accept an engine parameter and branch SQL generation by engine:
keep the sqlite_master query when engine=="sqlite", but use
information_schema.tables or pg_catalog (e.g. query information_schema.tables
with table_schema/name) for mysql and pgsql table-existence checks in place of
sqlite_master, and replace "insert or ignore" with engine-specific upsert syntax
("INSERT IGNORE" for mysql, "INSERT ... ON CONFLICT DO NOTHING" for pgsql) while
preserving behavior; ensure migrate() callers pass the engine through to these
functions and update places that use connection.cursor()/execute to use the
engine-specific SQL strings from generate_cti_to_concrete_queries /
generate_concrete_to_cti_queries.
- Around line 70-80: The current _collect recursion relies on __subclasses__()
which only reflects already-imported modules; before calling
_collect(entity_class) ensure all modules that could define subclasses of
entity_class are imported (e.g., use
pkgutil.walk_packages()/importlib.import_module over the package containing
entity_class.__module__ or otherwise import relevant modules), then run the
existing _collect function that appends to classes; keep the same symbols
(_collect, classes, entity_class, __subclasses__()) but add the module
discovery/import step prior to invoking _collect so the full class hierarchy is
loaded.
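A sketch of the recursive `__subclasses__()` walk plus the import step the comment calls out: `__subclasses__()` only sees classes whose defining modules are already imported, so real usage would first import the candidate modules (e.g. discovered via pkgutil.walk_packages). The classes below are local stand-ins.

```python
import importlib

class RootEntity(object):
    pass

class Person(RootEntity):
    pass

class Employee(Person):
    pass

def collect_subclasses(entity_class, module_names=()):
    # import candidate modules first so the hierarchy is fully loaded
    for name in module_names:
        importlib.import_module(name)
    classes = []
    def _collect(cls):
        classes.append(cls)
        for subclass in cls.__subclasses__():
            _collect(subclass)
    _collect(entity_class)
    return classes

collected = collect_subclasses(RootEntity)
```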

---

Nitpick comments:
In `@data/src/entity_manager/test.py`:
- Around line 3308-3549: Add ORM round-trip checks after each migration in
test_end_to_end_cti_to_concrete and test_end_to_end_concrete_to_cti: after
calling migration.migrate and asserting success, reinitialize or reopen the
EntityManager (or at least clear its session), then use
EntityManager.get/find/save (and entity_manager.begin/commit where needed) to
load an existing migrated entity and to persist a new one to verify full ORM
operations work against the migrated schema; reference the existing helpers used
in the test (self.entity_manager, self._get_connection(), and migration.migrate)
to locate where to insert these checks.
- Around line 2642-2713: The test mutates module-global
structures.DATA_INHERITANCE and per-class _inheritance_strategy caches inline;
move that state management into the test class's setUp and tearDown: in setUp
capture structures.DATA_INHERITANCE into self._original_data_inheritance, build
the test_classes list and save any existing cls._inheritance_strategy into
self._saved_caches, and clear caches (del cls._inheritance_strategy) so each
test starts fresh; in tearDown restore structures.DATA_INHERITANCE from
self._original_data_inheritance, clear any temporary _inheritance_strategy
entries, and reassign saved values from self._saved_caches; then simplify
test_global_inheritance_override by removing the try/finally and relying on
setUp/tearDown to manage state.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c8caefd7-f2a1-47cf-91b3-d1b30aeb53db

📥 Commits

Reviewing files that changed from the base of the PR and between a34f2b6 and 47dbdbb.

📒 Files selected for processing (9)
  • CHANGELOG.md
  • data/src/entity_manager/mocks.py
  • data/src/entity_manager/structures.py
  • data/src/entity_manager/system.py
  • data/src/entity_manager/test.py
  • doc/design/002-concrete_table_inheritance.md
  • doc/design/README.md
  • scripts/benchmark_inheritance.py
  • scripts/migrate_inheritance.py

Comment on lines +539 to +543
@staticmethod
def _relation_boss():
return dict(
type="to-one", target=ConcretePerson, reverse="children", is_mapper=True
)

⚠️ Potential issue | 🟠 Major

boss is paired with the wrong reverse relation.

Using reverse="children" ties manager lookups to the parent/child relation metadata. ConcretePerson needs a dedicated report collection here; otherwise reverse traversal and lazy loading for boss relationships will resolve the wrong side.

🐛 Suggested fix
 class ConcretePerson(ConcreteRootEntity):
@@
     children = dict(type="relation")
     """ The children of the current person """
 
+    employees = dict(type="relation")
+    """ The employees associated with the person """
+
@@
     def _relation_children():
         return dict(type="to-many", target=ConcretePerson, reverse="parent")
 
+    @staticmethod
+    def _relation_employees():
+        return dict(type="to-many", target=ConcreteEmployee, reverse="boss")
+
@@
 class ConcreteEmployee(ConcretePerson):
@@
     @staticmethod
     def _relation_boss():
         return dict(
-            type="to-one", target=ConcretePerson, reverse="children", is_mapper=True
+            type="to-one", target=ConcretePerson, reverse="employees", is_mapper=True
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@data/src/entity_manager/mocks.py` around lines 539 - 543, _relation_boss
currently sets reverse="children", which pairs the boss relation to the
parent/child metadata; change the reverse to a dedicated reports collection
(e.g., reverse="reports") so boss lookups and lazy loading traverse the correct
relation on ConcretePerson, and ensure ConcretePerson defines a matching reports
relation name; update the dict returned by _relation_boss to use
reverse="reports" (or the actual reports relation name on ConcretePerson).
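An illustrative checker for the pairing problem in this comment: a relation's `reverse` should name a relation that exists on the target class and points back. The metadata shape mimics the mocks' `_relation_*` dicts; `check_reverse` itself is hypothetical.

```python
relations = {
    "ConcretePerson": {
        "children": dict(type="to-many", target="ConcretePerson", reverse="parent"),
        "parent": dict(type="to-one", target="ConcretePerson", reverse="children"),
        "reports": dict(type="to-many", target="ConcreteEmployee", reverse="boss"),
    },
    "ConcreteEmployee": {
        "boss": dict(type="to-one", target="ConcretePerson", reverse="reports"),
    },
}

def check_reverse(relations, owner, name):
    relation = relations[owner][name]
    target_relations = relations.get(relation["target"], {})
    counterpart = target_relations.get(relation["reverse"])
    # the reverse must exist on the target and point back at this relation
    return counterpart is not None and counterpart["reverse"] == name

boss_ok = check_reverse(relations, "ConcreteEmployee", "boss")

# reverse="children" pairs boss with the parent/child relation: a mismatch
broken = check_reverse(
    {
        "E": {"boss": dict(type="to-one", target="P", reverse="children")},
        "P": {"children": dict(type="to-many", target="P", reverse="parent")},
    },
    "E", "boss",
)
```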

Comment on lines +147 to +151
"_inheritance_strategy",
"_all_items",
"_all_non_foreign_items",
"abstract",
"inheritance",

⚠️ Potential issue | 🟠 Major

inheritance is now a breaking reserved attribute.

Any model that already persisted an inheritance field will silently lose it after this change: the name is filtered out by INVALID_NAMES, and the same attribute is now interpreted as framework strategy config. For a public ORM surface, that is a backward-compatibility break, not just an additive feature. Please move the strategy flag to a private/sentinel name, or only reserve inheritance when its value is one of the supported strategy strings.

Also applies to: 429-433, 2638-2642

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@data/src/entity_manager/structures.py` around lines 147 - 151, The code
currently reserves the public attribute name "inheritance" (listed alongside
"_inheritance_strategy" in the INVALID_NAMES-like list), which breaks backwards
compatibility; change the behavior so the framework either uses a private
sentinel name (e.g., keep and use "_inheritance_strategy" as the config key and
remove "inheritance" from the invalid/reserved names) or make the reservation
conditional by only treating "inheritance" as the framework flag when its value
equals one of the supported strategy strings (check membership against your
strategy constants) — update the INVALID_NAMES array and any
validation/filtering logic that references "inheritance" accordingly (also apply
the same change in the other occurrences referenced in the review).
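A sketch of the conditional-reservation option from this comment: treat `inheritance` as framework configuration only when its value is one of the supported strategy strings, and leave it alone as a plain persisted attribute otherwise. Constant and function names are illustrative.

```python
STRATEGIES = ("class_table", "concrete_table")

def is_strategy_config(name, value):
    # only the known strategy strings activate the framework flag
    return name == "inheritance" and value in STRATEGIES

config_hit = is_strategy_config("inheritance", "concrete_table")
plain_attr = is_strategy_config("inheritance", "maternal line")
other_attr = is_strategy_config("status", "concrete_table")
```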

Comment on lines +1813 to +1875
# retrieves all the parent classes and iterates over
# them to collect their items, this includes abstract
# parents whose items would normally be inherited into
# the first concrete parent
all_parents = cls.get_all_parents()
for parent in all_parents:
# retrieves the items directly defined in the parent
# class using the internal items method and filters
# them using the same criteria as get_items
parent_items = parent._items()
for key, value in colony.legacy.iteritems(parent_items):
if key in INVALID_NAMES:
continue
if key.isupper():
continue
if type(value) in (
types.FunctionType,
types.MethodType,
staticmethod,
classmethod,
property,
):
continue
if (
not foreign_relations
and cls.is_relation(key)
and not cls.is_mapped(key)
):
continue
if not hasattr(value, "get"):
continue
all_items[key] = value

# retrieves the items directly defined in the current
# class and merges them into the all items map (these
# take precedence over parent items)
own_items = cls._items()
for key, value in colony.legacy.iteritems(own_items):
if key in INVALID_NAMES:
continue
if key.isupper():
continue
if type(value) in (
types.FunctionType,
types.MethodType,
staticmethod,
classmethod,
property,
):
continue
if (
not foreign_relations
and cls.is_relation(key)
and not cls.is_mapped(key)
):
continue
if not hasattr(value, "get"):
continue
all_items[key] = value

# caches the all items in the class to provide
# fast access on later accesses
setattr(cls, cache_key, all_items)

⚠️ Potential issue | 🟠 Major

get_all_items() can disagree with the ORM's existing owner precedence.

This merge overwrites on every collision, so the last parent wins. get_names_map() keeps the first owner instead, which means flattened CTI metadata can diverge from the owner/validation path for duplicate fields in multi-parent hierarchies.

♻️ Suggested fix
-        # retrieves all the parent classes and iterates over
-        # them to collect their items, this includes abstract
-        # parents whose items would normally be inherited into
-        # the first concrete parent
-        all_parents = cls.get_all_parents()
-        for parent in all_parents:
-            # retrieves the items directly defined in the parent
-            # class using the internal items method and filters
-            # them using the same criteria as get_items
-            parent_items = parent._items()
-            for key, value in colony.legacy.iteritems(parent_items):
-                if key in INVALID_NAMES:
-                    continue
-                if key.isupper():
-                    continue
-                if type(value) in (
-                    types.FunctionType,
-                    types.MethodType,
-                    staticmethod,
-                    classmethod,
-                    property,
-                ):
-                    continue
-                if (
-                    not foreign_relations
-                    and cls.is_relation(key)
-                    and not cls.is_mapped(key)
-                ):
-                    continue
-                if not hasattr(value, "get"):
-                    continue
-                all_items[key] = value
-
-        # retrieves the items directly defined in the current
-        # class and merges them into the all items map (these
-        # take precedence over parent items)
-        own_items = cls._items()
-        for key, value in colony.legacy.iteritems(own_items):
-            if key in INVALID_NAMES:
-                continue
-            if key.isupper():
-                continue
-            if type(value) in (
-                types.FunctionType,
-                types.MethodType,
-                staticmethod,
-                classmethod,
-                property,
-            ):
-                continue
-            if (
-                not foreign_relations
-                and cls.is_relation(key)
-                and not cls.is_mapped(key)
-            ):
-                continue
-            if not hasattr(value, "get"):
-                continue
-            all_items[key] = value
+        # reuse the same ownership order used by get_names_map()
+        # so flattened descriptors match the rest of the ORM
+        names_map = cls.get_names_map()
+        for key, owner in colony.legacy.iteritems(names_map):
+            owner_items = owner.get_items(foreign_relations=foreign_relations)
+            if key in owner_items:
+                all_items[key] = owner_items[key]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@data/src/entity_manager/structures.py` around lines 1813 - 1875,
get_all_items() currently merges parent and own items by assigning into
all_items such that later entries overwrite earlier ones (last-wins), which can
conflict with the ORM's owner precedence used by get_names_map() that preserves
the first owner; change the merge logic in get_all_items() (the loop over
cls.get_all_parents() and the subsequent own_items loop that writes into
all_items and sets setattr(cls, cache_key, all_items)) so that on key collisions
you preserve the existing owner/value (first-wins) rather than overwriting, and
ensure own_items still take precedence over parents by only writing a key from
own_items if it does not already exist or by writing parents first then own
items using a non-overwriting assign policy consistent with get_names_map().
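A minimal sketch of the non-overwriting merge policy this prompt describes: item maps are visited in owner order (own class first, then parents, matching get_names_map() precedence), and a key is only written if no earlier owner has claimed it. Names are illustrative.

```python
def merge_first_wins(*item_maps):
    # earlier maps claim keys first; later maps never overwrite them
    merged = {}
    for items in item_maps:
        for key, value in items.items():
            if key not in merged:
                merged[key] = value
    return merged

own = {"name": "own.name", "salary": "own.salary"}
parent_a = {"name": "parent_a.name", "age": "parent_a.age"}
parent_b = {"name": "parent_b.name"}

# own items win over parents; on parent collisions the first owner wins
all_items = merge_first_wins(own, parent_a, parent_b)
```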

Comment on lines +3063 to +3065
table_items = (
entity_class.get_all_items() if is_concrete else entity_class.get_items()
)

⚠️ Potential issue | 🟠 Major

Carry inherited secondary indexes onto concrete tables.

get_all_items() flattens parent columns into the concrete table here, but schema setup still calls index_fields(entity_class), and that method only indexes entity_class.get_indexed() for the current table. Under class-table that was table-scoped by necessity; with concrete-table, indexed parent fields on the child table are left unindexed, so inherited-field filters/orderings regress to table scans.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@data/src/entity_manager/system.py` around lines 3063 - 3065, The
concrete-table path flattens parent columns via entity_class.get_all_items() but
schema setup still only calls index_fields(entity_class) which in turn uses
entity_class.get_indexed(), so inherited indexed fields on the concrete table
are not registered; update the indexing logic so that for concrete tables (when
is_concrete and table_items comes from entity_class.get_all_items()) you call
the inherited-aware accessor (e.g. entity_class.get_all_indexed() or otherwise
merge parent indexed fields into entity_class.get_indexed()) before calling
index_fields(entity_class) so the child's table carries secondary indexes
defined on parents.
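A sqlite sketch of the gap this comment describes: the concrete child table is created with flattened parent columns, so its secondary indexes must also cover fields marked indexed on the parents. The indexed-field lists below are illustrative stand-ins for get_indexed()/get_all_indexed().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table _concrete_employee("
    "object_id integer primary key, name text, age integer, salary real)")

parent_indexed = ["name", "age"]   # declared on the parent (ConcretePerson)
own_indexed = ["salary"]           # declared on the child (ConcreteEmployee)

# index the union of inherited and own indexed fields on the child table
for field in parent_indexed + own_indexed:
    conn.execute(
        "create index _concrete_employee_%s_idx on _concrete_employee(%s)"
        % (field, field))

index_names = sorted(
    row[0] for row in conn.execute(
        "select name from sqlite_master where type = 'index'"))
```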

Comment on lines +4673 to 4697
if not is_concrete:
# iterates over all the parents to provide
# the necessary (inner) join of them into
# the current query context, this is a main step
# in achieving inheritance compliance in the query
for parent in all_parents:
# in case the parent class is abstract no need to join
# it into the current query
if parent.is_abstract():
continue

# retrieves the parent name, assumes the
# associated table has the same value
parent_name = parent.get_name()

# writes the table inheritance inner join
# part of the query, ensuring data coherence
# in the complete inheritance chain
query_buffer.write(" inner join ")
query_buffer.write(parent_name)
query_buffer.write(" on ")
query_buffer.write(table_name + "." + table_id)
query_buffer.write(" = ")
query_buffer.write(parent_name + "." + table_id)


⚠️ Potential issue | 🔴 Critical

Inherited relation joins still point at parent aliases.

For concrete-table hierarchies, Lines 4673-4697 remove the parent tables from FROM, but join_tables() still builds _prefix from _entity_class.get_name() when the relation is declared on an ancestor. Eager/default loading of an inherited relation will emit predicates against Parent.* even though Parent is no longer joined.

🐛 Minimal fix
-                    if entity_class == _entity_class:
-                        _prefix = prefix or _table_name
-                    else:
-                        _prefix = (
-                            (prefix + "___" + _table_name) if prefix else _table_name
-                        )
+                    if entity_class.is_concrete_table():
+                        _prefix = prefix or entity_class.get_name()
+                    elif entity_class == _entity_class:
+                        _prefix = prefix or _table_name
+                    else:
+                        _prefix = (
+                            (prefix + "___" + _table_name) if prefix else _table_name
+                        )

Also applies to: 4771-4780

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@data/src/entity_manager/system.py` around lines 4673 - 4697, The joins for
inherited relations still use the ancestor class name as the SQL prefix, causing
predicates to reference Parent.* even after Parent is removed from FROM; update
join_tables() so that when a relation is declared on an ancestor it computes the
SQL prefix from the concrete/table-owning class (not
_entity_class.get_name())—e.g. resolve the concrete subclass/table name (or the
concrete table alias mapping used when building the FROM joins) and use that for
_prefix and any predicate generation; apply the same fix to the equivalent logic
referenced around lines 4771-4780 so inherited relations always target the
concrete table alias rather than the ancestor name.

Comment on lines +575 to +755
cti_fd, cti_path = tempfile.mkstemp(suffix=".db")
concrete_fd, concrete_path = tempfile.mkstemp(suffix=".db")
os.close(cti_fd)
os.close(concrete_fd)

try:
    cti_conn = sqlite3.connect(cti_path)
    concrete_conn = sqlite3.connect(concrete_path)

    # benchmarks the schema creation
    cti_ms, _ = benchmark("create_schema", create_cti_schema, cti_conn)
    concrete_ms, _ = benchmark(
        "create_schema", create_concrete_schema, concrete_conn
    )
    results.append(("Schema Creation", cti_ms, concrete_ms))

    # benchmarks the bulk insert operation
    def bulk_insert_cti():
        for i in range(1, iterations + 1):
            insert_cti(cti_conn, i, "person_%d" % i, 20 + (i % 50), 1000 + i)
        cti_conn.commit()

    def bulk_insert_concrete():
        for i in range(1, iterations + 1):
            insert_concrete(
                concrete_conn, i, "person_%d" % i, 20 + (i % 50), 1000 + i
            )
        concrete_conn.commit()

    cti_ms, _ = benchmark("bulk_insert", bulk_insert_cti)
    concrete_ms, _ = benchmark("bulk_insert", bulk_insert_concrete)
    results.append(("Bulk INSERT (%d rows)" % iterations, cti_ms, concrete_ms))

    # benchmarks the single select by ID (warm cache, middle of range)
    target_id = iterations // 2

    def repeat_select_cti():
        for _i in range(iterations):
            select_by_id_cti(cti_conn, target_id)

    def repeat_select_concrete():
        for _i in range(iterations):
            select_by_id_concrete(concrete_conn, target_id)

    cti_ms, _ = benchmark("select_by_id", repeat_select_cti)
    concrete_ms, _ = benchmark("select_by_id", repeat_select_concrete)
    results.append(("SELECT by ID (%d lookups)" % iterations, cti_ms, concrete_ms))

    # benchmarks the find all (full table scan)
    def repeat_find_all_cti():
        for _i in range(100):
            select_all_cti(cti_conn)

    def repeat_find_all_concrete():
        for _i in range(100):
            select_all_concrete(concrete_conn)

    cti_ms, _ = benchmark("find_all", repeat_find_all_cti)
    concrete_ms, _ = benchmark("find_all", repeat_find_all_concrete)
    results.append(("SELECT all (100 scans)", cti_ms, concrete_ms))

    # benchmarks the count operation
    def repeat_count_cti():
        for _i in range(iterations):
            count_cti(cti_conn)

    def repeat_count_concrete():
        for _i in range(iterations):
            count_concrete(concrete_conn)

    cti_ms, _ = benchmark("count", repeat_count_cti)
    concrete_ms, _ = benchmark("count", repeat_count_concrete)
    results.append(("COUNT (%d counts)" % iterations, cti_ms, concrete_ms))

    # benchmarks the bulk update operation
    def bulk_update_cti():
        for i in range(1, iterations + 1):
            update_cti(cti_conn, i, "updated_%d" % i, 2000 + i)
        cti_conn.commit()

    def bulk_update_concrete():
        for i in range(1, iterations + 1):
            update_concrete(concrete_conn, i, "updated_%d" % i, 2000 + i)
        concrete_conn.commit()

    cti_ms, _ = benchmark("bulk_update", bulk_update_cti)
    concrete_ms, _ = benchmark("bulk_update", bulk_update_concrete)
    results.append(("Bulk UPDATE (%d rows)" % iterations, cti_ms, concrete_ms))

    # benchmarks the bulk delete operation
    def bulk_delete_cti():
        for i in range(1, iterations + 1):
            delete_cti(cti_conn, i)
        cti_conn.commit()

    def bulk_delete_concrete():
        for i in range(1, iterations + 1):
            delete_concrete(concrete_conn, i)
        concrete_conn.commit()

    cti_ms, _ = benchmark("bulk_delete", bulk_delete_cti)
    concrete_ms, _ = benchmark("bulk_delete", bulk_delete_concrete)
    results.append(("Bulk DELETE (%d rows)" % iterations, cti_ms, concrete_ms))

    # closes the narrow benchmark connections
    cti_conn.close()
    concrete_conn.close()

    # adds a separator for the wide hierarchy benchmarks
    results.append(None)

    # creates the temporary database files for the wide
    # hierarchy benchmarks (20 columns per level, 60 total)
    wide_cti_fd, wide_cti_path = tempfile.mkstemp(suffix=".db")
    wide_concrete_fd, wide_concrete_path = tempfile.mkstemp(suffix=".db")
    os.close(wide_cti_fd)
    os.close(wide_concrete_fd)

    wide_cti_conn = sqlite3.connect(wide_cti_path)
    wide_concrete_conn = sqlite3.connect(wide_concrete_path)

    # benchmarks the wide schema creation
    cti_ms, _ = benchmark(
        "wide_create_schema", create_wide_cti_schema, wide_cti_conn
    )
    concrete_ms, _ = benchmark(
        "wide_create_schema", create_wide_concrete_schema, wide_concrete_conn
    )
    results.append(("Wide Schema Creation (60 cols)", cti_ms, concrete_ms))

    # benchmarks the wide bulk insert
    def wide_bulk_insert_cti():
        for i in range(1, iterations + 1):
            insert_wide_cti(wide_cti_conn, i)
        wide_cti_conn.commit()

    def wide_bulk_insert_concrete():
        for i in range(1, iterations + 1):
            insert_wide_concrete(wide_concrete_conn, i)
        wide_concrete_conn.commit()

    cti_ms, _ = benchmark("wide_bulk_insert", wide_bulk_insert_cti)
    concrete_ms, _ = benchmark("wide_bulk_insert", wide_bulk_insert_concrete)
    results.append(
        ("Wide INSERT (%d rows, 60 cols)" % iterations, cti_ms, concrete_ms)
    )

    # benchmarks the wide select by ID
    target_id = iterations // 2

    def wide_repeat_select_cti():
        for _i in range(iterations):
            select_by_id_wide_cti(wide_cti_conn, target_id)

    def wide_repeat_select_concrete():
        for _i in range(iterations):
            select_by_id_wide_concrete(wide_concrete_conn, target_id)

    cti_ms, _ = benchmark("wide_select_by_id", wide_repeat_select_cti)
    concrete_ms, _ = benchmark("wide_select_by_id", wide_repeat_select_concrete)
    results.append(
        ("Wide SELECT by ID (%d, 60 cols)" % iterations, cti_ms, concrete_ms)
    )

    # closes the wide benchmark connections
    wide_cti_conn.close()
    wide_concrete_conn.close()

    if os.path.exists(wide_cti_path):
        os.remove(wide_cti_path)
    if os.path.exists(wide_concrete_path):
        os.remove(wide_concrete_path)

finally:
    # removes the temporary database files
    if os.path.exists(cti_path):
        os.remove(cti_path)
    if os.path.exists(concrete_path):
        os.remove(concrete_path)

return results
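The calls above assume a `benchmark(label, func, *args)` helper returning an `(elapsed_ms, result)` tuple. A minimal sketch of such a helper, under the assumption that the script times a single invocation (the actual implementation may differ):

```python
import time

def benchmark(label, func, *args):
    # runs the callable once and returns (elapsed milliseconds, result);
    # the label is kept so callers can attach it to a report line
    start = time.perf_counter()
    result = func(*args)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return elapsed_ms, result

# usage mirroring the benchmark loops above
ms, value = benchmark("square", lambda x: x * x, 7)
```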
⚠️ Potential issue | 🟠 Major

Make temp database cleanup exception-safe.

If any benchmark step raises before the explicit close() calls, the finally block tries to delete still-open SQLite files. That can raise on Windows and mask the real failure, and the wide temp databases leak on early exits.

♻️ Suggested cleanup structure
 def run_benchmarks(iterations=None):
@@
-    try:
-        cti_conn = sqlite3.connect(cti_path)
-        concrete_conn = sqlite3.connect(concrete_path)
+    cti_conn = None
+    concrete_conn = None
+    wide_cti_conn = None
+    wide_concrete_conn = None
+    wide_cti_path = None
+    wide_concrete_path = None
+
+    try:
+        cti_conn = sqlite3.connect(cti_path)
+        concrete_conn = sqlite3.connect(concrete_path)
@@
-        # closes the wide benchmark connections
-        wide_cti_conn.close()
-        wide_concrete_conn.close()
-
-        if os.path.exists(wide_cti_path):
-            os.remove(wide_cti_path)
-        if os.path.exists(wide_concrete_path):
-            os.remove(wide_concrete_path)
-
     finally:
-        # removes the temporary database files
-        if os.path.exists(cti_path):
-            os.remove(cti_path)
-        if os.path.exists(concrete_path):
-            os.remove(concrete_path)
+        for connection in (
+            wide_cti_conn,
+            wide_concrete_conn,
+            cti_conn,
+            concrete_conn,
+        ):
+            if connection:
+                connection.close()
+        for path in (
+            wide_cti_path,
+            wide_concrete_path,
+            cti_path,
+            concrete_path,
+        ):
+            if path and os.path.exists(path):
+                os.remove(path)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/benchmark_inheritance.py` around lines 575 - 755, The cleanup
currently deletes temporary DB files in the outer finally without guaranteeing
the SQLite connections (cti_conn, concrete_conn, wide_cti_conn,
wide_concrete_conn) are closed, which can fail on Windows and leak wide DB
files; modify the function to track these connection objects and paths
(cti_conn, concrete_conn, wide_cti_conn, wide_concrete_conn and cti_path,
concrete_path, wide_cti_path, wide_concrete_path), and in the finally block
first safely close any non-None/open connections (catch and ignore errors on
close) and then remove their corresponding files if they exist; ensure wide_*
variables are defined (init to None) before use so early exceptions still
trigger proper closure and deletion.
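Independent of the exact diff, the underlying pattern is: pre-declare every resource before the `try` so the `finally` block can always reference it, and close connections before unlinking files (Windows refuses to delete open files). A small self-contained sketch of that pattern; the names here are illustrative, not the script's:

```python
import os
import sqlite3
import tempfile

def with_temp_db(work):
    # pre-declare the connection so the finally block can
    # reference it even if sqlite3.connect never runs
    conn = None
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        conn = sqlite3.connect(path)
        return work(conn)
    finally:
        # close before unlinking; ignore close errors so they
        # never mask the original exception
        if conn:
            try:
                conn.close()
            except Exception:
                pass
        if os.path.exists(path):
            os.remove(path)

result = with_temp_db(lambda conn: conn.execute("SELECT 1").fetchone()[0])
```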

Comment on lines +70 to +80
def _collect(cls):
    for subclass in cls.__subclasses__():
        # avoids duplicates from multiple inheritance paths
        # (e.g. Employee inherits from both Person and Taxable)
        if subclass in classes:
            continue
        if not subclass.is_abstract():
            classes.append(subclass)
        _collect(subclass)

_collect(entity_class)
⚠️ Potential issue | 🟠 Major

Don't rely on __subclasses__() alone to discover the complete hierarchy.

The CLI only imports the root class module via __import__(). If subclasses reside in separate modules and haven't been imported yet, __subclasses__() will silently discover only the imported portion of the hierarchy, resulting in an incomplete migration plan that skips tables from unimported classes.

Also applies to: 828-831

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/migrate_inheritance.py` around lines 70 - 80, The current _collect
recursion relies on __subclasses__() which only reflects already-imported
modules; before calling _collect(entity_class) ensure all modules that could
define subclasses of entity_class are imported (e.g., use
pkgutil.walk_packages()/importlib.import_module over the package containing
entity_class.__module__ or otherwise import relevant modules), then run the
existing _collect function that appends to classes; keep the same symbols
(_collect, classes, entity_class, __subclasses__()) but add the module
discovery/import step prior to invoking _collect so the full class hierarchy is
loaded.
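One way to realize the module-discovery step the prompt describes is `pkgutil.walk_packages`; a hedged sketch, demonstrated here on the stdlib `json` package rather than the actual entity modules:

```python
import importlib
import pkgutil

def import_package_modules(package_name):
    # imports every module under the given package so that a later
    # cls.__subclasses__() walk can see subclasses defined anywhere in it
    package = importlib.import_module(package_name)
    imported = [package_name]
    for _finder, name, _is_pkg in pkgutil.walk_packages(
        package.__path__, prefix=package.__name__ + "."
    ):
        importlib.import_module(name)
        imported.append(name)
    return imported

# after this call, subclass discovery over classes in the
# package is complete because every submodule has been loaded
modules = import_package_modules("json")
```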

Comment on lines +159 to +163
if engine == "sqlite":
    file_path = connection_params.get("file_path", "")
    backup_path = "%s.backup.%d" % (file_path, timestamp)
    shutil.copy2(file_path, backup_path)
    return backup_path
⚠️ Potential issue | 🟠 Major

Call PRAGMA wal_checkpoint(FULL) before copying the SQLite database file to ensure WAL data is flushed.

With WAL enabled, recent transactions can live in the -wal sidecar file, and shutil.copy2() on only the main database file will produce an incomplete backup. The tests already demonstrate awareness of this (they call wal_checkpoint(FULL) in multiple places), but the backup_database() function should apply the same safety measure before performing the file copy.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/migrate_inheritance.py` around lines 159 - 163, The backup path code
for SQLite does not flush WAL to the main DB before copying, so modify the
sqlite branch (the block that checks if engine == "sqlite" and uses
connection_params/file_path) to open a sqlite3 connection to file_path (or use
the existing connection if available) and execute "PRAGMA wal_checkpoint(FULL)"
(via cursor.execute) and commit/close before calling shutil.copy2; then perform
the existing backup_path creation and copy as before so the backup includes WAL
changes.

Comment on lines +707 to +725
# validates the hierarchy first
is_valid, validation_messages = validate_hierarchy(entity_class, target_strategy)
messages.extend(validation_messages)

if not is_valid:
    return False, messages

if validate_only:
    return True, messages

# determines the current strategy and generates the
# appropriate migration queries
current_strategy = entity_class.get_inheritance_strategy()

if current_strategy == "class_table" and target_strategy == "concrete_table":
    queries = generate_cti_to_concrete_queries(entity_class, types_map, connection)
elif current_strategy == "concrete_table" and target_strategy == "class_table":
    queries = generate_concrete_to_cti_queries(entity_class, types_map, connection)
else:
⚠️ Potential issue | 🔴 Critical

Run data validation before declaring the migration safe.

migrate() returns after validate_hierarchy() and never calls validate_data(). That means orphaned CTI rows will be reported as valid and can then disappear during the INNER JOIN copy in generate_cti_to_concrete_queries().

🛡️ Suggested validation hook
     if not is_valid:
         return False, messages
-
-    if validate_only:
-        return True, messages
 
     # determines the current strategy and generates the
     # appropriate migration queries
     current_strategy = entity_class.get_inheritance_strategy()
+
+    if current_strategy == "class_table":
+        is_valid = validate_data(connection, entity_class, messages)
+        if not is_valid:
+            return False, messages
+
+    if validate_only:
+        return True, messages
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/migrate_inheritance.py` around lines 707 - 725, The migrate() flow
currently stops after validate_hierarchy() and skips validate_data(), so run
validate_data() after a successful hierarchy validation and before proceeding to
build queries (i.e., after the existing is_valid check and before the
current_strategy logic); if validate_data() returns failures, append its
messages to messages and return False, messages, and if validate_only is true
return True, messages only when both validate_hierarchy() and validate_data()
succeed; ensure you call validate_data(entity_class, connection, types_map) (or
the appropriate signature) and propagate its results before calling
generate_cti_to_concrete_queries() or generate_concrete_to_cti_queries().
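The orphaned rows this comment warns about can be surfaced with a `LEFT JOIN` before any data is copied. A sketch of such a check, assuming hypothetical table and column names rather than the script's real `validate_data()` signature:

```python
import sqlite3

def find_orphan_rows(connection, child_table, parent_table, id_column="object_id"):
    # in class table inheritance every child row must have a matching
    # parent row; rows without one would be silently dropped by an
    # INNER JOIN based copy, so surface them before migrating
    query = (
        "SELECT c.%s FROM %s c LEFT JOIN %s p ON c.%s = p.%s "
        "WHERE p.%s IS NULL"
        % (id_column, child_table, parent_table, id_column, id_column, id_column)
    )
    return [row[0] for row in connection.execute(query)]
```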

…support

Move all migration tests out of the Colony entity manager test bundle
into scripts/test/test_migrate_inheritance.py as a self-contained
test suite with its own mock entity classes and raw SQLite setup.
Remove EntityManagerMigrationTestCase from entity_manager/test.py.

Make the migration script engine-agnostic with helper functions:
- has_table(): engine-specific catalog queries (sqlite_master,
  information_schema.tables, pg_tables)
- index_query(): engine-specific CREATE INDEX syntax (SQLite plain,
  MySQL USING after column, PostgreSQL USING before column) with
  name truncation (64 chars MySQL, 63 chars PostgreSQL)
- insert_ignore_query(): engine-specific duplicate handling
  (INSERT OR IGNORE for SQLite, INSERT IGNORE for MySQL,
  ON CONFLICT DO NOTHING for PostgreSQL)

CI now uses pytest to run all scripts/test/ files:
  pip install pytest
  pytest scripts/test/ -v

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
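The engine-specific duplicate handling described above could look roughly like this; a sketch under the assumption that engines are keyed `sqlite`/`mysql`/`pgsql`, and the script's real `insert_ignore_query()` may differ:

```python
def insert_ignore_query(table_name, columns, engine="sqlite"):
    # builds an INSERT that skips duplicate keys, using each
    # engine's native "insert or skip" syntax
    placeholders = ", ".join(["?"] * len(columns))
    column_list = ", ".join(columns)
    if engine == "sqlite":
        prefix, suffix = "INSERT OR IGNORE INTO", ""
    elif engine == "mysql":
        prefix, suffix = "INSERT IGNORE INTO", ""
    elif engine == "pgsql":
        prefix, suffix = "INSERT INTO", " ON CONFLICT DO NOTHING"
    else:
        raise ValueError("unsupported engine: %s" % engine)
    return "%s %s (%s) VALUES (%s)%s" % (
        prefix, table_name, column_list, placeholders, suffix
    )
```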
@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (6)
data/src/entity_manager/test.py (1)

2516-2518: Consider consistent save order for related entities.

The save order here (person saved before address when person.address = address) differs from the pattern established in the base test_one_to_one (lines 361-372), where the referenced entity is saved first. While this may work correctly, maintaining consistency across tests helps with readability and maintainability.

🔧 Suggested fix to match established pattern
 person.address = address
-self.entity_manager.save(person)
 self.entity_manager.save(address)
+self.entity_manager.save(person)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@data/src/entity_manager/test.py` around lines 2516 - 2518, The test saves
person before address after setting person.address, which is inconsistent with
the established pattern in test_one_to_one; update the save order so the
referenced entity is persisted first (call self.entity_manager.save(address)
before self.entity_manager.save(person)) after assigning person.address =
address to match the base test pattern and improve readability and
maintainability.
scripts/migrate_inheritance.py (3)

486-489: Prefix unused loop variables with underscore.

sql_type and is_pk are unpacked but not used. Prefix them with underscore to indicate intentionally unused.

♻️ Proposed fix
-        for col_name, sql_type, is_pk in columns:
+        for col_name, _sql_type, _is_pk in columns:
             source_table = get_source_table_for_column(cls, col_name)
             select_cols.append("%s.%s" % (source_table, col_name))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/migrate_inheritance.py` around lines 486 - 489, The loop unpacks
unused variables sql_type and is_pk which should be prefixed to signal
intentional non-use; modify the for-loop over columns to unpack as (col_name,
_sql_type, _is_pk) and leave the rest of the logic (calls to
get_source_table_for_column(cls, col_name) and appending to select_cols)
unchanged so the variables and intent are clear.

404-408: SQL built via string formatting - ensure table/column names are trusted.

The query construction uses string formatting with table_name, table_id, and parent_table. While these values derive from entity class metadata (not user input), this pattern doesn't use parameterized queries for identifiers. This is acceptable since SQL doesn't support parameterized table/column names, but ensure entity class names are always code-defined and never user-controlled.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/migrate_inheritance.py` around lines 404 - 408, The SQL is built via
Python string formatting in cursor.execute using table_name, table_id, and
parent_table; replace this with a safe identifier composition (e.g.,
psycopg2.sql.SQL/Identifier) or assert/validate that these values are strictly
code-defined entity names (no user input) before use. Locate the cursor.execute
call in migrate_inheritance.py and either (1) rebuild the statement using
psycopg2.sql.Identifier for table/column identifiers or (2) add explicit
validation (regex whitelist or membership check against known entity metadata)
for table_name, table_id and parent_table to guarantee they cannot be
user-controlled, then use them in the query.

847-853: Consider logging when rollback fails.

The try-except-pass pattern on the rollback silently swallows any rollback failure. While avoiding a cascade of exceptions is reasonable, knowing that rollback failed is critical for debugging and manual recovery.

♻️ Proposed fix
     except Exception as exception:
         # rolls back the transaction on failure
         try:
             connection.rollback()
         except Exception:
-            pass
+            messages.append("warning: rollback also failed")
         messages.append("migration failed: %s" % str(exception))
         messages.append("transaction rolled back")
         return False, messages
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/migrate_inheritance.py` around lines 847 - 853, The rollback
exception is being swallowed by a bare except in the block around
connection.rollback; catch the exception from connection.rollback (e.g., as
rollback_exc) and record/log it instead of pass so failures are visible—update
the try/except around connection.rollback to append a message like "rollback
failed: %s" % str(rollback_exc) to messages (or call the existing logger if
available) before continuing, and keep the existing "transaction rolled back"
and return behavior; reference the connection.rollback call and the messages
list and exception variable in your change.
scripts/test/test_migrate_inheritance.py (1)

102-105: Use is for class identity comparison.

When comparing class objects, is is more appropriate than == since it checks identity rather than equality.

♻️ Proposed fix
     def _get_entity_parents(cls):
         parents = []
         for base in cls.__bases__:
-            if base == MockEntityBase or base == object:
+            if base is MockEntityBase or base is object:
                 continue
             parents.append(base)
         return parents
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/test/test_migrate_inheritance.py` around lines 102 - 105, In the loop
that iterates over cls.__bases__ (variables cls and base), replace the equality
checks using "==" with identity checks using "is" when comparing against
MockEntityBase and object (i.e., use "base is MockEntityBase" and "base is
object"); update any similar class comparisons in the same function to use "is"
to ensure class identity semantics are used rather than equality.
.github/workflows/main.yml (1)

70-77: Consider adding pytest to PyPy builds (excluding 2.7).

The PyPy build jobs don't include the new pytest step. While PyPy 2.7 should be excluded, the migration tests should work with PyPy 3.x versions when they're uncommented in the matrix.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/main.yml around lines 70 - 77, Add a pytest invocation to
the PyPy job so migration tests run on PyPy 3.x: modify the job steps that
currently include the "run: pypy setup.py test" step to first install and run
pytest (e.g., "pip install pytest" then "pytest") but guard it with the matrix
condition (matrix.python-version != '2.7') so PyPy 2.7 is excluded; update the
existing "run: pypy setup.py test" step or add a preceding step that references
matrix.python-version and the existing "pypy setup.py test" entry to ensure
tests run under PyPy 3.x.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6ddc6489-533c-492d-82d5-f4ec1e4185ae

📥 Commits

Reviewing files that changed from the base of the PR and between 47dbdbb and f8e030e.

📒 Files selected for processing (4)
  • .github/workflows/main.yml
  • data/src/entity_manager/test.py
  • scripts/migrate_inheritance.py
  • scripts/test/test_migrate_inheritance.py

Comment on lines +116 to +121
index_name = "%s_%s_%s" % (table_name, attribute_name, index_type)

# truncates the index name to the engine-specific limit
limit = INDEX_NAME_LIMITS.get(engine, None)
if limit:
index_name = index_name[-limit:]
⚠️ Potential issue | 🟡 Minor

Index name truncation from the start may cause collisions.

Using index_name[-limit:] takes the last N characters, which could produce duplicate names when multiple tables share similar suffixes. For example, table_a_very_long_attr_hash and table_b_very_long_attr_hash truncated to 64 chars would both end with the same suffix.

Consider truncating from the middle or using a hash-based suffix to ensure uniqueness.

♻️ Proposed fix using hash for uniqueness
+import hashlib
+
 def index_query(table_name, attribute_name, index_type, engine="sqlite"):
     ...
     index_name = "%s_%s_%s" % (table_name, attribute_name, index_type)
 
     # truncates the index name to the engine-specific limit
     limit = INDEX_NAME_LIMITS.get(engine, None)
-    if limit:
-        index_name = index_name[-limit:]
+    if limit and len(index_name) > limit:
+        # use a hash suffix to ensure uniqueness when truncating
+        name_hash = hashlib.md5(index_name.encode()).hexdigest()[:8]
+        prefix_len = limit - 9  # leave room for underscore + 8-char hash
+        index_name = index_name[:prefix_len] + "_" + name_hash
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/migrate_inheritance.py` around lines 116 - 121, The current
truncation uses index_name = index_name[-limit:] which slices the tail and can
collide for similar suffixes; modify the logic around INDEX_NAME_LIMITS handling
to produce deterministic unique names by combining a preserved prefix with a
short hash of the full index_name: compute a stable hash (e.g., md5/sha1) of the
original "%s_%s_%s" % (table_name, attribute_name, index_type), then build
index_name as prefix + "_" + hash_suffix such that len(index_name) <= limit
(choose prefix length = limit - 1 - len(hash_suffix)); update the code that
reads/writes index_name and ensure engine, table_name, attribute_name and
index_type are used to compute the hash to guarantee uniqueness across
similar-suffixed names.

- Update debug, info, warning, error, critical methods to accept
  *args and **kwargs for deferred string formatting
- Update class-level logging methods (_class_debug, etc.) with
  the same signature change
- Enables logging calls like self.debug("Event %s", name) where
  formatting is skipped if the log level is not active

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
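The deferred-formatting pattern mirrors the stdlib `logging` module: arguments are interpolated only when the level is enabled. A minimal standalone sketch; the class and level constants are illustrative, not Colony's actual API:

```python
DEBUG, INFO = 10, 20

class Loggable(object):
    def __init__(self, level=INFO):
        self.level = level

    def debug(self, message, *args, **kwargs):
        # formatting is deferred: the "%" interpolation only runs
        # when the DEBUG level is actually enabled, so callers can
        # write self.debug("Event %s", name) with no format cost
        if self.level > DEBUG:
            return None
        return message % args if args else message

logger = Loggable(level=DEBUG)
line = logger.debug("Event %s", "startup")
```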

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: add Concrete Table Inheritance (CTI per-class) as an alternative inheritance strategy

2 participants