Add PostgreSQL support to core analysis classes#21

Merged
antarr merged 10 commits into main from claude/add-postgresql-support-eEEGJ
May 30, 2026

Conversation

@antarr

@antarr antarr commented May 18, 2026

Owner

Summary

This PR adds full PostgreSQL support to the mysql_genius-core analysis layer, enabling the dashboard to work against PostgreSQL servers in addition to MySQL/MariaDB. Dialect detection is automatic and transparent to callers.

Key Changes

Architecture

  • Dialect-aware query builders — New QueryBuilders factory module dispatches to Mysql or Postgresql builders based on connection.server_version.dialect. Each builder is a stateless module exposing class methods that return raw SQL strings, with a stable column-name contract so downstream transformation logic remains dialect-agnostic.
  • Dialect-specific analysis implementations — ServerOverview now dispatches to nested Mysql or Postgresql classes; other analyses (TableSizes, QueryStats, UnusedIndexes, StatsCollector) delegate SQL generation to the appropriate builder.
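
A minimal sketch of the factory dispatch described above, assuming a dialect symbol of :mysql or :postgresql. The module name QueryBuildersSketch, the builder_for method, and the table_sizes SQL are illustrative assumptions, not the gem's actual code; only the Mysql/Postgresql split and the shared column-name contract come from the PR description.

```ruby
# Hypothetical sketch: names other than Mysql/Postgresql are assumptions.
module QueryBuildersSketch
  module Mysql
    # Stateless builder: returns a raw SQL string.
    def self.table_sizes
      "SELECT table_name AS table_name, " \
        "data_length + index_length AS total_bytes " \
        "FROM information_schema.tables"
    end
  end

  module Postgresql
    # Same column aliases as the MySQL builder, so callers stay dialect-agnostic.
    def self.table_sizes
      "SELECT c.relname AS table_name, " \
        "pg_total_relation_size(c.oid) AS total_bytes " \
        "FROM pg_class c WHERE c.relkind = 'r'"
    end
  end

  BUILDERS = { mysql: Mysql, postgresql: Postgresql }.freeze

  # Picks the builder module for a dialect symbol.
  def self.builder_for(dialect)
    BUILDERS.fetch(dialect) do
      raise ArgumentError, "unsupported dialect: #{dialect.inspect}"
    end
  end
end

puts QueryBuildersSketch.builder_for(:postgresql).table_sizes
```

Because both builders alias their columns to table_name and total_bytes, code that consumes the rows does not need to know which dialect produced them.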

New Files

  • lib/mysql_genius/core/query_builders.rb — Factory and interface definition
  • lib/mysql_genius/core/query_builders/mysql.rb — MySQL/MariaDB SQL generation (extracted from existing analysis classes)
  • lib/mysql_genius/core/query_builders/postgresql.rb — PostgreSQL SQL generation using pg_stat_statements, pg_stat_activity, pg_class, and pg_settings

Modified Analysis Classes

  • ServerOverview — Refactored to dispatch to Mysql or Postgresql nested classes; PostgreSQL implementation populates the innodb block with shared_buffers and buffer cache hit rate (blks_hit/blks_read ratio) so the existing UI continues to render.
  • QueryStats — Uses builder to generate SQL; transforms both MySQL performance_schema.events_statements_summary_by_digest and PostgreSQL pg_stat_statements rows into a uniform output schema.
  • TableSizes — Uses builder to query information_schema.tables (MySQL) or pg_class + pg_total_relation_size (PostgreSQL).
  • UnusedIndexes — Uses builder to query performance_schema.table_io_waits_summary_by_index_usage (MySQL) or pg_stat_user_indexes (PostgreSQL); generates dialect-appropriate DROP INDEX statements.
  • StatsCollector — Uses builder for snapshot SQL generation.
  • QueryRunner — Timeout strategy now dialect-aware: MySQL uses /*+ MAX_EXECUTION_TIME(ms) */ hint, MariaDB uses SET STATEMENT max_statement_time=s FOR, PostgreSQL uses SET statement_timeout = ms with reset in ensure block.
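
The three timeout strategies listed for QueryRunner could be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: timeout_plan and run_with_timeout are hypothetical helpers, the connection is assumed to respond to execute(sql), and RESET statement_timeout is one possible way to perform the reset the PR mentions.

```ruby
# Hypothetical helper: describes what to run for a given vendor and timeout.
def timeout_plan(vendor, sql, timeout_ms)
  case vendor
  when :mysql
    # Optimizer hint placed right after the leading SELECT keyword.
    hinted = sql.sub(/\ASELECT\b/i, "SELECT /*+ MAX_EXECUTION_TIME(#{timeout_ms}) */")
    { before: nil, query: hinted, after: nil }
  when :mariadb
    # MariaDB's max_statement_time is expressed in seconds.
    seconds = timeout_ms / 1000.0
    { before: nil, query: "SET STATEMENT max_statement_time=#{seconds} FOR #{sql}", after: nil }
  when :postgresql
    # Session-level setting, so it must be undone afterwards.
    { before: "SET statement_timeout = #{timeout_ms}", query: sql, after: "RESET statement_timeout" }
  else
    raise ArgumentError, "unknown vendor: #{vendor.inspect}"
  end
end

# Assumes conn responds to execute(sql); the ensure block restores the
# PostgreSQL session setting even when the query raises.
def run_with_timeout(conn, vendor, sql, timeout_ms)
  plan = timeout_plan(vendor, sql, timeout_ms)
  conn.execute(plan[:before]) if plan[:before]
  conn.execute(plan[:query])
ensure
  conn.execute(plan[:after]) if plan && plan[:after]
end
```

Only the PostgreSQL path changes session state, which is why it alone needs the cleanup step.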

Supporting Changes

  • ServerInfo — Added #dialect method that collapses :mysql and :mariadb vendors into :mysql dialect, and :postgresql vendor into :postgresql dialect. Enhanced #parse to recognize PostgreSQL version strings (case-insensitive).
  • SqlValidator — Added POSTGRESQL_SYSTEM_SCHEMAS constant; validation now checks the appropriate schema list based on dialect.
  • FakeAdapter — Updated quote_table_name to use double quotes for PostgreSQL, backticks for MySQL; added PostgreSQL version string detection.
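
As a rough illustration of the vendor-to-dialect collapse described for ServerInfo, the sketch below parses a version string and maps the vendor to a dialect. ServerInfoSketch and its parsing rules are assumptions made for this example; the real class may detect vendors differently.

```ruby
# Hypothetical stand-in for ServerInfo; parsing rules are assumptions.
ServerInfoSketch = Struct.new(:vendor, :version) do
  # Detects the vendor from the version string, case-insensitively.
  def self.parse(version_string)
    vendor =
      case version_string
      when /postgresql/i then :postgresql
      when /mariadb/i then :mariadb
      else :mysql
      end
    # First dotted number in the string, e.g. "16.2" or "10.11.6".
    new(vendor, version_string[/\d+(\.\d+)+/])
  end

  # :mysql and :mariadb share one SQL dialect; :postgresql has its own.
  def dialect
    vendor == :postgresql ? :postgresql : :mysql
  end
end

info = ServerInfoSketch.parse("PostgreSQL 16.2 on x86_64-pc-linux-gnu")
puts "#{info.vendor} #{info.version} -> #{info.dialect}"
```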

Test Coverage

  • New test suite spec/mysql_genius/core/query_builders_spec.rb validates builder dispatch and SQL generation for both dialects.
  • Existing analysis specs extended with PostgreSQL context blocks that stub pg_stat_statements, pg_stat_activity, pg_class, and pg_settings queries.
  • QueryRunner, SqlValidator, and ServerInfo specs updated to cover PostgreSQL-specific behavior.

Notable Implementation Details

  • PostgreSQL pg_stat_statements requires the extension to be installed; if unavailable, the query raises just as it does on MySQL when performance_schema is disabled.
  • PostgreSQL schema/table identifiers use double quotes; MySQL uses backticks. The builders and FakeAdapter handle this transparently.
  • Lock-related fields in the innodb block (e.g., row_lock_waits) fall back to 0 on PostgreSQL since there is no direct equivalent of InnoDB row lock waits.
  • Fragmentation metrics

https://claude.ai/code/session_01Xy96nK3Ron2NqukBiSgtzE

claude added 9 commits May 18, 2026 02:51
Introduces a dialect-aware QueryBuilders layer (Mysql + Postgresql) so
the existing analysis classes — TableSizes, QueryStats, UnusedIndexes,
ServerOverview, StatsCollector — run unchanged against MySQL,
MariaDB, or PostgreSQL. Dialect selection is automatic via
Core::ServerInfo, which now recognises :postgresql as a vendor.

Beyond the SQL: SqlValidator blocks the right system schemas for each
dialect and parses double-quoted identifiers; QueryRunner enforces
statement timeouts via SET statement_timeout on PostgreSQL with an
ensure-block reset, and recognises PG's cancellation message as a
timeout error; FakeAdapter quotes identifiers with double-quotes when
stubbed as PostgreSQL so tests exercise the right dialect path.

PostgreSQL query stats require the pg_stat_statements extension; the
ServerOverview innodb block is repurposed to hold the closest PG
analogues (shared_buffers, blks_hit/blks_read ratio, deadlocks).
MySQL-only AI features (InnoDB status, slow log capture) remain
unchanged and are not exposed on PostgreSQL connections.
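
For illustration, a buffer cache hit rate can be derived from the two counters named above. The commit message only says the figure is a blks_hit/blks_read ratio; this sketch assumes the conventional definition, hits divided by hits plus reads, and buffer_cache_hit_rate is a hypothetical helper name.

```ruby
# Hypothetical helper. Assumes hit rate = blks_hit / (blks_hit + blks_read),
# which is a common convention and not confirmed by the commit message.
def buffer_cache_hit_rate(blks_hit, blks_read)
  total = blks_hit + blks_read
  return 0.0 if total.zero? # fresh stats: avoid dividing by zero

  (blks_hit.to_f / total * 100).round(2)
end

puts buffer_cache_hit_rate(1, 3)
```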

https://claude.ai/code/session_01Xy96nK3Ron2NqukBiSgtzE
Follow-up to the core gem PG support: the dashboard tabs visibly broke
when pointed at a PG database because several Rails-side controllers,
views, and AI features still emit MySQL-only SQL (performance_schema
reads, SHOW commands, backticked ALTER TABLE DROP INDEX). This pass
extends the dialect-aware layer across the rest of the stack.

Adds:
- Core::Analysis::QueryHistory — single-digest stats lookup used by the
  query detail page; reads pg_stat_statements on PG.
- Core::Analysis::DuplicateIndexes now emits drop_sql in each result
  hash via the dialect builder, so the dashboard JS no longer assembles
  ALTER TABLE statements client-side.
- Core::UnsupportedDialect error class; raised by VariableReviewer,
  ConnectionAdvisor, and InnodbInterpreter when called on a PG
  connection so the UI gets a clear MySQL-only message instead of a
  raw PG::SyntaxError.
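
A sketch of how such a guard might look, assuming the dialect is available as a symbol. The module CoreSketch, the constructor signature, and the message wording are invented for this example; only the UnsupportedDialect name and its purpose come from the commit message.

```ruby
module CoreSketch
  # Raised when a MySQL-only analysis is invoked on another dialect.
  class UnsupportedDialect < StandardError
    def initialize(feature, dialect)
      super("#{feature} is MySQL/MariaDB-only (connected dialect: #{dialect})")
    end
  end

  # Hypothetical stand-in for a MySQL-only analysis such as VariableReviewer.
  class VariableReviewerSketch
    def initialize(dialect)
      @dialect = dialect
    end

    def call
      # Fail early with a clear message instead of sending MySQL SQL to PG.
      raise UnsupportedDialect.new("Variable Review", @dialect) unless @dialect == :mysql

      :reviewed
    end
  end
end

begin
  CoreSketch::VariableReviewerSketch.new(:postgresql).call
rescue CoreSketch::UnsupportedDialect => e
  puts e.message
end
```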

Refactors:
- queries_controller.rb#query_history delegates to Core::Analysis::
  QueryHistory rather than inlining performance_schema SQL.
- database_analysis.rb's friendly error messages pick the right stats
  source name based on dialect (pg_stat_statements vs performance_schema).
- ai_features.rb gates root_cause and anomaly_detection — both run
  performance_schema/SHOW queries directly — behind a connected_to_
  postgresql? guard that returns a JSON "MySQL/MariaDB-only" error.
- ActiveRecordAdapter#server_version memoizes; dialect detection now
  costs one SELECT VERSION() per request rather than one per analysis.
- Cosmetic tab headers no longer mention performance_schema /
  information_schema by name.

Test plumbing:
- FakeConnectionHelper default-stubs SELECT VERSION() to "8.0.35" and
  routes unstubbed select_value calls through the exec_query matchers,
  so request specs don't need to grow per-controller version stubs.

https://claude.ai/code/session_01Xy96nK3Ron2NqukBiSgtzE
The split was originally added to support a planned desktop app
(mysql_genius-desktop). With that direction discontinued, the two-gem
layout creates friction with no payoff — lockstep version bumps, two
gemspecs/Gemfiles/CHANGELOGs/test suites, and host apps having to
declare a path dependency on both gems during development (which the
PG branch surfaced when several controllers initially appeared not to
load the new dialect-aware code).

What moved:
- gems/mysql_genius-core/lib/mysql_genius/core/** → lib/mysql_genius/core/**
- gems/mysql_genius-core/spec/** → spec/mysql_genius/core/**
- Views under core/views/ continue to live where Core.views_path expects.
- Git rename detection caught 89 of these so file history is preserved.

What was removed:
- gems/mysql_genius-core/ entirely (gemspec, Gemfile, Rakefile, CHANGELOG).
- gems/mysql_genius-desktop/ entirely (the stub that justified the split).
- packaging/macos/ (build-dmg.sh + icon generator that only packaged the
  desktop gem).
- mysql_genius.gemspec's runtime dependency on `mysql_genius-core`.
- The path declaration in the top-level Gemfile.
- Makefile's test-core/test-desktop/lint-core/lint-desktop/sidecar targets.
- CI's separate `desktop` job and the publish workflow's separate
  mysql_genius-core build/push step.
- The unused `MysqlGenius::Core::VERSION` constant. Use MysqlGenius::VERSION.

What stayed identical:
- The `MysqlGenius::Core::*` namespace and every public require path
  (`require "mysql_genius/core/..."`). No host-app code changes needed.
- Single rspec run from the repo root now executes both formerly-split
  suites (401 examples, was 94 + 307).

https://claude.ai/code/session_01Xy96nK3Ron2NqukBiSgtzE
A user reported ~231 indexes flagged as unused on a real PG database,
basically every non-PK/non-unique index. Investigation: idx_scan is
cumulative since the stats source was last reset, so a recent PG
restart or pg_stat_reset() makes everything read as "0 scans" until
real traffic accumulates. We were emitting a bare JSON array of
matches, with no context for the user to tell "truly unused" apart
from "stats are fresh."

Matches PgHero's approach:

- The route now returns { indexes, stats_reset_at, min_scans } so the
  dashboard can render "Stats last reset 2h ago · threshold: scans
  ≤ 0" above the table. `stats_reset_at` comes from
  pg_stat_database.stats_reset on PG and is nil on MySQL (whose
  table_io_waits counters don't surface a reset time cheaply).

- New config.min_unused_index_scans (default 0 to match PgHero). Both
  builders thread it through to idx_scan <= N / COUNT_READ <= N so
  users can ignore indexes that are technically used but rarely
  enough to be drop candidates.

- PG results now sort by pg_relation_size(indexrelid) DESC and carry
  a size_bytes field; the table shows a Size column with the biggest
  candidates at the top (where most wins live). MySQL emits a nil
  size_bytes — individual index size isn't cheap to surface from
  information_schema.STATISTICS.

- Dropped the `c.reltuples > 0` filter on PG. PgHero doesn't have it
  and an empty table with unused indexes is still actionable noise.

The view + count widget were updated to read data.indexes; existing
JSON-array consumers will need the small shape change but that's
called out in CHANGELOG. IndexPlanner (which consumed UnusedIndexes
directly) now pulls .indexes off the Result struct.
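
The new response shape and threshold filter might be sketched like this. UnusedIndexesResult and unused_index_filter are hypothetical names; the field names and the idx_scan / COUNT_READ comparisons are taken from the commit message above, and the sample index row is made up.

```ruby
# Hypothetical Result struct carrying context alongside the matches.
UnusedIndexesResult = Struct.new(:indexes, :stats_reset_at, :min_scans, keyword_init: true)

# Builds the dialect-specific threshold predicate.
def unused_index_filter(dialect, min_scans)
  n = Integer(min_scans) # coerce so only an integer reaches the SQL string
  dialect == :postgresql ? "idx_scan <= #{n}" : "COUNT_READ <= #{n}"
end

result = UnusedIndexesResult.new(
  indexes: [{ index_name: "index_users_on_email", size_bytes: 8192 }], # sample row
  stats_reset_at: nil, # nil on MySQL, per the commit message
  min_scans: 0
)

puts unused_index_filter(:postgresql, result.min_scans)
puts result.to_h.keys.inspect
```

Consumers that previously iterated a bare array would read result.indexes instead.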

https://claude.ai/code/session_01Xy96nK3Ron2NqukBiSgtzE
The dashboard JS ROUTES.query_detail was hardcoded to '/queries/'
without the engine mount prefix, so query stats rows rendered
<a href="/queries/${digest}"> instead of "/mysql_genius/queries/...",
404'ing in host apps. Every other route in the ROUTES table goes
through path_for(...) which generates engine-aware paths; query_detail
was special-cased because path_for(:query_detail) requires a digest
param it doesn't have on the dashboard page.

Switching to '<%= mysql_genius.root_path %>queries/' picks up the
mount point from the engine's URL helpers without needing a placeholder
digest. Added a regression spec on GET /mysql_genius/ that asserts the
rendered prefix is /mysql_genius/queries/, not /queries/.

https://claude.ai/code/session_01Xy96nK3Ron2NqukBiSgtzE
A user hit `NameError: undefined local variable or method 'mysql_genius'`
in production when first rendering the dashboard. The `mysql_genius`
URL helpers proxy is supposed to be injected into controllers/views
when the engine is mounted, but in production with eager loading the
proxy isn't always available in time — the failure surfaced on the
first path_for() call.

The robust fix is to bypass the proxy and reference the engine's URL
helpers module directly:

    MysqlGenius::Engine.routes.url_helpers.public_send("#{name}_path")

This always resolves once the engine routes are drawn, with no
dependency on Rails injecting `mysql_genius` onto each controller
instance.

Also folds the query_detail link prefix into path_for as a named
`:query_detail_prefix` lookup, replacing the inline `mysql_genius.root_path`
ERB from the previous commit (which would have hit the same proxy
problem).

The existing regression spec from c7c350f already asserts the rendered
JS reads `query_detail: '/mysql_genius/queries/'`, so it now also
guards this fix.

https://claude.ai/code/session_01Xy96nK3Ron2NqukBiSgtzE
EXPLAIN on captured digest text was failing on PostgreSQL because
pg_stat_statements normalizes literals to $1, $2, ... bind
placeholders, and PG can't plan a parameterized query without bind
values ("there is no parameter $1"). MySQL has the same issue with ?
in performance_schema digest text but the PG report was the visible
one.

QueryExplainer now rewrites placeholders to NULL before wrapping the
SQL in EXPLAIN:

- PG: $N → NULL (unambiguous; $N doesn't appear in hand-written SQL
  outside dollar-quoted strings, which are an edge case)
- MySQL: ? → NULL, but only outside single-quoted string literals,
  so user-typed `WHERE name = '?'` from the Query Explorer is left
  alone. The walker also handles escaped quotes ('').

Tradeoff: selectivity-dependent plan choices (e.g. index scan vs.
seq scan based on estimated row counts) may differ from production,
but the shape of the plan — join order, indexes used, scan types —
is still useful for diagnostics. A future enhancement could prefer
EXPLAIN (GENERIC_PLAN) on PG 16+ for a more accurate plan; for now
NULL substitution works on every supported PG version.

The substitution happens BEFORE the existing truncation heuristic,
since a bare `?` doesn't match the "value-like trailing char" pattern
and would otherwise short-circuit as Truncated.
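
The two substitutions can be sketched as below. The function names are hypothetical and this is not the QueryExplainer code itself; it only illustrates the $N rewrite and a quote-aware walker for ? that respects doubled single quotes. Like the commit message, it treats dollar-quoted strings on PG as an unhandled edge case.

```ruby
# PG: every $N placeholder becomes NULL.
def null_out_pg_placeholders(sql)
  sql.gsub(/\$\d+/, "NULL")
end

# MySQL: replace ? with NULL, but only outside single-quoted literals.
def null_out_mysql_placeholders(sql)
  out = +""
  in_string = false
  i = 0
  while i < sql.length
    ch = sql[i]
    if in_string
      if ch == "'" && sql[i + 1] == "'"
        out << "''" # escaped quote: still inside the literal
        i += 2
        next
      end
      in_string = false if ch == "'"
      out << ch
    elsif ch == "'"
      in_string = true
      out << ch
    elsif ch == "?"
      out << "NULL"
    else
      out << ch
    end
    i += 1
  end
  out
end

puts null_out_pg_placeholders("SELECT * FROM t WHERE a = $1 AND b = $12")
puts null_out_mysql_placeholders("SELECT * FROM users WHERE id = ? AND name = '?'")
```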

https://claude.ai/code/session_01Xy96nK3Ron2NqukBiSgtzE
OpenAI's GPT-5 and o-series ("reasoning") models renamed max_tokens
to max_completion_tokens and only accept the default temperature (1).
Sending the legacy params returns:

    Unsupported parameter: 'max_tokens' is not supported with this
    model. Use 'max_completion_tokens' instead.

Detect the model family from @config.model and pick the right shape:

- Legacy chat models (gpt-4o, gpt-4, gpt-3.5, etc.):
    body[:max_tokens]    = N
    body[:temperature]   = T
- Reasoning models (gpt-5*, o1*, o3*, o4*):
    body[:max_completion_tokens] = N
    (no temperature — default is 1, anything else is rejected)

The regex `/\b(gpt-5|o1|o3|o4)(-|\b)/i` matches the bare model name
(`gpt-5`, `o1`, `o3-mini`, `o4`) plus the common variants
(`gpt-5-mini`, `o1-mini`, `o3-pro`). gpt-4o is *not* matched (the
trailing `o` doesn't sit on a word boundary the way `o1` does), so
the existing default model keeps working unchanged.
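
A minimal sketch of the parameter selection, using the regex from the message above. The openai_body helper and the REASONING_MODEL constant are hypothetical names; the real client presumably builds a larger request body.

```ruby
# Regex from the commit message, written as a Ruby literal.
REASONING_MODEL = /\b(gpt-5|o1|o3|o4)(-|\b)/i

# Hypothetical helper: picks the parameter shape for the model family.
def openai_body(model, max_tokens:, temperature:)
  body = { model: model }
  if model.match?(REASONING_MODEL)
    # Reasoning models: renamed token limit, no temperature sent.
    body[:max_completion_tokens] = max_tokens
  else
    # Legacy chat models keep the original parameters.
    body[:max_tokens] = max_tokens
    body[:temperature] = temperature
  end
  body
end

p openai_body("gpt-4o", max_tokens: 512, temperature: 0.2)
p openai_body("o3-mini", max_tokens: 512, temperature: 0.2)
```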

Anthropic and the custom callable path are untouched.

https://claude.ai/code/session_01Xy96nK3Ron2NqukBiSgtzE
Four issues surfaced in actual PG dashboard use:

1. EXPLAIN false-positive on Rails query annotations
   SQL like `SELECT COUNT(*) FROM "taggings" /*action='table_sizes',
   ...*/` was rejected as "truncated" because the closing `*/` ends
   in `/` and tripped the trailing-operator check. looks_complete?
   now strips block comments before inspecting the trailing token.

2. AI assistant generates MySQL backticks on a PG database
   `Suggestion`, `RewriteQuery`, `Optimization`, `IndexAdvisor`,
   `IndexPlanner`, `MigrationRisk`, `SchemaReview`, `WorkloadDigest`,
   and `PatternGrouper` all hardcoded "MySQL" in their system
   prompts; Suggestion also told the model to "use backticks for
   table and column names." On PG this produced unparseable SQL.

   New Ai::DialectHints module exposes name_for(conn) and
   identifier_quoting_rule(conn). Every prompt now interpolates the
   dialect name; Suggestion's quoting rule swaps to double quotes
   on PG. IndexPlanner's prompt also specifies dialect-appropriate
   DROP INDEX syntax.

3. MySQL-only AI buttons visible on PG dashboard
   "Why is it slow?", "Anomaly Detection", "Variable Review",
   "Connection Advisor", "InnoDB Health" all require MySQL-only data
   sources. The buttons were rendered, and clicking them returned a
   friendly "MySQL/MariaDB-only" error, but showing them at all was
   noise for the user. capability?(:mysql_only_ai) now returns false on PG,
   and the Server tab gates those five buttons on it. The controller-
   side friendly-error guard stays as a defensive backstop.

4. Misleading server labels on PG
   - "InnoDB Buffer Pool" card → renamed "Buffer Cache" (works for
     both InnoDB and shared_buffers).
   - PG ServerOverview was reporting tmp_disk_pct = 100% whenever
     temp_files > 0, flagging a red badge on every dashboard. PG's
     temp_files is a single count with no in-memory denominator, so
     it now reports 0% and shows N / N for the count.
   - Was mapping xact_rollback to "Aborted Clients" which is a
     different concept; PG's connections block now reports 0 for
     aborted_clients/aborted_connects until a proper PG-side mapping
     exists.
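
The fix in item 1 can be illustrated with a short sketch. Both helpers are hypothetical: the actual looks_complete? heuristic is not shown in this PR text, so ends_with_operator? is only an assumed stand-in for its trailing-operator check. Note that this simple regex would also strip comment-like text inside string literals.

```ruby
# Removes /* ... */ block comments, including multi-line ones.
def strip_block_comments(sql)
  sql.gsub(%r{/\*.*?\*/}m, " ").strip
end

# Assumed stand-in for the trailing-operator check: true when the SQL,
# comments removed, ends in a character that suggests truncation.
def ends_with_operator?(sql)
  strip_block_comments(sql).match?(%r{[=<>+\-*/,(]\z})
end

annotated = %q{SELECT COUNT(*) FROM "taggings" /*action='table_sizes'*/}
puts ends_with_operator?(annotated)                  # annotation no longer trips the check
puts ends_with_operator?("SELECT * FROM t WHERE a =") # genuinely cut off
```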

https://claude.ai/code/session_01Xy96nK3Ron2NqukBiSgtzE
@antarr antarr force-pushed the claude/add-postgresql-support-eEEGJ branch 2 times, most recently from 035ee42 to a5512ba Compare May 19, 2026 03:48
@antarr antarr self-assigned this May 19, 2026
@antarr antarr merged commit f3d9e9e into main May 30, 2026
30 checks passed
@antarr antarr deleted the claude/add-postgresql-support-eEEGJ branch May 30, 2026 23:58
2 participants