
A4IHackathon - Guardians Of Data: Security hardening, column masking & injection detection, REST/Swagger integration, validated bug fixes, and full docs/onboarding #16

Open
nakultripathi-lab wants to merge 25 commits into ThalesGroup:main from captainp99:main


@nakultripathi-lab


Security hardening, column masking & injection detection, REST/Swagger integration, validated bug fixes, and full docs/onboarding

Summary

This PR delivers a major round of work on sql-data-guard: five new security
capabilities, a backlog of validated security/config bug fixes, a REST API with
Swagger, an extensive unit-test suite, and complete technical + onboarding
documentation.

Net change: +5,644 / −172 across 45 files (24 commits).

New security features

  • Feature 1 — Malicious-payload / SQL-injection detection (src/sql_data_guard/injection_detection.py):
    pre-parse raw-string scan (catches --, /* */, # comment-evasion) plus an
    AST scan for stacked statements, dangerous functions, and system-catalog
    probing, with intent-revealing messages and risk weights. Makes good on the
    README's long-standing "built-in detection module" claim that previously had
    no implementation.
  • Feature 2 — Column-level masking (src/sql_data_guard/column_masking.py): per-table
    column_masks rewrite sensitive columns instead of dropping them — redact,
    hash (MD5), and partial (show last N) policies, preserving result shape.
  • Feature 3 — Function allow-list / deny-list (_verify_functions): config-driven
    function policy on top of F1's always-on deny-list, so deployments can
    permit/forbid specific functions (e.g. block LOAD_FILE).
  • Feature 4 — Mandatory LIMIT / max-rows enforcement (_enforce_force_limit):
    injects or clamps LIMIT on the outermost query via the existing auto-fix
    path, closing the full-table-exfiltration gap.
  • Feature 5 — Deny-list columns & deny-aware wildcards: per-table denied_columns
    with deny-wins precedence and SELECT * / table.* expansion that never
    leaks a denied column.
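As a rough illustration of how Features 2 and 5 interact, the sketch below expands a wildcard into an explicit projection, skipping denied columns first (deny wins) and wrapping masked columns in a replacement expression so the result keeps its shape. The config keys `type` and `show_last`, the helper names, and the SQL functions used (`MD5`, `RIGHT`, `CONCAT`, which vary by dialect) are assumptions made for this example, not the module's actual API.

```python
from typing import Any, Dict, List


def mask_expression(column: str, policy: Dict[str, Any]) -> str:
    """Build a SQL expression that replaces `column` but keeps its name."""
    kind = policy["type"]  # hypothetical key name
    if kind == "redact":
        return f"'***' AS {column}"
    if kind == "hash":
        return f"MD5({column}) AS {column}"
    if kind == "partial":
        last_n = int(policy.get("show_last", 4))  # hypothetical key name
        return f"CONCAT('***', RIGHT({column}, {last_n})) AS {column}"
    raise ValueError(f"unknown mask type: {kind}")


def expand_wildcard(table_config: Dict[str, Any]) -> List[str]:
    """Expand SELECT * into explicit columns; denied columns never appear."""
    denied = set(table_config.get("denied_columns", []))
    masks = table_config.get("column_masks", {})
    projection = []
    for column in table_config["columns"]:
        if column in denied:
            continue  # deny wins, even if a mask is also configured
        if column in masks:
            projection.append(mask_expression(column, masks[column]))
        else:
            projection.append(column)
    return projection


config = {
    "table_name": "customers",
    "columns": ["id", "name", "email", "ssn", "card_number"],
    "denied_columns": ["ssn"],
    "column_masks": {
        "email": {"type": "hash"},
        "card_number": {"type": "partial", "show_last": 4},
        "ssn": {"type": "redact"},  # ignored: the column is denied outright
    },
}

projection = expand_wildcard(config)
sql = f"SELECT {', '.join(projection)} FROM {config['table_name']}"
print(sql)
```

Checking the deny-list before the mask table is what gives deny-wins precedence in this sketch: a column listed in both places is dropped rather than masked.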

Validated bug fixes (regression-tested)

Security

  • S1 — escape generated restriction values via sqlglot literals; quote IN members.
  • S2 — numeric (not string) comparison for < > <= >=; inject the correct operator.
  • S3 — close column allow-list bypass when all FROM tables are dynamic.
  • S4 — verify EXCEPT/INTERSECT (dispatch on SetOperation), not just UNION.
  • S7 — explicitly reject stacked / multi-statement SQL.
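To make S1 and S2 concrete, here is a minimal stdlib-only sketch of rendering a restriction into SQL. The PR does this through sqlglot literals; the hand-rolled quote doubling below is a stand-in for that, and the function names are hypothetical. It shows the two properties the fixes target: string values are escaped and quoted (including each IN member), and numeric values stay unquoted so the database compares them as numbers.

```python
from typing import Any

COMPARISON_OPS = {"=", "<", ">", "<=", ">="}


def sql_literal(value: Any) -> str:
    """Render a Python value as a SQL literal (simplified stand-in)."""
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)  # numeric literal: no quotes, numeric comparison
    # Standard SQL escaping: double any embedded single quote.
    return "'" + str(value).replace("'", "''") + "'"


def render_restriction(column: str, operation: str, value: Any) -> str:
    """Render one restriction, emitting the operator that was configured."""
    op = operation.upper()
    if op == "IN":
        members = ", ".join(sql_literal(v) for v in value)
        return f"{column} IN ({members})"
    if op in COMPARISON_OPS:
        return f"{column} {op} {sql_literal(value)}"
    raise ValueError(f"unsupported operation: {operation}")


print(render_restriction("name", "=", "O'Brien"))
print(render_restriction("amount", ">=", 10))
print(render_restriction("region", "in", ["eu", "us"]))

# Why S2 matters: string comparison is lexicographic, numeric is not.
print("10" < "9", 10 < 9)
```

Quote doubling is only correct for dialects that follow standard SQL string escaping; dialects with backslash escapes need more care, which is one reason to delegate literal generation to a SQL library.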

Config validation (no more caller-facing HTTP 500s)

  • C1 — IN accepts a non-empty list of consistently-typed values.
  • C2 — require a value for =, >, <, <=, >=.
  • C3 — case-insensitive operations.
  • C4 — verify_sql returns a result dict on invalid config (also catches ValueError).
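The config-validation rules above can be sketched roughly as follows. This is an illustration under assumptions, not the code in restriction_validation.py: the restriction keys (`operation`, `value`, `values`), the default operation, the result keys (`allowed`, `errors`, `fixed`), and the function names are all assumed for the example.

```python
from typing import Any, Dict, List, Optional

VALUE_OPS = {"=", ">", "<", "<=", ">="}


def validate_restriction(restriction: Dict[str, Any]) -> None:
    """Raise ValueError if a restriction is malformed."""
    op = str(restriction.get("operation", "=")).upper()  # C3: case-insensitive
    if op == "IN":
        values = restriction.get("values")
        if not isinstance(values, list) or not values:
            raise ValueError("IN requires a non-empty list of values")  # C1
        if len({type(v) for v in values}) != 1:
            raise ValueError("IN values must all have the same type")  # C1
    elif op in VALUE_OPS:
        if restriction.get("value") is None:
            raise ValueError(f"operation {op} requires a value")  # C2
    else:
        raise ValueError(f"unsupported operation: {op}")


def verify(sql: str, restrictions: List[Dict[str, Any]]) -> Dict[str, Any]:
    """C4: report an invalid config as a result dict instead of raising."""
    fixed: Optional[str] = None
    try:
        for restriction in restrictions:
            validate_restriction(restriction)
    except ValueError as exc:
        return {"allowed": False, "errors": [str(exc)], "fixed": fixed}
    # Real SQL verification would happen here; omitted in this sketch.
    return {"allowed": True, "errors": [], "fixed": fixed}


bad = verify("SELECT 1", [{"column": "id", "operation": ">="}])
good = verify("SELECT 1", [{"column": "id", "operation": "in", "values": [1, 2]}])
print(bad)
print(good)
```

Catching ValueError at the entry point means a REST caller sees a structured rejection rather than an unhandled exception surfacing as an HTTP 500.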

Reliability

  • Q1 — errors returned as an ordered, de-duplicated list.
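One common way to get an ordered, de-duplicated error list in Python is shown below. Whether the PR uses this exact idiom is not stated, and the helper name and sample messages are hypothetical.

```python
from typing import Iterable, List


def dedupe_errors(errors: Iterable[str]) -> List[str]:
    """Drop repeated messages while keeping first-seen order."""
    # dict keys preserve insertion order (guaranteed since Python 3.7).
    return list(dict.fromkeys(errors))


errors = dedupe_errors([
    "Column ssn is not allowed",
    "Missing restriction for table: orders",
    "Column ssn is not allowed",
])
print(errors)
```

Compared with a set, the list gives callers a stable order, so tests that previously asserted `errors == set()` have to compare against `[]` instead.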

REST API & integration

  • Swagger/OpenAPI integration and REST endpoint additions in
    src/sql_data_guard/rest/sql_data_guard_rest.py.
  • MCP wrapper updates (src/sql_data_guard/mcpwrapper/mcp_wrapper.py).
  • Dockerfile and dependency updates (pyproject.toml, requirements.txt).

Tests

New unit suites covering:

  • test/test_injection_detection_unit.py
  • test/test_column_masking_unit.py
  • test/test_denied_columns_unit.py
  • test/test_function_policy_unit.py
  • test/test_limit_enforcement_unit.py
  • test/test_rest_api_unit.py
  • test/test_security_fixes_unit.py (validated security-fix backlog, ~395 lines)

Documentation & onboarding

  • Full technical docs set (docs/technical-documentation/): index, code
    explanation, HLD, LLD, architecture, coding guidelines.
  • New ONBOARDING.md and a comprehensive README.md rewrite.
  • AI-assisted dev tooling: .agents/skills, .clinerules/Security-Rule.md, and
    a technical-documentation workflow.

Paras Jain and others added 25 commits June 24, 2026 14:58
Implements the validated bug backlog (from REPLICATION_REPORT.md /
REPRODUCTION_STEPS.md), each covered by regression tests in
test/test_security_fixes_unit.py.

Security:
- S1 escape generated restriction values via sqlglot literals; quote IN members
- S2 numeric (not string) comparison for < > <= >=; inject the correct operator
- S3 close column allow-list bypass when all FROM tables are dynamic
- S4 verify EXCEPT/INTERSECT (dispatch on SetOperation), not just UNION
- S7 explicitly reject stacked / multi-statement SQL

Config validation (no more caller-facing crashes / HTTP 500):
- C1 IN accepts a non-empty list of consistently-typed values
- C2 require a value for =, >, <, <=, >=
- C3 case-insensitive operations
- C4 verify_sql returns a result dict on invalid config (also catches ValueError)

Reliability / build / opt-in hardening:
- Q1 errors are an ordered, de-duplicated list
- Q2 remove dead double find(Where)
- Q3 make the MCP wrapper import-safe
- Q4 valid PEP 440 dev version
- S5 opt-in max_risk hard-block; S6 opt-in X-API-Key auth
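For the opt-in X-API-Key check (S6), a minimal sketch might look like the following. The commit message does not show how the REST layer implements it, so the function name, the way the expected key is supplied, and the plain-mapping header lookup are assumptions; only `hmac.compare_digest` is a real stdlib call.

```python
import hmac
from typing import Mapping, Optional


def is_authorized(headers: Mapping[str, str], expected_key: Optional[str]) -> bool:
    """Return True if the request may proceed."""
    if not expected_key:
        return True  # opt-in: no key configured means auth is disabled
    supplied = headers.get("X-API-Key", "")
    # Constant-time comparison avoids leaking key contents through timing.
    return hmac.compare_digest(
        supplied.encode("utf-8"), expected_key.encode("utf-8")
    )


print(is_authorized({}, None))
print(is_authorized({"X-API-Key": "s3cret"}, "s3cret"))
print(is_authorized({"X-API-Key": "wrong"}, "s3cret"))
```

HTTP header names are case-insensitive, so a real handler would rely on the web framework's header object rather than a plain dict lookup as used here.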

Full unit suite: 326 passed, 1 skipped.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Add three config-driven SQL safety features:
- F1: allowed_functions / blocked_functions enforcement
- F2: force_limit to inject/clamp a mandatory LIMIT
- F3: per-table denied_columns with deny-aware * expansion
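The inject-or-clamp rule behind force_limit can be sketched as below. The PR applies it to the outermost query on the parsed AST; this example only handles a trailing `LIMIT n` with a regular expression, so it is a deliberately naive illustration (no OFFSET, comments, or string-literal handling), and both function names are hypothetical.

```python
import re
from typing import Optional

# Matches a LIMIT clause at the very end of the statement only.
_TRAILING_LIMIT = re.compile(r"\s+LIMIT\s+(\d+)\s*;?\s*$", re.IGNORECASE)


def clamp_limit(existing: Optional[int], force_limit: int) -> int:
    """Inject the cap when no LIMIT exists, otherwise keep the smaller value."""
    if force_limit <= 0:
        raise ValueError("force_limit must be a positive integer")
    if existing is None:
        return force_limit
    return min(existing, force_limit)


def enforce_limit(sql: str, force_limit: int) -> str:
    """Return `sql` with a trailing LIMIT no larger than `force_limit`."""
    match = _TRAILING_LIMIT.search(sql)
    if match:
        existing: Optional[int] = int(match.group(1))
        body = sql[: match.start()]
    else:
        existing = None
        body = sql.rstrip().rstrip(";").rstrip()
    return f"{body} LIMIT {clamp_limit(existing, force_limit)}"


print(enforce_limit("SELECT id FROM orders", 100))
print(enforce_limit("SELECT id FROM orders LIMIT 5000", 100))
print(enforce_limit("SELECT id FROM orders limit 10;", 100))
```

A smaller user-supplied LIMIT is left alone; only a missing or larger one is rewritten, which matches the "inject or clamp" wording above.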

Includes the injection-detection module, unit tests for each
feature, per-feature docs, and a REST port/debug update.

Integrate new feature work from main (injection_detection module, function
allow/deny-list, mandatory LIMIT, denied columns) with the validated bug fixes
on this branch.

Conflict resolution:
- restriction_validation.py: keep the fixed per-restriction validation (C1-C4)
  and add the new force_limit / function-list / denied_columns config checks.
- sql_data_guard.py: keep parsed=None + pre-parse scan_raw_sql; keep the
  None-safe fixed assignment alongside _enforce_force_limit; unify the
  stacked-query message with injection_detection's intent-revealing text.
- Adapt the new tests' `errors == set()` assertions to the ordered-list errors (Q1).
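A pre-parse raw-string scan of the kind referenced above could look roughly like this. It is a simplified sketch, not the signature or behaviour of the module's scan_raw_sql: the function name, labels, and patterns are assumptions. String literals are blanked first so that comment markers inside quoted values do not trigger findings.

```python
import re
from typing import List

# Single-quoted SQL string literal, with '' as an escaped quote.
_STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")

_PATTERNS = [
    ("line comment (--)", re.compile(r"--")),
    ("block comment (/* */)", re.compile(r"/\*")),
    ("hash comment (#)", re.compile(r"#")),
    ("stacked statement (;)", re.compile(r";\s*\S")),
]


def scan_raw_sql_sketch(sql: str) -> List[str]:
    """Return labels of suspicious constructs found outside string literals."""
    stripped = _STRING_LITERAL.sub("''", sql)
    return [label for label, pattern in _PATTERNS if pattern.search(stripped)]


print(scan_raw_sql_sketch("SELECT * FROM users WHERE id = 1 -- AND tenant = 7"))
print(scan_raw_sql_sketch("SELECT 1; DROP TABLE users"))
print(scan_raw_sql_sketch("SELECT '--' AS dash"))
```

Running such a scan before parsing matters because a parser typically discards comments, so comment-based evasion would be invisible to an AST-only check.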

Full unit suite: 378 passed, 1 skipped.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Integrate column-level masking (and manual/doc updates) from main with the
validated bug fixes on this branch.

Conflict resolution (sql_data_guard.py): call both validate_restrictions and
validate_column_masks, and keep catching ValueError (C4) so an invalid config
returns a result dict instead of crashing. Masking, injection detection,
function policy, force_limit, and the S6 API key all coexist.

Full unit suite: 395 passed, 1 skipped.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…ig-bugs

Fix validated security and config-validation bugs
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ions

- Add §4 intake questions (all optional) mapped to quadrants, with
  use-answer-else-derive-from-repo precedence
- Reframe §5 as the repo-derived fallback; generalize repo signals
  beyond Python (package.json, pom.xml, go.mod, .agents/.claude)
- Add §8 end-to-end flow; works with zero, partial, or full answers

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

- Bundle template.pptx (official A4I 4-quadrant deck) with the skill
- Rework Method B to clone the template and swap only bullet text,
  preserving exact design (cards, hero border, fonts, colours)
- Add fill_template.py helper + sample.html / sample.png / one-pager.pptx
  as worked examples

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Generates the A4I Hackathon submission write-up following the official
template format exactly (sections 1-8). Enforces no-hallucination rules:
every claim sourced from repo or user, unknowns left as explicit
NEEDS INPUT placeholders. Written for a management/reviewer audience.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Replaces the README with the production-grade rewrite (overview,
architecture diagram, quick start, REST/MCP/Dify integration, project
structure, testing, security, FAQ). Verified README.md had no team
edits today before replacing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Documents the features the team shipped today, each validated against
the code and the passing test suite (395 passed):
- Column masking (redact/hash/partial), denied_columns
- Injection detection (stacked/dangerous-func/system-catalog, opt-in comments)
- Function allow/block lists, force_limit row cap, max_risk hard-block
- New 'Policy and security controls' config reference section
- AI-assisted development section (.agents/skills, .clinerules)
- Refresh architecture diagram, how-it-works, project structure, testing (13 suites)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>