
feat(observability): security operations metrics (dwell time and coverage) #485

Description

@krokoko

Context: ROADMAP.md → Security operations metrics (dwell time and coverage)


Component

CDK / infrastructure

Describe the feature

Add CloudWatch metrics and dashboard panels that track two things: the time from an anomaly (circuit breaker trip, guardrail spike, policy deny burst) to operator awareness, and the fraction of security/ops alarms that are investigated. The goal is to shorten exploit windows.
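
As a rough illustration of the two measurements described above, the sketch below computes a dwell time and a coverage fraction from a list of anomaly records. The `AnomalyEvent` shape, its field names, and the helper names are hypothetical and not taken from the codebase. It also assumes that "operator awareness" is recorded as a single acknowledgement timestamp.

```typescript
// Hypothetical record shape; field names are assumptions for illustration only.
interface AnomalyEvent {
  id: string;
  kind: "circuit_breaker_trip" | "guardrail_spike" | "policy_deny_burst";
  detectedAt: Date;
  // First operator action or ticket; undefined while nobody has responded.
  acknowledgedAt?: Date;
}

// Seconds from anomaly to first operator awareness, or null if unacknowledged.
function dwellSeconds(e: AnomalyEvent): number | null {
  if (e.acknowledgedAt === undefined) return null;
  return (e.acknowledgedAt.getTime() - e.detectedAt.getTime()) / 1000;
}

// Fraction of events that were acknowledged. Returns null for an empty
// window so that "no alarms" is not reported as 0% or 100% coverage.
function investigatedRatio(events: AnomalyEvent[]): number | null {
  if (events.length === 0) return null;
  const investigated = events.filter((e) => e.acknowledgedAt !== undefined).length;
  return investigated / events.length;
}

// Example data (made up): one acknowledged event, one still open.
const events: AnomalyEvent[] = [
  {
    id: "a1",
    kind: "circuit_breaker_trip",
    detectedAt: new Date("2024-01-01T00:00:00Z"),
    acknowledgedAt: new Date("2024-01-01T00:05:00Z"),
  },
  {
    id: "a2",
    kind: "guardrail_spike",
    detectedAt: new Date("2024-01-01T01:00:00Z"),
  },
];

console.log(dwellSeconds(events[0])); // 300
console.log(investigatedRatio(events)); // 0.5
```

Returning `null` for unacknowledged events keeps open incidents out of the latency figure. Whether open incidents should instead count toward dwell time using the current time is a design choice this sketch leaves open.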

Use case

Security teams measure detection-to-response time, not just whether an alert fired. Alarms that go uninvestigated indicate process gaps.

Proposed solution

  1. Metric: security_anomaly_to_ack_seconds (anomaly event → first operator action or ticket).
  2. Metric: security_alarms_investigated_ratio (manual tag or integration hook).
  3. Dashboard row on operator dashboard.
  4. Optional integration with PagerDuty/Opsgenie ack timestamps.
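
One possible way to publish the two metrics from items 1 and 2 is the CloudWatch Embedded Metric Format (EMF), where a structured JSON log line is turned into metrics by CloudWatch. The sketch below only builds such a record. The namespace `Example/SecurityOps`, the `Environment` dimension, and the function name are assumptions for illustration; the issue does not specify how the metrics are emitted.

```typescript
// Hypothetical namespace; the project's real namespace is not stated in this issue.
const NAMESPACE = "Example/SecurityOps";

// Minimal typing of an EMF record: the `_aws` metadata block plus
// top-level members holding dimension and metric values.
interface EmfRecord {
  _aws: {
    Timestamp: number; // milliseconds since the Unix epoch
    CloudWatchMetrics: Array<{
      Namespace: string;
      Dimensions: string[][];
      Metrics: Array<{ Name: string; Unit: string }>;
    }>;
  };
  [key: string]: unknown;
}

// Builds one EMF record carrying both proposed metrics.
function buildSecurityOpsRecord(
  environment: string,
  ackSeconds: number,
  ratio: number,
  timestampMs: number,
): EmfRecord {
  return {
    _aws: {
      Timestamp: timestampMs,
      CloudWatchMetrics: [
        {
          Namespace: NAMESPACE,
          Dimensions: [["Environment"]], // assumed dimension
          Metrics: [
            { Name: "security_anomaly_to_ack_seconds", Unit: "Seconds" },
            { Name: "security_alarms_investigated_ratio", Unit: "None" },
          ],
        },
      ],
    },
    // Dimension and metric values are top-level members of the record.
    Environment: environment,
    security_anomaly_to_ack_seconds: ackSeconds,
    security_alarms_investigated_ratio: ratio,
  };
}

// Writing the record as a single JSON line is enough in environments
// where logs are processed as EMF (for example stdout in a Lambda function).
console.log(JSON.stringify(buildSecurityOpsRecord("dev", 300, 0.5, Date.now())));
```

Ack timestamps from PagerDuty or Opsgenie (item 4) could feed the `ackSeconds` argument, but that integration is outside this sketch.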

Other information

  • Pairs with behavioral circuit breaker and automated alert triage drafts.

  • Design context: docs/design/OBSERVABILITY.md, docs/design/SECURITY.md.

  • This might be a breaking change

Metadata


Labels

  • enhancement: New feature or request

  • observability: Tracing, attribution, dashboards, metrics, alarms, telemetry redaction

  • security: Cedar/HITL, IAM least-privilege, secrets, PII/DLP, guardrails, supply-chain/CVE
