Inventory gathering: discover and catalog occurrences across entities #6105

@JAORMX

Please describe the enhancement

Minder excels at policy enforcement (pass/fail evaluation) across repos, artifacts, and PRs. However, it currently discards the rich data it ingests after each evaluation cycle -- only the pass/fail status is persisted. Users need a complementary capability: discovering and cataloging what exists across their repos without necessarily making a judgment about it.

Use cases this would unlock:

| Use Case | Question Answered |
| --- | --- |
| Vulnerability response | "Which repos are affected by CVE-2024-XXXX?" (cross-ref dep inventory with OSV) |
| Configuration drift | "Which repos diverge from our blessed Dockerfile base images?" |
| Compliance reporting | "What licenses exist across all our dependencies?" |
| Supply chain mapping | "What Actions, tools, and deps does each repo use?" |
| Onboarding/discovery | "What do we have?" -- a new engineer's first question |
| Shadow dependency detection | "Are any repos pulling from unauthorized registries?" |

Today Minder can answer these per-repo with individual rule evaluations, but can't aggregate across repos or persist the collected data for querying.

The core gap: The evaluation pipeline produces structured data (via EvaluationResult.Output) but throws it away. Only status + error text reaches the database. The building blocks for collection exist (ingesters, Rego extraction, event pipeline) -- what's missing is persistence and querying of structured results.

Solution Proposal

Extend Profiles with Inventory Mode

Profiles gain an optional mode field. When set to inventory, the profile produces occurrences (structured findings) instead of pass/fail evaluations. Everything else -- rule types, ingesters, Rego extraction, scheduling, authorization, selectors -- works the same.

An occurrence is the atomic unit of inventory -- a specific finding in a specific entity:

  • Where: which entity (repo, artifact) it was found in
  • What: a normalized key (actions/checkout@v4) + structured metadata (file path, version, etc.)
  • When: first_seen / last_seen timestamps
  • How: which rule type + ingester collected it
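
As a rough illustration of the four facets above, an occurrence could be modeled as a small record. This is only a sketch: the `Occurrence` class, its field names, and the `ingester` value are hypothetical and not part of Minder.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any


@dataclass
class Occurrence:
    """Hypothetical shape of one inventory occurrence (not a Minder type)."""
    # Where: the entity the item was found in
    entity: str
    # What: inventory type, normalized key, structured metadata
    inventory_type: str
    key: str
    # How: the rule type and ingester that collected it
    rule_type: str
    ingester: str
    # When: both timestamps start equal; last_seen moves on later sightings
    first_seen: datetime
    last_seen: datetime
    data: dict[str, Any] = field(default_factory=dict)


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
occ = Occurrence(
    entity="acme/frontend",
    inventory_type="github_action",
    key="actions/checkout@v4",
    rule_type="github-actions-inventory",
    ingester="git",
    first_seen=now,
    last_seen=now,
    data={"workflow_file": ".github/workflows/ci.yml"},
)
```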

What gets reused

| Component | Reused As-Is | Notes |
| --- | --- | --- |
| Ingesters (REST, Git, Deps, Artifact) | Yes | Already pull Actions workflows, SBOMs, file contents |
| Rego evaluator | Extended | New type: inventory mode that emits items instead of violations |
| Profile system | Extended | New mode: inventory field |
| Event pipeline | Yes | Entity events trigger inventory collection automatically |
| Scheduling (reminder) | Yes | Periodic re-collection on existing cadence |
| Authorization (OpenFGA) | Yes | Inventory scoped to projects like everything else |
| Selectors | Yes | Can target inventory collection to specific entities |

User experience

Setup:

# org-inventory.yaml
version: v1
type: profile
name: org-inventory
mode: inventory
repository:
  - type: github-actions-inventory
  - type: dependency-inventory
  - type: base-image-inventory

minder profile create -f org-inventory.yaml
# Minder starts collecting on the next evaluation cycle

Querying:

# What GitHub Actions are used across my project?
minder inventory list --type github_action
# ENTITY              ACTION                  VERSION   FILE
# acme/frontend       actions/checkout        v4        .github/workflows/ci.yml
# acme/frontend       actions/setup-node      v4        .github/workflows/ci.yml
# acme/backend        actions/checkout        v4        .github/workflows/test.yml

# Which repos use a specific action?
minder inventory list --type github_action --key "actions/checkout*"

# Aggregated summary
minder inventory summary --type github_action
# ACTION                  REPOS   OCCURRENCES
# actions/checkout        47      89
# actions/setup-node      23      45
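
The list and summary queries above amount to a glob filter on the item key and a group-by on the action name. A minimal in-memory sketch, assuming the key has the form `name@version` and using the three sample rows from the listing (the function names `list_items` and `summarize` are hypothetical):

```python
from collections import defaultdict
from fnmatch import fnmatchcase

# (entity, item_key, file) rows taken from the sample listing above
rows = [
    ("acme/frontend", "actions/checkout@v4", ".github/workflows/ci.yml"),
    ("acme/frontend", "actions/setup-node@v4", ".github/workflows/ci.yml"),
    ("acme/backend", "actions/checkout@v4", ".github/workflows/test.yml"),
]


def list_items(rows, key_glob="*"):
    """Keep rows whose key matches a shell-style glob, like --key."""
    return [r for r in rows if fnmatchcase(r[1], key_glob)]


def summarize(rows):
    """Per action: (number of distinct repos, number of occurrences)."""
    repos = defaultdict(set)
    occurrences = defaultdict(int)
    for entity, key, _file in rows:
        action = key.split("@", 1)[0]  # drop the version suffix
        repos[action].add(entity)
        occurrences[action] += 1
    return {action: (len(repos[action]), occurrences[action]) for action in repos}


matches = list_items(rows, "actions/checkout*")
summary = summarize(rows)
```

In a real implementation this aggregation would presumably run in the database rather than in application memory.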

Data model

CREATE TABLE inventory_items (
    id                 UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    project_id         UUID NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
    entity_instance_id UUID NOT NULL REFERENCES entity_instances(id) ON DELETE CASCADE,
    rule_type_id       UUID NOT NULL,
    inventory_type     TEXT NOT NULL,   -- "github_action", "dependency", "base_image"
    item_key           TEXT NOT NULL,   -- normalized: "actions/checkout@v4"
    item_data          JSONB NOT NULL,  -- structured details
    first_seen         TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    last_seen          TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE(entity_instance_id, rule_type_id, item_key)
);

Change tracking uses a snapshot-based approach: first_seen records when an item first appeared, and last_seen is refreshed each time a collection run still finds it. An event-based change log is deferred to a future iteration.
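
One way to picture the snapshot approach is an upsert keyed on the table's UNIQUE constraint. The sketch below uses an in-memory dict as a stand-in for inventory_items; the helper names `record_snapshot` and `stale_items` are hypothetical, and a PostgreSQL implementation would more likely use `INSERT ... ON CONFLICT ... DO UPDATE`.

```python
from datetime import datetime, timedelta, timezone

# Stand-in for inventory_items, keyed like the UNIQUE constraint:
# (entity_instance_id, rule_type_id, item_key) -> timestamps
Store = dict[tuple[str, str, str], dict]


def record_snapshot(store, entity_id, rule_type_id, item_keys, now):
    """Upsert every item seen in one collection run."""
    for key in item_keys:
        row = store.get((entity_id, rule_type_id, key))
        if row is None:
            # First sighting: both timestamps start at the same instant
            store[(entity_id, rule_type_id, key)] = {"first_seen": now, "last_seen": now}
        else:
            # Seen again: only last_seen moves forward
            row["last_seen"] = now


def stale_items(store, cutoff):
    """Items not seen since the cutoff, i.e. likely removed from the entity."""
    return sorted(k for k, row in store.items() if row["last_seen"] < cutoff)


store: Store = {}
t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
t1 = t0 + timedelta(days=1)
record_snapshot(store, "repo-1", "rule-1", ["actions/checkout@v3", "actions/setup-node@v4"], t0)
record_snapshot(store, "repo-1", "rule-1", ["actions/checkout@v4", "actions/setup-node@v4"], t1)
gone = stale_items(store, cutoff=t1)
```

Here `actions/checkout@v3` keeps its older last_seen after the second run, which is how a removal shows up without an explicit change log.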

Example inventory rule type

version: v1
type: rule-type
name: github-actions-inventory
context:
  provider: github
def:
  in_entity: repository
  ingest:
    type: git
    git: {}
  eval:
    type: rego
    rego:
      type: inventory   # new: emits items, no pass/fail
      def: |
        package minder
        import rego.v1

        inventory_items contains item if {
          file := input.fs[path]
          startswith(path, ".github/workflows/")
          # GitHub accepts both .yml and .yaml workflow files
          regex.match(`\.ya?ml$`, path)
          workflow := yaml.unmarshal(file)
          job := workflow.jobs[job_name]
          step := job.steps[_]
          step.uses
          item := {
            "type": "github_action",
            "key": step.uses,
            "data": {
              "workflow_file": path,
              "job_name": job_name,
              "step_name": object.get(step, "name", "unnamed"),
            }
          }
        }
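
To make the emitted item shape concrete, the same extraction can be mirrored in plain Python. This is an illustration only, not how Minder evaluates rules: it assumes the workflows are already parsed into dicts (the `files` mapping is a hypothetical input shape), accepts `.yaml` as well as `.yml`, and returns a list where the Rego rule builds a set.

```python
def inventory_items(files):
    """files maps a repo-relative path to an already-parsed workflow dict."""
    items = []
    for path, workflow in files.items():
        if not path.startswith(".github/workflows/"):
            continue
        if not path.endswith((".yml", ".yaml")):
            continue
        for job_name, job in workflow.get("jobs", {}).items():
            for step in job.get("steps", []):
                # Steps that only run shell commands have no "uses" entry
                if not step.get("uses"):
                    continue
                items.append({
                    "type": "github_action",
                    "key": step["uses"],
                    "data": {
                        "workflow_file": path,
                        "job_name": job_name,
                        "step_name": step.get("name", "unnamed"),
                    },
                })
    return items


files = {
    ".github/workflows/ci.yml": {
        "jobs": {
            "build": {
                "steps": [
                    {"name": "Checkout", "uses": "actions/checkout@v4"},
                    {"run": "make test"},
                    {"uses": "actions/setup-node@v4"},
                ]
            }
        }
    },
    "README.md": {},
}
found = inventory_items(files)
```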

gRPC API

service InventoryService {
  rpc ListInventoryItems(ListInventoryItemsRequest) returns (ListInventoryItemsResponse);
  rpc GetInventorySummary(GetInventorySummaryRequest) returns (GetInventorySummaryResponse);
}

What makes this different from existing tools?

| Tool | What It Does | What Minder Adds |
| --- | --- | --- |
| GitHub Dependency Graph | Shows deps per-repo | Cross-repo aggregation, custom inventory types |
| Dependabot | Updates deps per-repo | Extensible to non-dep inventory (Actions, configs, images) |
| SBOM tools (Syft, etc.) | Generate SBOMs per-artifact | Centralized storage, cross-entity queries, change tracking |
| CSPM asset inventory | Cloud resource inventory | Code-level inventory (deps, configs, Actions) |

Minder's differentiators: extensible (users define inventory types via rule types), cross-provider (GitHub + GitLab + OCI), self-hosted.

Describe alternatives you've considered

  1. Inventory as always-passing evaluation rules -- Rule types that always "pass" but emit structured output. Reuses everything but the pass/fail framing is conceptually awkward for pure data collection, and users are forced to create enforcement-style profiles.

  2. Dedicated inventory subsystem -- Separate "inventory collectors" at project level with their own pipeline, storage, and APIs. Cleanest semantics but significantly more code and a new concept for users to learn. Could be a future evolution if inventory profiles prove too limited.

  3. Extend entity properties -- Store inventory as JSONB arrays in the existing properties table. No new tables, but the properties system is designed for simple key-value metadata, not structured lists of hundreds of items. Cross-entity aggregation would be expensive.

Additional context

Scale considerations

500 repos x (50 Actions + 200 deps + 5 base images) = ~127K items. Manageable for PostgreSQL with proper indexing. Collection frequency should be configurable per inventory profile.
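
The estimate above works out as follows; the per-repo counts are the figures from the paragraph, and the variable names are just for illustration.

```python
repos = 500
items_per_repo = {"actions": 50, "deps": 200, "base_images": 5}

per_repo_total = sum(items_per_repo.values())  # 50 + 200 + 5 = 255
total_items = repos * per_repo_total           # 500 * 255 = 127,500
```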

Implementation roadmap

Phase 1: Foundation -- mode field on profiles, inventory Rego evaluation type, inventory_items table, InventoryService gRPC RPCs

Phase 2: Rule types + CLI -- Ship github-actions-inventory, dependency-inventory, and base-image-inventory rule types. Add minder inventory list/summary/types CLI commands.

Phase 3: Polish -- Selector support for inventory profiles, dashboard/UI integration, documentation.

Open design questions for future iterations

  1. Should inventory profiles auto-apply to all entities in a project?
  2. Should inventory appear in minder repo get alongside properties?
  3. Should stale items expire after a configurable TTL?
  4. Could enforcement rules query inventory data in a future iteration?
  5. Event-based change tracking (inventory_changes table) for add/remove/modify history?

Acceptance Criteria

  • Profiles support a mode: inventory field that produces occurrences instead of pass/fail evaluations
  • New inventory Rego evaluation type that emits structured items
  • inventory_items database table with proper indexing for cross-entity queries
  • InventoryService gRPC RPCs: ListInventoryItems and GetInventorySummary
  • At least 3 built-in inventory rule types shipped: GitHub Actions, dependencies (SBOM), container base images
  • CLI commands: minder inventory list, minder inventory summary, minder inventory types
  • Inventory collection triggered automatically by entity events and reminder scheduling
  • Snapshot-based change tracking via first_seen/last_seen timestamps
