[Feature]: UAT assert-recipe.yaml tests are too weak — assert expected component names, not just count > 0 #498

@ayuskauskas

Description

Prerequisites

  • I searched existing issues

Feature Summary

Problem

The UAT assert-recipe.yaml files across all CUJ tests only check that the recipe has a non-zero number of components:

(length(componentRefs) > `0`): true

This is a trivially passing assertion — a recipe with a single wrong component would pass. These tests give a false sense of coverage: they verify
the recipe exists but not that it contains the right components.

Affected files (all four share the same weak assertion):

  • tests/uat/aws/tests/cuj1-training/assert-recipe.yaml
  • tests/uat/aws/tests/cuj2-inference/assert-recipe.yaml
  • tests/uat/azure/tests/cuj1-training/assert-recipe.yaml
  • tests/uat/azure/tests/cuj2-inference/assert-recipe.yaml

Step 3 ("Validate deployment against live snapshot") doesn't validate much

The validate-deployment step runs aicr validate but:

  1. Uses || true to swallow non-zero exit codes — validation failures are silently ignored
  2. The assertion only checks that the output file exists and contains reportFormat: CTRF
  3. No assertion on pass/fail counts, individual test names, or that any tests actually passed

The assert-validate-multiphase.yaml has the same issue — it only asserts summary.tests > 0, not that any tests passed:

results:
  summary:
    tests: (@ > `0`)

Problem/Use Case

This pattern is being copy-pasted to new cloud/accelerator variants. The Azure AKS tests (#476) were copied directly from the AWS tests with the same weak assertions. As we add H200 and other variants, every new UAT suite will inherit these gaps and false assurances unless we fix the template now.

Proposed Solution

Proposed fix

1. Assert expected component names in assert-recipe.yaml

Each CUJ should assert the specific component names expected for that recipe configuration. For example, the EKS/H100/training/kubeflow recipe
assertion should look something like:

kind: RecipeResult
apiVersion: aicr.nvidia.com/v1alpha1
criteria:
  service: eks
  accelerator: h100
  intent: training
  os: ubuntu
  platform: kubeflow
# Assert expected components are present by name
(componentRefs[?name == 'gpu-operator']): (length(@) > `0`)
(componentRefs[?name == 'network-operator']): (length(@) > `0`)
# ... all expected components for this recipe

The exact component list should be derived from what aicr recipe actually produces for each CUJ configuration.
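One way to pull that list out mechanically is to export the recipe and read `componentRefs[].name`. A minimal Python sketch; the inline JSON sample and the assumption that the recipe can be exported as JSON are illustrative only — the real document would come from running `aicr recipe` for the CUJ under test:

```python
import json

# Hypothetical exported RecipeResult; in practice this would be the
# output of `aicr recipe` for the CUJ configuration being asserted.
recipe_json = """
{
  "kind": "RecipeResult",
  "componentRefs": [
    {"name": "network-operator"},
    {"name": "gpu-operator"}
  ]
}
"""

recipe = json.loads(recipe_json)
# Sorted name list to paste into the per-name assertions above.
names = sorted(ref["name"] for ref in recipe["componentRefs"])
print(names)  # -> ['gpu-operator', 'network-operator']
```

Regenerating this list whenever the recipe definition changes keeps the assertions honest without hand-maintaining them.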

2. Strengthen validation assertions

  • Remove || true from validate steps, or at minimum assert on the exit code
  • Assert summary.passed > 0 (not just summary.tests > 0) so all-fail runs don't silently pass
  • Consider asserting individual check names exist in the results, similar to the component name approach
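Concretely, the strengthened multiphase assertion could look like the sketch below. Field names follow the CTRF report format the step already checks for; the zero-failures line is an assumption and should be relaxed if some checks are expected to fail:

```yaml
# Sketch: strengthened assert-validate-multiphase.yaml (CTRF summary fields).
results:
  summary:
    tests: (@ > `0`)
    # An all-fail run must not pass:
    passed: (@ > `0`)
    # Optionally require zero failures outright:
    failed: (@ == `0`)
```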

Success Criteria

  • CUJ assertions check the actual expected content for specific, important recipes
  • Validation assertions verify that the validated tests actually pass

Alternatives Considered

No response

Component

Multiple components

Priority

Important (would improve my workflow)

Compatibility / Breaking Changes

No response

Operational Considerations

No response

Are you willing to contribute?

Yes, I can open a PR

Metadata

Labels: P2 (Minor defects; minor implications; no SLA commitment), feature (Feature request)
