feat(test): Add integration tests for status server #3373

Open
digvijay-y wants to merge 6 commits into kubeflow:master from digvijay-y:test-Server
Conversation

@digvijay-y

What this PR does / why we need it:
Adds integration tests for the status server handleTrainJobRuntimeStatus handler.

The tests include coverage of:

  • The happy path (updating the train job status)
  • Invalid data is rejected with a useful error message: a progressPercentage greater than 100 violates the +kubebuilder:validation:Minimum=0 and +kubebuilder:validation:Maximum=100 markers, so the handler returns 422.
  • Updating a non-existent TrainJob is rejected with a useful error message.

Which issue(s) this PR fixes (optional, in Fixes #<issue number>, #<issue number>, ... format, will close the issue(s) when PR gets merged):
Fixes #3346

Checklist:

  • Docs included if any changes are user facing

Copilot AI review requested due to automatic review settings March 22, 2026 14:31
@github-actions

🎉 Welcome to the Kubeflow Trainer! 🎉

Thanks for opening your first PR! We're happy to have you as part of our community 🚀

Here's what happens next:

  • If you haven't already, please check out our Contributing Guide for repo-specific guidelines and the Kubeflow Contributor Guide for general community standards.
  • Our team will review your PR soon! cc @kubeflow/kubeflow-trainer-team

Feel free to ask questions in the comments if you need any help or clarification!
Thanks again for contributing to Kubeflow! 🙏

@digvijay-y digvijay-y changed the title feat(Tests): Add integration tests for status server feat(test): Add integration tests for status server Mar 22, 2026
Contributor

Copilot AI left a comment

Pull request overview

Adds Ginkgo integration coverage for the status server’s handleTrainJobRuntimeStatus endpoint, exercising real API-server validation behavior in the envtest-based integration suite.

Changes:

  • Introduces a new integration spec that boots a TLS-enabled status server and posts status updates against it.
  • Covers success, invalid payload validation (e.g., progressPercentage > 100), and not-found TrainJob updates.

Contributor

@robert-bell robert-bell left a comment

Hey @digvijay-y, thanks for your patience and the reminder.

It's looking good so far, but I have a few comments. We should look at the Copilot suggestions too.

There are a couple of extra scenarios we could check - would you be happy to take a look?

  • Testing that an update overwrites an existing trainer status. It shouldn't merge the status (e.g. if a progress percentage is there initially, but not in the update, then it should be removed by the update).
  • Testing that an empty request UpdateTrainJobStatusRequest{} is valid but doesn't overwrite an existing trainer status.

I think the current implementation should already handle both those cases correctly.
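
The two expectations above can be sketched with a small, self-contained example. This is only an illustration of the described semantics, not the status server's code: `TrainerStatus`, `UpdateRequest`, and `applyUpdate` are hypothetical stand-ins for the real types and handler logic.

```go
package main

import "fmt"

// TrainerStatus is a hypothetical stand-in for the real trainer status type.
type TrainerStatus struct {
	ProgressPercentage *int32
	Message            string
}

// UpdateRequest is a hypothetical stand-in for UpdateTrainJobStatusRequest.
type UpdateRequest struct {
	TrainerStatus *TrainerStatus
}

// applyUpdate replaces the stored status wholesale when the request carries
// one (no field-by-field merge) and leaves it untouched for an empty request.
func applyUpdate(current *TrainerStatus, req UpdateRequest) *TrainerStatus {
	if req.TrainerStatus == nil {
		return current
	}
	next := *req.TrainerStatus
	return &next
}

func main() {
	p := int32(40)
	stored := &TrainerStatus{ProgressPercentage: &p, Message: "warming up"}

	// Empty request: the existing status is kept.
	stored = applyUpdate(stored, UpdateRequest{})
	fmt.Println(*stored.ProgressPercentage, stored.Message)

	// Update without a progress percentage: the old value is removed.
	stored = applyUpdate(stored, UpdateRequest{
		TrainerStatus: &TrainerStatus{Message: "training"},
	})
	fmt.Println(stored.ProgressPercentage == nil, stored.Message)
}
```

An integration test for the first scenario would assert that a field present before the update and absent from the request is gone afterwards; for the second, that the status is identical before and after.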

return true, nil
}

func generateTestTLSConfig() (*tls.Config, error) {
Contributor

This is quite a chunky utility - can we use crypto/tls/testcert instead?

Maybe something like

func generateTestTLSConfig() (*tls.Config, error) {
	cert, err := tls.X509KeyPair(testcert.LocalhostCert, testcert.LocalhostKey)
	if err != nil {
		return nil, err
	}
	return &tls.Config{Certificates: []tls.Certificate{cert}}, nil
}

Author

crypto/tls/testcert is an internal stdlib package and can't be imported outside the standard library. Used runtime self-signed cert generation via crypto/ecdsa and crypto/x509 instead.

gomega.Expect(k8sClient.Create(ctx, trainJob)).To(gomega.Succeed())

ginkgo.By("POSTing a valid status update")
resp := postStatus(httpClient, serverAddr, ns.Name, jobName, trainer.UpdateTrainJobStatusRequest{
Contributor

Can we add a test for an empty update request? The TrainJob status should be unchanged; any existing status should not be removed.

trainer.UpdateTrainJobStatusRequest{}

Author

Added as a dedicated It block, "Should accept an empty update request but not overwrite existing status". It POSTs an initial status, then POSTs UpdateTrainJobStatusRequest{}, and uses Consistently to verify that the existing TrainerStatus is unchanged.

var status metav1.Status
gomega.Expect(json.NewDecoder(resp.Body).Decode(&status)).To(gomega.Succeed())
gomega.Expect(status.Message).To(gomega.ContainSubstring("progressPercentage"),
	"error message should identify the invalid field")
Contributor

Should we check that the TrainJob status isn't updated?

Author

Added a Consistently block after the 422 assertions that polls the TrainJob for 2 seconds and asserts TrainerStatus remains nil, confirming the rejected request had no side-effects.

@digvijay-y digvijay-y requested a review from robert-bell April 3, 2026 18:34
@digvijay-y
Author

@robert-bell I've implemented the changes. Could you take a look whenever possible?

Contributor

@robert-bell robert-bell left a comment

Thanks for the updates @digvijay-y. Just one small extra comment, but otherwise this is looking good. It'll need a review from a maintainer before it can be merged. @astefanutti @andreyvelich please could you take a look when you have bandwidth?

Comment on lines +435 to +438
err := k8sClient.Get(ctx, types.NamespacedName{
	Name:      jobName,
	Namespace: ns.Name,
}, notFound)
Contributor

Should this be wrapped in gomega.Consistently too?

Author

Wrapped it up. Thank you for the review.

Signed-off-by: DIGVIJAY <144053736+digvijay-y@users.noreply.github.com>
Signed-off-by: DIGVIJAY <144053736+digvijay-y@users.noreply.github.com>
Signed-off-by: DIGVIJAY <144053736+digvijay-y@users.noreply.github.com>
Signed-off-by: DIGVIJAY <144053736+digvijay-y@users.noreply.github.com>
@review-notebook-app

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@google-oss-prow

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign electronic-waste for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot added size/L and removed size/XS labels Apr 11, 2026
Signed-off-by: DIGVIJAY <144053736+digvijay-y@users.noreply.github.com>
Signed-off-by: DIGVIJAY <144053736+digvijay-y@users.noreply.github.com>

signed-off-by: DIGVIJAY <yewaredigvijay@gmail.com>
Signed-off-by: DIGVIJAY <144053736+digvijay-y@users.noreply.github.com>
@digvijay-y
Author

@astefanutti @andreyvelich I have implemented the changes suggested by @robert-bell. Could you please take a look whenever possible?
