Skip to content

test(bats): Fix flaky connect helper methods#6535

Draft
moduli wants to merge 2 commits intomainfrom
moduli-bats-flaky-connect
Draft

test(bats): Fix flaky connect helper methods#6535
moduli wants to merge 2 commits intomainfrom
moduli-bats-flaky-connect

Conversation

@moduli
Copy link
Copy Markdown
Collaborator

@moduli moduli commented Mar 24, 2026

Description

This PR attempts to address a flaky bats test after several PRs that added additional debug

boundary/session/connect: unpriv user can connect to default target203/319 ✗ boundary/session/connect: unpriv user can connect to default target
   (in test file boundary/sessions.bats, line 39)
     `[ "$status" -eq 0 ]' failed
   
   Authentication information:
     Account ID:      acctpw_0987654321
     Auth Method ID:  ampw_1234567890
     Expiration Time: Mon, 30 Mar 2026 19:43:31 UTC
     User ID:         u_0987654321
   
   The token name "default" was successfully stored in the chosen keyring and is not displayed here.
   connecting to :
   Proxy listening information:
     Address:             127.0.0.1
     Connection Limit:    -1
     Expiration:          Tue, 24 Mar 2026 03:43:32 UTC
     Port:                35665
     Protocol:            tcp
     Session ID:          s_i5yX7p15Nd
   SSH-2.0-OpenSSH_9.6p1 Ubuntu-3ubuntu13.14
   Invalid SSH identification string.
   status: 0
   connecting to :
   Proxy listening information:
     Address:             127.0.0.1
     Connection Limit:    -1
     Expiration:          Tue, 24 Mar 2026 03:43:34 UTC
     Port:                38383
     Protocol:            tcp
     Session ID:          s_gEldze11Kg
   status: 255

The interesting bit is

   SSH-2.0-OpenSSH_9.6p1 Ubuntu-3ubuntu13.14
   Invalid SSH identification string.

It seems like, on occasion, the existing connect command would send some string to the server that it doesn't expect. The current theory is that it's some race condition between the nc command connecting to the target sending "foo" to stdin and the target server sending the SSH banner and waiting for input. It seems like, most of the time, nc likely sends "foo" before the SSH banner is sent. This only occurs in Github runners as the default target we're connecting to has an address of 127.0.01:22, and runners likely have an ssh service listening on that port.

This PR modifies the connect command in a few ways

  • echoes "SSH-2.0-Test" instead, which is a valid response to the SSH banner
  • adds -w 5 to allow enough time for the boundary proxy to establish.

https://hashicorp.atlassian.net/browse/ICU-18860

Testing

boundary/login: can login as admin user
 ✗ boundary/target/connect: admin user can connect to default target
   (in test file boundary/target.bats, line 19)
     `[ "$status" -eq 0 ]' failed
   connecting:
   Proxy listening information:
     Address:             127.0.0.1
     Connection Limit:    -1
     Expiration:          Tue, 07 Apr 2026 05:44:04 EDT
     Port:                51291
     Protocol:            tcp
     Session ID:          s_Vsh3ElhZnc

   ...  
   nc: connectx to 127.0.0.1 port 1234 (tcp) failed: Connection refused
   status: 1

PCI review checklist

  • I have documented a clear reason for, and description of, the change I am making.
  • If applicable, I've documented a plan to revert these changes if they require more than reverting the pull request.
  • If applicable, I've documented the impact of any changes to security controls.
    Examples of changes to security controls include using new access control methods, adding or removing logging pipelines, etc.

@github-actions github-actions bot added the core label Mar 24, 2026
@moduli moduli added the pr/no-milestone Ignores the Milestone Check label Mar 24, 2026
@moduli moduli force-pushed the moduli-bats-flaky-connect branch from d468780 to 445b9a0 Compare March 24, 2026 14:50
@moduli moduli requested a review from Copilot March 24, 2026 17:52
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the CLI bats test connect helpers to reduce flakiness when validating boundary connect by avoiding sending unexpected data to SSH targets.

Changes:

  • Stop piping "foo" into boundary connect / nc during connect tests.
  • Switch nc invocation to a port-open check using -z and a connect timeout via -w 3.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@moduli moduli force-pushed the moduli-bats-flaky-connect branch 2 times, most recently from 1b18ed6 to 80afc1d Compare March 31, 2026 17:56
@moduli moduli force-pushed the moduli-bats-flaky-connect branch 4 times, most recently from 8a625d8 to 2d79c72 Compare April 7, 2026 17:27
@moduli moduli force-pushed the moduli-bats-flaky-connect branch from 2d79c72 to 4ecd438 Compare April 7, 2026 21:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core pr/no-milestone Ignores the Milestone Check

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants