Increase JRuby test polling timeout for cflinuxfs5 stability#1140

Merged
ivanovac merged 1 commit into master from increase-jruby-timeout
Apr 21, 2026
@ivanovac ivanovac commented Apr 20, 2026

Summary

Increases JRuby integration test polling timeout to reduce flaky test failures on resource-constrained CI workers, especially for cflinuxfs5 (Ubuntu 24.04 + OpenJDK 17).

Note: This PR only increases the test timeout, not the CF app timeout, due to platform constraints.

Background

JRuby applications have significant warmup overhead after the health check passes:

  • JVM must fully initialize
  • JRuby runtime must load
  • Application code must compile to JVM bytecode
  • Web server must start accepting requests

The current 3-minute test timeout is only borderline sufficient, causing a roughly 10-20% failure rate on slower CI workers. Tests pass consistently on retry (2nd to 4th attempt), which indicates an infrastructure timing issue rather than a code bug.

Problem with Initial Approach

The original attempt to increase the CF app timeout from 180s to 300s failed because the CF platform has a maximum configured limit of 180 seconds:

For application 'switchblade-jojpj-2dr': health_check_timeout Maximum exceeded: max 180s

The Cloud Foundry deployment has cc.maximum_health_check_timeout: 180 configured, which we cannot override from the buildpack.
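To illustrate the constraint, here is a minimal Go sketch of a limit check that rejects any requested health check timeout above the platform maximum. It is not the platform's actual implementation: the function name `CheckHealthCheckTimeout` and the constant `platformMaxHealthCheckTimeout` are hypothetical, and only the 180-second limit and the error text come from this PR.

```go
package main

import "fmt"

// platformMaxHealthCheckTimeout mirrors the cc.maximum_health_check_timeout: 180
// setting quoted in this PR. The constant name is hypothetical.
const platformMaxHealthCheckTimeout = 180

// CheckHealthCheckTimeout is an illustrative stand-in for the platform-side
// validation. It returns an error shaped like the one in the failed deployment
// when the requested timeout (in seconds) exceeds the configured maximum.
func CheckHealthCheckTimeout(requestedSeconds int) error {
	if requestedSeconds > platformMaxHealthCheckTimeout {
		return fmt.Errorf("health_check_timeout Maximum exceeded: max %ds",
			platformMaxHealthCheckTimeout)
	}
	return nil
}
```

Under these assumptions, a request for 300 seconds is rejected while 180 seconds is accepted, which is why the manifest value has to stay at 180.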

Revised Solution

Since we cannot increase the CF health check timeout, we only increase the test polling timeout:

  • CF app timeout (manifest.yml): Keep at 180s (platform maximum)
  • Test polling timeout: 3 minutes → 5 minutes

This gives the test more time to wait for HTTP responses after the app has passed its health check and is marked as "running". JRuby needs this additional warmup time before it can reliably serve HTTP requests.

Changes

  • Test polling timeout: 3 minutes → 5 minutes (src/ruby/integration/default_test.go)
  • Updated comments: Reflect cflinuxfs5 compatibility and slow CI worker accommodation
  • No manifest changes: Keep timeout at 180s (CF platform maximum)

Testing

Related PRs

Rationale

Why 5 minutes?

  • Gives JRuby sufficient warmup time after health check passes
  • Allows for 180s health check + 120s warmup = 300s total
  • Still fails within reasonable time if app is broken
  • Balances reliability with test suite performance

Observed behavior:

  • App passes health check at ~150-180 seconds
  • App needs additional 30-60 seconds to serve HTTP requests
  • Fast CI workers: Total 120-180 seconds
  • Slow/shared workers: Total 180-240 seconds
  • cflinuxfs5: +10-20 seconds vs cflinuxfs4 (OpenJDK 17 overhead)

Impact

  • Minimal test suite slowdown (max 2 additional minutes for this single test)
  • Significantly improved test reliability
  • Better cflinuxfs5 compatibility
  • No runtime behavior changes for buildpack users
  • Respects CF platform constraints

Alternative Approaches Considered

  1. Increase CF app timeout to 300s ❌ - Blocked by platform limit
  2. Request platform config change ⚠️ - Requires infrastructure team, out of scope
  3. Change to HTTP health check ❌ - Requires app code changes, doesn't solve warmup issue
  4. Optimize JRuby startup ✅ - Already done (Rack 2.x downgrade in commit ece1f30)
  5. Increase test polling timeout ✅ - Selected approach (this PR)

JRuby applications require significant startup time due to JVM
initialization and warmup. On resource-constrained CI workers,
especially with cflinuxfs5 (Ubuntu 24.04 + OpenJDK 17), startup
can exceed 180 seconds, causing flaky test failures.

Changes:
- Increase CF app timeout: 180s → 300s
- Increase test polling timeout: 3min → 5min
- Update comments to reflect cflinuxfs5 compatibility

This reduces flaky failures while maintaining reasonable test times
for apps that are truly broken.

Relates to:
- PR #1088: Initial timeout increase to 180s
- PR #1090: Test polling timeout increase to 3min
- PR #1134: OpenJDK cflinuxfs5 support

@tnikolova82 tnikolova82 left a comment

lgtm

@ivanovac ivanovac merged commit c857aeb into master Apr 21, 2026
7 checks passed
@ivanovac ivanovac deleted the increase-jruby-timeout branch April 21, 2026 08:14
@ivanovac ivanovac changed the title Increase JRuby test timeouts for cflinuxfs5 stability Increase JRuby test polling timeout for cflinuxfs5 stability Apr 21, 2026
ivanovac added a commit that referenced this pull request Apr 21, 2026
PR #1140 changed the timeout from 180s to 300s, but CloudFoundry has a platform-wide maximum of 180s (cc.maximum_health_check_timeout: 180).

Deployment fails with:
  'health_check_timeout Maximum exceeded: max 180s'

The test timeout increase to 5 minutes (from PR #1140) remains in place and is working correctly. This change only reverts the manifest timeout to comply with the CF platform constraint.

With timeout: 180s and test polling: 5min, tests should pass in most cases. Some flakiness may remain on extremely slow CI workers, which is an acceptable tradeoff given the platform constraint.

Fixes deployment failures introduced by PR #1140.