…s.ShouldDisplayHelloWorld Agent-Logs-Url: https://github.com/OrchardCMS/OrchardCore/sessions/2785b7b7-3df2-4e12-bbee-ab18230f54ae Co-authored-by: Skrypt <3228637+Skrypt@users.noreply.github.com>
MvcTests.ShouldDisplayHelloWorld by asserting HTTP 200 before content check
… functional tests Agent-Logs-Url: https://github.com/OrchardCMS/OrchardCore/sessions/a4e9f6e8-de1c-4ad7-82e0-ea5a7ea7869a Co-authored-by: Skrypt <3228637+Skrypt@users.noreply.github.com>
|
@copilot This is not an appropriate fix if the issue is a race condition. |
…tion Agent-Logs-Url: https://github.com/OrchardCMS/OrchardCore/sessions/2385a897-09a0-4e61-8da4-8cc53de4cbb0 Co-authored-by: Skrypt <3228637+Skrypt@users.noreply.github.com>
You're right — the assertions only improve the error message, they don't fix the timing problem. The actual root cause: The real fix (commit |
MvcTests.ShouldDisplayHelloWorld by asserting HTTP 200 before content check|
@copilot Will it not fail running the tests if the server never returns a 200 response? |
Yes, intentionally. After 30 seconds without a non-empty 200 response, |
|
@Piedone While this can make these tests log better information than the mere fact that they failed to retrieve the content of a page, it doesn't fix the race condition itself, which can't be fixed other than by fixing the infrastructure. That means that if processing power dips on our CI while we are testing a lot of things at once, we will experience these issues. There is nothing we can do about it other than track/log the periodic failures better. So the functional tests are flaky because the CI itself is. It's just collateral damage from the CI performance. |
|
You can very well write non-flaky tests even if the infrastructure has ups and downs. E.g. longer timeouts, retried operations (even clicks), or ultimately retried tests (up to a limit) can fix this. |
|
Longer timeouts make the tests take longer, which I think was already suggested. But here it instead relies on getting a 200 response from that page within 30 seconds. This is equivalent to the timeout that the Playwright tests are already using, so it may still end up failing the tests. Restarting the test doesn't guarantee that it will work either. Making it wait until it returns a 200 may also create a never-ending loop. So, if these tests don't pass within the 30-second range they are set to use by default, then to me it makes sense that they fail and that the user restarts the test manually, to prevent never-ending tests. A retry of up to 3 times could work but could still fail, while we could simply wait until there's a lighter load on the CI instead of adding more. |
|
Counting on manual reruns is not good, because they reduce trust. "Ah, it's failing, but it's randomly failing anyway, let's start with a rerun before thinking of anything." So, that's more of a last resort, and generally, CI should be reliable. The techniques I mention are widely used in UI testing and proven to work. For each of these, there should be a limit, of course, not retrying endlessly (since a test failing should normally mean an actual failure in the app, not flakiness). Tests should never assume that something will happen within a given duration, nor should they always wait and hope something will finish. Rather, by timeout I mean the maximum time the test should wait for some condition (I guess the first page load is the issue now), not a time it always waits. All in all, the point is that the tests should pass if the app's logic is correct, regardless of how long page loads, clicks, or anything else takes, within a set range of tolerance. |
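To make the two bounded techniques from this comment concrete, here is a minimal C# sketch of "wait for a condition up to a timeout" and "retry an operation up to a limit". The `BoundedWaits` class and its method names are hypothetical and are not part of OrchardCore or Playwright; this only illustrates the idea under those assumptions.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical helpers, for illustration only.
public static class BoundedWaits
{
    // Polls `condition` until it returns true or `timeout` elapses.
    // Returns as soon as the condition holds, so a fast run pays no extra time.
    public static async Task<bool> WaitUntilAsync(
        Func<Task<bool>> condition, TimeSpan timeout, TimeSpan interval)
    {
        var stopwatch = Stopwatch.StartNew();
        while (true)
        {
            if (await condition())
            {
                return true;
            }

            if (stopwatch.Elapsed >= timeout)
            {
                return false;
            }

            await Task.Delay(interval);
        }
    }

    // Runs `operation` up to `maxAttempts` times. The failure from the
    // last attempt propagates, so a real bug in the app still fails the test.
    public static async Task RetryAsync(Func<Task> operation, int maxAttempts)
    {
        if (maxAttempts < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(maxAttempts));
        }

        for (var attempt = 1; ; attempt++)
        {
            try
            {
                await operation();
                return;
            }
            catch (Exception) when (attempt < maxAttempts)
            {
                // Swallow and retry; the filter lets the final failure through.
            }
        }
    }
}
```

Both helpers are bounded: the wait gives up after `timeout`, and the retry gives up after `maxAttempts`, so neither can loop forever.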
|
Yeah, I will do the fix, but at the same time it's still somehow just stretching the limits of what Playwright does by default, up until it is no longer enough. 😄 |
|
@copilot Poll a health endpoint in a loop with a timeout instead of using Thread.Sleep with a fixed duration. Something like polling HttpClient.GetAsync against an app's health route until it returns 200, with a configurable max wait. |
The current
private static async Task WarmUpAsync(string baseAddress, int timeoutSeconds = 30)
{
    using var client = new HttpClient();
    var deadline = DateTime.UtcNow.AddSeconds(timeoutSeconds);
    while (DateTime.UtcNow < deadline)
    {
        try
        {
            var response = await client.GetAsync($"{baseAddress}/");
            if (response.IsSuccessStatusCode)
            {
                var content = await response.Content.ReadAsStringAsync();
                if (!string.IsNullOrWhiteSpace(content))
                {
                    return;
                }
            }
        }
        catch
        {
            // Server not yet accepting connections; keep waiting.
        }
        await Task.Delay(500);
    }
    throw new TimeoutException(
        $"The MVC application at '{baseAddress}' did not serve content within {timeoutSeconds} seconds.");
} |
…c instead of polling root URL Agent-Logs-Url: https://github.com/OrchardCMS/OrchardCore/sessions/f6154b7a-8517-4acb-aa41-c40f7d7e4f84 Co-authored-by: Skrypt <3228637+Skrypt@users.noreply.github.com>
…repeated navigation+assertion pattern Agent-Logs-Url: https://github.com/OrchardCMS/OrchardCore/sessions/30dd01dd-c516-4f44-9400-8015c863f5fd Co-authored-by: Piedone <1976647+Piedone@users.noreply.github.com>
Reduces flakiness in the MVC and CMS functional (Playwright) tests by ensuring the OrchardCore tenant pipeline is fully initialized before tests begin, and by improving the shared navigation assertion pattern.
Fixes #19120.
Changes
- Added an OrchardCore.HealthChecks project reference to OrchardCore.Mvc.Web.csproj so the module is discoverable by the MVC test host.
- Enabled the OrchardCore.HealthChecks feature in StartMvcAsync via .AddTenantFeatures("OrchardCore.HealthChecks").
- Updated WarmUpAsync to poll the /health/live endpoint (registered by the HealthChecks module inside the tenant pipeline) instead of /, and simplified the check to IsSuccessStatusCode. The health endpoint only becomes reachable once BuildPipelineAsync has completed, making it a reliable readiness probe.
- Extracted the repeated navigation+assertion pattern (GotoAsync → Assert.NotNull → Assert.True(response.Ok, ...)) into a GotoAndAssertOkAsync(this IPage page, string url) extension method in a new NavigationHelper class, replacing all 13 occurrences across the test suite.
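The health-endpoint polling described in the changes could look roughly like the sketch below. It is not the PR's actual WarmUpAsync: the `HealthProbe` class, the `WaitForHealthyAsync` name, and its parameters are assumptions made for illustration. Only the /health/live route and the IsSuccessStatusCode check come from the description above.

```csharp
using System;
using System.Diagnostics;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper, for illustration only.
public static class HealthProbe
{
    // Polls `healthUrl` until it returns a 2xx status or `maxWait` elapses.
    // At least one request is always attempted.
    public static async Task WaitForHealthyAsync(
        HttpClient client, string healthUrl, TimeSpan maxWait, TimeSpan pollInterval)
    {
        var stopwatch = Stopwatch.StartNew();
        while (true)
        {
            try
            {
                using var response = await client.GetAsync(healthUrl);
                if (response.IsSuccessStatusCode)
                {
                    return;
                }
            }
            catch (HttpRequestException)
            {
                // Server not yet accepting connections; keep waiting.
            }
            catch (TaskCanceledException)
            {
                // The individual request timed out; keep waiting.
            }

            if (stopwatch.Elapsed >= maxWait)
            {
                throw new TimeoutException(
                    $"'{healthUrl}' did not return a success status within {maxWait.TotalSeconds} seconds.");
            }

            await Task.Delay(pollInterval);
        }
    }
}
```

A test fixture might call it as `await HealthProbe.WaitForHealthyAsync(client, $"{baseAddress}/health/live", TimeSpan.FromSeconds(30), TimeSpan.FromMilliseconds(500));`, where `baseAddress` is whatever address the test host listens on. The wait is an upper bound: the call returns as soon as the endpoint answers successfully.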