Feature/server outage screen #3199

Open
MaartenD wants to merge 11 commits into lichess-org:main from MaartenD:feature/server-outage-screen

Conversation

MaartenD commented May 16, 2026
Contributor

UPDATE: How the outage screen is now implemented (June 12th, 2026)

Summary: Outage screen (server status feature)

Goal: Distinguish between "no internet connection" (the phone has no wifi/data) and "the Lichess server is unreachable" (the phone has internet, but lichess.org is down or in maintenance), and show an appropriate message for each.

Background

While working on this feature, I asked in the Lichess-Development Discord channel how the web client detects a server-side outage versus a regular connectivity issue. revoof explained:

"It's a server-side decision. It only works when a frontend server is still up, but the backend isn't. Then the frontend server serves the error page — using HTTP 503 for planned maintenance, and HTTP 502 if the backend is unreachable. So if the mobile app sees one of those response codes on API calls or websocket connections, it can assume it's an issue on Lichess' side, not just a normal connection issue."

This confirmed that checking for HTTP 502/503 responses is the right, server-endorsed way to detect this — rather than relying on connectivity heuristics alone — and is the approach used for this feature.

Two situations, two messages

  1. No internet connection (network down)
    If the phone itself has no wifi/mobile data, the existing "No internet connection" message continues to work as before. Offline features (viewing saved games, playing puzzles offline, etc.) remain available — nothing changes here.

  2. Server down (new outage screen)
    If the phone does have internet, but the Lichess server is unreachable (outage or maintenance), we show a new, friendly outage screen with:

  • The Lichess logo
  • A message explaining that the server is currently unreachable
  • Links to Mastodon, Bluesky and Discord, so the user can check for known outage announcements

This screen appears on both the Home tab and the Watch tab.

How is this detected?

We look at the responses the server sends back to requests from the app:

  • A "503" response means: Lichess is in planned maintenance.
  • A "502" response means: the server is unreachable (outage).

As soon as the server responds normally again, or the live connection (websocket) recovers after a real interruption, the outage screen disappears automatically.
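
The detection rule above can be sketched in plain Dart as follows. This is a minimal illustration, not the code in this PR: `ServerSignal`, `signalFromStatusCode` and `OutageTracker` are hypothetical names, and the sketch assumes that every status code other than 502 and 503 counts as a normal response.

```dart
/// Hypothetical classification of a response from a Lichess URI.
enum ServerSignal { ok, maintenance, outage }

/// Maps an HTTP status code to a signal, following the rule quoted above:
/// 503 means planned maintenance, 502 means the backend is unreachable.
/// Assumption: any other code is treated as a normal response.
ServerSignal signalFromStatusCode(int statusCode) {
  switch (statusCode) {
    case 503:
      return ServerSignal.maintenance;
    case 502:
      return ServerSignal.outage;
    default:
      return ServerSignal.ok;
  }
}

/// Remembers the most recent signal, so the outage screen disappears as soon
/// as a normal response comes back.
class OutageTracker {
  ServerSignal _last = ServerSignal.ok;

  /// True while the last observed response was a 502 or a 503.
  bool get showOutageScreen => _last != ServerSignal.ok;

  /// Call this with the status code of every response from a Lichess URI.
  void onResponse(int statusCode) {
    _last = signalFromStatusCode(statusCode);
  }
}
```

Feeding the tracker a 502 sets `showOutageScreen` to true, and the next 200 clears it again, which mirrors the automatic recovery described above.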

Pull-to-refresh

On the outage screen, the user can pull-to-refresh to manually check whether the server is back up — without causing any extra network traffic beyond what the app normally does.

Respecting offline capabilities

This feature follows the same approach already used elsewhere in the app: whatever works offline keeps working, and anything that requires an online connection becomes non-clickable during a server outage (for example, the "Players/friends" and "Challenges" buttons in the top app bar). This avoids sending the user to a screen that wouldn't be able to load anyway, and the confusing error messages that would result from that.

Tests

Automated tests were added/updated for this feature, including checks that:

  • the outage screen appears when the server is down (on both Home and Watch),
  • the "no internet" message keeps working correctly when only the network is down,
  • the "Players" button in the top app bar is disabled during a server outage.

Known limitation

This implementation does not yet cover the case where only the websocket connection is unavailable (while regular HTTP requests still succeed). In that scenario, the user currently gets no specific message about it. This is something I would like to discuss further before deciding on the right approach.

For transparency: Claude helped me with the design and implementation of this feature.

First video: behaviour when mobile data / wifi is unavailable.

outage-no-data-connection.mp4

Second video: how the outage screen behaves.

I tested this by stopping and starting the lila-1 service of my local Docker container. I needed to refresh a few times at the end of the video.

outage-backend-down.mp4

Fixes #1016

@HaonRekcef

Collaborator

@MaartenD There are offline features in the app, so the "Lichess is down" indicator should be less invasive in the UI.

How do we make sure to distinguish between the server actually being down and the player simply being offline or experiencing network issues?

@MaartenD

Contributor Author

@HaonRekcef I missed that one. I will do my research and let you know.

@MaartenD

Contributor Author

@HaonRekcef Is the Over the board game an offline feature? Are there more?

@HaonRekcef

Collaborator

@MaartenD there are multiple. You can disable the network on your device and see which buttons are interactable and not greyed out.

@MaartenD

MaartenD commented May 16, 2026

Contributor Author

@HaonRekcef What about this version? This is with the phone in flight mode.

Better.messaging.to.communicate.server.outage_ws_with_offline.mp4

In the video you will see that Puzzle Themes is available, but when you click on it you can't select anything. I tested this on my iPhone against the production version, and the behavior is the same there. In my opinion this isn't correct, but I'm not sure.

I will upload a version showing the case where the websocket connection isn't working later today or tomorrow. I need to figure some things out first.

@MaartenD

MaartenD commented May 17, 2026

Contributor Author

Behaviour during outage

Below is what is working so far. It's still a work in progress, but before I continue I would like some answers about my approach (see the Question below).

Video when the websocket isn't available:

Better.messaging.to.communicate.server.outage_ws_with_offline_wsgone.mp4

  • Home tab: outage screen is shown, the Play button remains accessible
  • Puzzles, Learn, More: accessible and functional as normal (offline features work)
  • Watch tab: shows the outage screen instead of "No internet connection" (offline)
  • Play button: The options that require a server connection (such as Create lobby game, Challenge a friend, Correspondence and Arena tournaments) were already disabled when the network was unavailable. Replacing onlineStatusProvider with lichessOnlineProvider in the Play button components means these options are now also correctly disabled during a server outage.

A new provider in lib/src/network/lichess_online.dart combines both checks:

final lichessOnlineProvider = Provider.autoDispose<bool>((ref) {
  final isNetworkOnline = ref.watch(onlineStatusProvider).value ?? false;
  final isServerReachable = ref.watch(serverStatusProvider);
  return isNetworkOnline && isServerReachable;
});

This is currently applied to play_menu.dart, quick_game_matrix.dart and create_game_widget.dart. Other places in the codebase still use onlineStatusProvider directly and would benefit from the same treatment.

Question: to fully implement this feature I would like to know if you agree with my approach. Additionally, would you prefer lichessOnlineProvider to live in connectivity.dart alongside onlineStatusProvider rather than in a separate file?

Tests added / updated

  • test/network/server_status_test.dart (new): two tests using fakeAsync, verifying that the 30-second timer fires after continuous disconnection and that the timer is cancelled when the connection is restored within 30 seconds
  • test/view/home/home_tab_screen_test.dart (updated):
    • "outage page shown" test: extended to also verify that the Play button (FloatingActionButton) remains visible
    • New test: Watch tab shows ServerOutage when offline

@HaonRekcef

Collaborator

Hi @MaartenD thanks for the work on this!
Correct me if I am wrong, but looking at the code, it seems the "Lichess is undergoing Maintenance" screen will still show up if the user simply loses their local internet connection (at least on the Watch tab).
Actually a local network drop is way more common than an actual server outage.
I think the cleanest way to handle this is by introducing an enum for the connection state rather than relying on booleans:

enum ConnectionStatus {
  online,
  networkDown,
  serverDown,
} 

To answer your question:
Question: to fully implement this feature i would like to know if you agree with my approach. Additionally, would you prefer lichessOnlineProvider to live in connectivity.dart alongside onlineStatusProvider rather than in a separate file?

I am not the final authority, but I would say it doesn't matter much as long as the code is well written and works. Personally, I would lean towards putting it in the same file.

@MaartenD

Contributor Author

Hi @HaonRekcef,

Thanks for your reply and the great suggestion regarding the ConnectionStatus enum. I was also leaning towards putting everything in the same file, so I will take that route.

@MaartenD

Contributor Author

Reason for change
The initial implementation used two separate booleans (onlineStatusProvider and serverStatusProvider) to determine whether the app could reach lichess. This made it impossible to distinguish between a local network drop and an actual server outage. Both situations were treated the same way. @HaonRekcef correctly pointed out that showing "Lichess is undergoing technical difficulties" when the user simply has no internet is misleading.

New approach: ConnectionStatus enum
Added to connectivity.dart:

enum ConnectionStatus {
  online,       // network available and server reachable
  networkDown,  // no network connection
  serverDown,   // network available but lichess server unreachable
}
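
A minimal sketch of how the enum value could be derived from the two existing checks. The function name `resolveConnectionStatus` and its parameters are hypothetical, and the enum is repeated only so the snippet compiles on its own; the actual wiring through Riverpod providers is not shown.

```dart
enum ConnectionStatus { online, networkDown, serverDown }

/// Derives the status from the two checks the app already performs.
/// The network check comes first: without a network connection nothing can
/// be concluded about the server, so networkDown takes precedence.
ConnectionStatus resolveConnectionStatus({
  required bool isNetworkOnline,
  required bool isServerReachable,
}) {
  if (!isNetworkOnline) return ConnectionStatus.networkDown;
  if (!isServerReachable) return ConnectionStatus.serverDown;
  return ConnectionStatus.online;
}
```

Checking the network first is what keeps a local network drop from being reported as a server outage, which was the problem with combining the two booleans with `&&`.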

Behaviour per status

| Status | Home tab | Watch tab | Play button online options |
| --- | --- | --- | --- |
| online | normal | normal | enabled |
| networkDown | normal + offline banner | "No internet connection." | disabled |
| serverDown | outage screen | outage screen | disabled |

Tests

  • test/network/connectivity_test.dart (new): three unit tests covering all three enum values:
    1. Network available + server reachable → online
    2. Network unavailable → networkDown
    3. Network available + server unreachable → serverDown
  • test/view/home/home_tab_screen_test.dart (updated):
    1. Watch tab shows no internet message when network is down → verifies "No internet connection." text for networkDown
    2. Watch tab shows outage screen when server is down → verifies ServerOutage widget for serverDown

  • Final implementation of the offline scenario (no internet connection): Better.messaging.to.communicate.server.outage_ws_with_offline_II.mp4
  • Final implementation of the server outage scenario: Better.messaging.to.communicate.server.outage_ws_with_offline_wsgone_II.mp4

To be clear: given the scope of this PR, I used Claude as an AI assistant during development (final implementation).

@veloce

veloce commented May 19, 2026

Contributor

@MaartenD I have not read the comments but only the PR description (which I hope you updated based on the last code).

I have not read the code either, but based on the description I don't see how this can work. How do you distinguish a server outage from a network disconnection? Even if the socket is disconnected for more than 30s, that does not mean the lichess WS server is down.

We certainly don't want to display a message indicating that the lichess server is down if that is not the case. And I don't see how you can know that by just monitoring the WS connection.

I am pretty sure this feature cannot be implemented as is, or am I missing something?

I invite you to reach out to the lichess server devs on discord to see how this is implemented in the website.

@MaartenD

Contributor Author

@veloce I updated the PR description and pushed my latest version of this outage screen feature. I have seen the conflicts and will look into them. I would love to hear what you think about this version.

…-screen

# Conflicts:
#	lib/l10n/app_en.arb
#	lib/l10n/l10n.dart
#	lib/l10n/l10n_af.dart
#	lib/l10n/l10n_ar.dart
#	lib/l10n/l10n_az.dart
#	lib/l10n/l10n_be.dart
#	lib/l10n/l10n_bg.dart
#	lib/l10n/l10n_bn.dart
#	lib/l10n/l10n_bs.dart
#	lib/l10n/l10n_ca.dart
#	lib/l10n/l10n_cs.dart
#	lib/l10n/l10n_da.dart
#	lib/l10n/l10n_de.dart
#	lib/l10n/l10n_el.dart
#	lib/l10n/l10n_en.dart
#	lib/l10n/l10n_eo.dart
#	lib/l10n/l10n_es.dart
#	lib/l10n/l10n_et.dart
#	lib/l10n/l10n_eu.dart
#	lib/l10n/l10n_fa.dart
#	lib/l10n/l10n_fi.dart
#	lib/l10n/l10n_fr.dart
#	lib/l10n/l10n_gl.dart
#	lib/l10n/l10n_gsw.dart
#	lib/l10n/l10n_he.dart
#	lib/l10n/l10n_hi.dart
#	lib/l10n/l10n_hr.dart
#	lib/l10n/l10n_hu.dart
#	lib/l10n/l10n_hy.dart
#	lib/l10n/l10n_id.dart
#	lib/l10n/l10n_it.dart
#	lib/l10n/l10n_ja.dart
#	lib/l10n/l10n_kk.dart
#	lib/l10n/l10n_ko.dart
#	lib/l10n/l10n_lt.dart
#	lib/l10n/l10n_lv.dart
#	lib/l10n/l10n_mk.dart
#	lib/l10n/l10n_nb.dart
#	lib/l10n/l10n_nl.dart
#	lib/l10n/l10n_pl.dart
#	lib/l10n/l10n_pt.dart
#	lib/l10n/l10n_ro.dart
#	lib/l10n/l10n_ru.dart
#	lib/l10n/l10n_sk.dart
#	lib/l10n/l10n_sl.dart
#	lib/l10n/l10n_sq.dart
#	lib/l10n/l10n_sr.dart
#	lib/l10n/l10n_sv.dart
#	lib/l10n/l10n_tr.dart
#	lib/l10n/l10n_uk.dart
#	lib/l10n/l10n_uz.dart
#	lib/l10n/l10n_vi.dart
#	lib/l10n/l10n_zh.dart
#	test/view/home/home_tab_screen_test.dart
#	translation/source/mobile.xml

@veloce veloce left a comment

Contributor

Thanks for your work on this and the detailed description.

I made a lot of comments, and some are not just about the code as I have more questions about this feature and how it is handled in the website.

Comment thread lib/src/network/http.dart
responseCode: response.statusCode,
responseDateTime: DateTime.now(),
);
ref

Contributor

I would not put that in the global http client factory. This can be used to create clients that target other URIs than the lichess main server.

There is already a LichessClient; this logic belongs there.
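
To illustrate the concern, here is a minimal sketch of a check that only counts responses from the main Lichess host. The constant `lichessHost` and the function `indicatesLichessOutage` are hypothetical names, and the host value is an assumption; the real LichessClient may identify its target differently.

```dart
/// Assumed host of the main Lichess server (hypothetical constant).
const lichessHost = 'lichess.org';

/// Returns true only when a 502/503 comes from the main Lichess host.
/// Responses from any other host never affect the server status.
bool indicatesLichessOutage(Uri requestUri, int statusCode) {
  final isLichess = requestUri.host == lichessHost;
  final isOutageCode = statusCode == 502 || statusCode == 503;
  return isLichess && isOutageCode;
}
```

With a check like this, a 503 from an unrelated host would not switch the app to the outage screen.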


final _logger = Logger('ServerStatus');

final serverStatusProvider = NotifierProvider<ServerStatusNotifier, bool>(ServerStatusNotifier.new);

Contributor

Provider name is missing.


final serverStatusProvider = NotifierProvider<ServerStatusNotifier, bool>(ServerStatusNotifier.new);

class ServerStatusNotifier extends Notifier<bool> {

Contributor

Add doc comment to explain the purpose of this notifier.

return true;
}

void _onLagChange() {

Contributor

There is a flaw here: you're checking only the lag change and not whether an http request previously returned an error code.

That being said, I don't think that listening to the socket is the proper thing to do at all.

Lichess being down is detected when a frontend server returns a 502/503. There is a separate websocket server that communicates with the lila instance through redis. I assume the WS server can be up when the lichess backend is down, so a connected websocket does not mean a game can be played.

Thus we should rely only on an http status code change from a lichess URI. cc @ornicar and @niklasf, can you confirm?

}, name: 'OnlineStatusProvider');

/// Represents the connection state of the app with respect to the lichess server.
enum ConnectionStatus {

Contributor

To remove any ambiguity, it would be better to call it LichessConnectionStatus.

Text(context.l10n.mobileServerOutageMessage, textAlign: TextAlign.center),
const SizedBox(height: 16),
Text(
context.l10n.mobileServerOutageKeepInformed,

Contributor

same larger font

: null,
title: Text(context.l10n.openingExplorer),
enabled: isOnline,
enabled: connectionStatus == ConnectionStatus.online,

Contributor

Lichess main server may be down while the opening explorer is still available. So you should really use the regular online status provider.

networkDown,

/// The device is online but the lichess server is unreachable.
serverDown,

Contributor

There is no distinction between the server being in planned maintenance and the server being down; we should add it since the status code can tell us that.
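
One way the distinction could look, as a sketch under assumptions: the enum gains a `maintenance` value, and the status is resolved from the device's network state plus the last status code seen on a Lichess URI. The name `LichessConnectionStatus` follows the renaming suggested in this review; `resolveStatus` and `lastStatusCode` are hypothetical.

```dart
enum LichessConnectionStatus { online, networkDown, maintenance, serverDown }

/// Resolves the status from the network state and the last HTTP status code
/// received from a Lichess URI (null if no response has been seen yet).
LichessConnectionStatus resolveStatus({
  required bool isNetworkOnline,
  int? lastStatusCode,
}) {
  // Without a network connection, the status code tells us nothing.
  if (!isNetworkOnline) return LichessConnectionStatus.networkDown;
  // 503: planned maintenance; 502: backend unreachable.
  if (lastStatusCode == 503) return LichessConnectionStatus.maintenance;
  if (lastStatusCode == 502) return LichessConnectionStatus.serverDown;
  return LichessConnectionStatus.online;
}
```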

children: [
Image.asset(logo, width: 150),
const SizedBox(height: 16),
Text(context.l10n.mobileServerOutageMessage, textAlign: TextAlign.center),

Contributor

We should probably display a different message if the server is in maintenance mode. No need to translate it for now, especially if there is no translation already available server side.

For the maintenance mode, it would make sense for the http response to contain the datetime when the maintenance is supposed to end.

Now, this would be the ideal. But I don't know whether the website makes this distinction, and if it does not we should probably do the same.
Can you please reach out to the server team and keep me informed on the 2 questions raised here? (maintenance date for 503 and whether to show a different message).
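
For reference, standard HTTP already defines a `Retry-After` header that a 503 response may carry, either as a number of seconds or as an HTTP date. Whether the Lichess frontend sends it is not established in this thread, so the following is only a sketch of how the app could read such a header if it were present. The function name `maintenanceEndFromRetryAfter` is hypothetical.

```dart
import 'dart:io' show HttpDate;

/// Returns the expected end of maintenance, or null if the header is missing
/// or cannot be parsed. `now` is passed in so the function stays testable.
DateTime? maintenanceEndFromRetryAfter(String? retryAfter, DateTime now) {
  if (retryAfter == null) return null;
  final value = retryAfter.trim();

  // Form 1: a delay in seconds, e.g. "120".
  final seconds = int.tryParse(value);
  if (seconds != null) {
    return seconds >= 0 ? now.add(Duration(seconds: seconds)) : null;
  }

  // Form 2: an HTTP date, e.g. "Wed, 21 Oct 2026 07:28:00 GMT".
  try {
    return HttpDate.parse(value);
  } on Exception {
    return null;
  }
}
```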

<string name="challengeCreated" comment="Shown as a bottom banner when another player has been challenged.">Challenge created: You will be notified when the game starts.\nYou can access it from the home tab.</string>
<string name="previousPage" comment="Shows the previous page, e.g. in tournament standings">Previous</string>
<string name="orImportPgnFile" comment="Button text to import a PGN file from the device">Or import a PGN file</string>
<string name="serverOutageMessage" comment="Shown on the home screen when the Lichess server is unreachable.">Lichess is undergoing technical difficulties. We&apos;re doing everything we can, and expect to be back up very soon.</string>

Contributor

Have you checked that these translations are not already available server side?

If they are, we should use them. If they are not, I'd rather not translate the mobile part yet. See the contributing guide for an explanation of why we don't translate new strings immediately.

Linked issue: Better messaging to communicate server outage