[Bug]: Syndic fails to reconnect after Master of Masters restart (ZeroMQ transport) #68915

@wibbit

What happened?

Description

When a Master of Masters (MoM) restarts, syndics connected via ZeroMQ fail to detect the restart and continue operating with stale connections. ZMQ silently auto-reconnects at the socket level (via RECONNECT_IVL), but Salt's higher-level syndic logic is never notified, leaving the syndic in a broken state where it believes it is connected but the MoM has no record of it.
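One way to surface these transport-level reconnects is a ZeroMQ socket monitor. The sketch below is an illustration only, not Salt's implementation: `ReconnectTracker` and `watch_socket` are hypothetical names, and the wiring assumes a plain blocking pyzmq socket together with pyzmq's `get_monitor_socket()` and `zmq.utils.monitor.recv_monitor_message()`. The tracker itself is plain Python, so it can be exercised without a live socket.

```python
# Sketch: notice ZeroMQ reconnects via a socket monitor.
# ReconnectTracker and watch_socket are hypothetical helpers, not Salt APIs.

# libzmq monitor event codes (pyzmq exposes the same values as
# zmq.EVENT_CONNECTED and zmq.EVENT_DISCONNECTED).
EVENT_CONNECTED = 0x0001
EVENT_DISCONNECTED = 0x0200


class ReconnectTracker:
    """Turn raw monitor events into connect/disconnect callbacks."""

    def __init__(self, on_connect, on_disconnect):
        self.on_connect = on_connect
        self.on_disconnect = on_disconnect
        self.connected = False
        self.connect_count = 0

    def feed(self, event):
        if event == EVENT_CONNECTED:
            self.connected = True
            self.connect_count += 1
            # Any connect after the first one is a reconnect.
            self.on_connect(reconnect=self.connect_count > 1)
        elif event == EVENT_DISCONNECTED:
            self.connected = False
            self.on_disconnect()


def watch_socket(sock, tracker):
    """Blocking loop that feeds monitor events to the tracker (needs pyzmq)."""
    import zmq  # imported lazily so the tracker stays usable without pyzmq
    from zmq.utils.monitor import recv_monitor_message

    monitor = sock.get_monitor_socket(
        zmq.EVENT_CONNECTED | zmq.EVENT_DISCONNECTED | zmq.EVENT_MONITOR_STOPPED
    )
    while True:
        event = recv_monitor_message(monitor)["event"]
        if event == zmq.EVENT_MONITOR_STOPPED:
            break
        tracker.feed(event)
    monitor.close()
```

Feeding the tracker a connected, disconnected, connected sequence calls `on_connect` with `reconnect=True` the second time. That is the point at which Salt-level re-authentication would presumably need to start.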

This results in loss of visibility for all minions behind the affected syndics.

Steps to Reproduce

  1. Set up a Master of Masters with one or more syndics using ZeroMQ transport
  2. Verify minions behind syndics are visible (salt '*' test.ping)
  3. Restart the MoM
  4. Observe that syndics do not reconnect and the minions behind them become unreachable
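Steps 2 and 4 can be scripted to make the loss of visibility explicit. This is a rough sketch, assuming it runs on the MoM with Salt importable. `ping_all` and `lost_minions` are hypothetical helper names; the only Salt call used is `salt.client.LocalClient().cmd()`.

```python
# Sketch: compare minion visibility before and after a MoM restart.
# ping_all and lost_minions are hypothetical helper names.


def lost_minions(before, after):
    """Return minion IDs that answered before the restart but not after."""
    responded_before = {m for m, ok in before.items() if ok}
    responded_after = {m for m, ok in after.items() if ok}
    return sorted(responded_before - responded_after)


def ping_all(timeout=30):
    """Run test.ping against every minion and return {minion_id: result}."""
    import salt.client  # imported lazily so lost_minions() works without Salt

    local = salt.client.LocalClient()
    return local.cmd("*", "test.ping", timeout=timeout)


# Intended use (run by hand around the restart):
#   before = ping_all()
#   ... restart the MoM and wait for it to come back ...
#   after = ping_all()
#   print(lost_minions(before, after))
```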

Expected Behavior

Syndics should detect the MoM restart and automatically reconnect, re-authenticating and re-establishing event forwarding.
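The expected recovery order can be sketched as follows. `FakeAuth` and `FakeSyndic` are hypothetical stand-ins, not Salt classes. They only model the sequence described here: drop the stale credentials, authenticate again, then resume event forwarding.

```python
# Sketch of the expected recovery sequence after a MoM restart.
# FakeAuth and FakeSyndic are hypothetical stand-ins, not Salt classes.


class FakeAuth:
    def __init__(self):
        self.token = "stale-token"
        self.generation = 0

    def invalidate(self):
        # Forget credentials negotiated with the previous MoM process.
        self.token = None

    def authenticate(self):
        self.generation += 1
        self.token = f"token-{self.generation}"
        return self.token


class FakeSyndic:
    def __init__(self, auth):
        self.auth = auth
        self.forwarding = True
        self.log = []

    def reconnect(self):
        # 1. Stop forwarding while the connection is rebuilt.
        self.forwarding = False
        # 2. Invalidate stale auth before reconnecting.
        self.auth.invalidate()
        self.log.append("invalidate")
        # 3. Re-authenticate against the restarted MoM.
        self.auth.authenticate()
        self.log.append("authenticate")
        # 4. Re-establish event forwarding.
        self.forwarding = True
        self.log.append("forward")
```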

Actual Behavior

Syndics remain in a stale connected state indefinitely. The ZMQ socket reconnects at the transport layer, but Salt never triggers its reconnection logic.

Root Cause

Four interrelated bugs:

  1. ZMQ transport silent reconnection: PublishClient does not fire connect_callback/disconnect_callback after initial connection. ZMQ's internal reconnect goes unnoticed by Salt.

  2. SyndicManager ignores __master_disconnected: The regular Minion class handles __master_disconnected events to trigger reconnection, but SyndicManager._process_event() does not.

  3. Syndic.reconnect() skips auth invalidation: Unlike Minion.connect_master(), Syndic.reconnect() does not call auth.invalidate() before reconnecting, causing stale auth tokens to be reused.

  4. _call_syndic() ignores False returns: _fire_master() returns False on SaltReqTimeoutError, but _call_syndic() does not check the return value, so timeout failures are silently treated as successes.
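Items 2 and 4 in the list above can be illustrated with a small model. `MiniSyndicManager` is a hypothetical stand-in with simplified signatures, not the real `SyndicManager`. It shows one possible shape of a fix: react to the `__master_disconnected` tag, and treat a `False` return from the fire call as a failure that triggers reconnection.

```python
# Sketch for items 2 and 4. MiniSyndicManager is a hypothetical stand-in;
# its method signatures are simplified and do not match Salt's.


class MiniSyndicManager:
    def __init__(self, fire_master):
        # fire_master(tag, data) is assumed to return False on a timeout.
        self.fire_master = fire_master
        self.reconnects = 0
        self.failed = []

    def process_event(self, tag, data):
        # Item 2: handle the disconnect tag instead of ignoring it.
        if tag.startswith("__master_disconnected"):
            self.reconnect()
            return None
        return self.call_syndic(tag, data)

    def call_syndic(self, tag, data):
        # Item 4: a False return means the request timed out.
        ok = self.fire_master(tag, data)
        if ok is False:
            self.failed.append(tag)
            self.reconnect()
        return ok

    def reconnect(self):
        # Placeholder for the real reconnection logic.
        self.reconnects += 1
```

Recording the failed tags keeps timeouts visible to the caller instead of letting them pass as successes. Whether the real fix should retry, reconnect, or both is a design decision this sketch does not settle.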

Versions Affected

  • 3006.x
  • 3007.x
  • master

Type of salt install

Official deb

Major version

3006.x

What supported OS are you seeing the problem on? Can select multiple. (If bug appears on an unsupported OS, please open a GitHub Discussion instead)

ubuntu-24.04

salt --versions-report output

Salt Version:
               Salt: 3006.18
 
Python Version:
             Python: 3.10.19 (main, Dec 16 2025, 10:12:17) [GCC 11.2.0]
 
Dependency Versions:
               cffi: 2.0.0
           cherrypy: 18.10.0
       cryptography: 42.0.5
           dateutil: 2.8.1
          docker-py: 7.1.0
              gitdb: Not Installed
          gitpython: Not Installed
             Jinja2: 3.1.6
            libgit2: 1.6.4
       looseversion: 1.0.2
           M2Crypto: 0.39.0
               Mako: Not Installed
            msgpack: 1.0.2
       msgpack-pure: Not Installed
       mysql-python: Not Installed
          packaging: 24.0
          pycparser: 2.21
           pycrypto: Not Installed
       pycryptodome: 3.19.1
             pygit2: 1.12.2
       python-gnupg: 0.4.8
             PyYAML: 6.0.1
              PyZMQ: 23.2.0
             relenv: 0.22.1
              smmap: Not Installed
            timelib: 0.3.0
            Tornado: 4.5.3
                ZMQ: 4.3.4
 
Salt Extensions:
 saltext.prometheus: 2.2.0
 
System Versions:
               dist: ubuntu 24.04.2 noble
             locale: utf-8
            machine: x86_64
            release: 6.11.0-19-generic
             system: Linux
            version: Ubuntu 24.04.2 noble

Metadata

Labels

bug, needs-triage
