What happened?
Description
When a Master of Masters (MoM) restarts, syndics connected via ZeroMQ fail to detect the restart and continue operating with stale connections. ZMQ silently auto-reconnects at the socket level (via RECONNECT_IVL), but Salt's higher-level syndic logic is never notified, leaving the syndic in a broken state where it believes it is connected but the MoM has no record of it.
This results in loss of visibility for all minions behind the affected syndics.
Steps to Reproduce
- Set up a Master of Masters with one or more syndics using ZeroMQ transport
- Verify minions behind syndics are visible (salt '*' test.ping)
- Restart the MoM
- Observe that syndics do not reconnect — minions behind them become unreachable
Expected Behavior
Syndics should detect the MoM restart and automatically reconnect, re-authenticating and re-establishing event forwarding.
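A minimal sketch of the expected sequence, using stand-in classes rather than Salt's real ones. Only the `__master_disconnected` tag, `auth.invalidate()`, and `reconnect()` come from this report; `FakeAuth`, `FakeSyndic`, `process_event`, and the `authenticate()` and `forwarding` members are hypothetical names used for illustration.

```python
class FakeAuth:
    """Hypothetical stand-in for the syndic's auth object."""

    def __init__(self):
        self.token = "stale"

    def invalidate(self):
        # Drop the cached credentials so the next sign-in is a fresh one.
        self.token = None

    def authenticate(self):
        self.token = "fresh"


class FakeSyndic:
    """Hypothetical stand-in for a syndic connected to the MoM."""

    def __init__(self):
        self.auth = FakeAuth()
        self.forwarding = False

    def reconnect(self):
        self.auth.invalidate()    # do not reuse the pre-restart token
        self.auth.authenticate()  # re-authenticate against the restarted MoM
        self.forwarding = True    # re-establish event forwarding


def process_event(syndic, tag):
    """Trigger a reconnect when a master-disconnected event is seen."""
    if tag.startswith("__master_disconnected"):
        syndic.reconnect()
        return True
    return False
```

With this shape, a disconnect event leaves the syndic holding a fresh token and forwarding events again, while unrelated tags are ignored.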
Actual Behavior
Syndics remain in a stale connected state indefinitely. The ZMQ socket reconnects at the transport layer, but Salt never triggers its reconnection logic.
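The transport-level reconnect is observable from Python through pyzmq's socket monitor, which may help confirm the behavior while reproducing. The sketch below is not Salt's code: `watch_publisher` and `count_reconnects` are hypothetical helpers, and the endpoint in the usage note assumes the MoM publishes on the default port 4505 at a placeholder address.

```python
def count_reconnects(events):
    """Count 'connected' events that follow a 'disconnected' event."""
    reconnects = 0
    seen_disconnect = False
    for name in events:
        if name == "disconnected":
            seen_disconnect = True
        elif name == "connected" and seen_disconnect:
            reconnects += 1
            seen_disconnect = False
    return reconnects


def watch_publisher(endpoint, timeout_ms=5000):
    """Yield 'connected'/'disconnected' as the SUB socket's link changes.

    Stops once no monitor event arrives within timeout_ms.
    """
    import zmq  # imported lazily so count_reconnects() works without pyzmq
    from zmq.utils.monitor import recv_monitor_message

    names = {
        zmq.EVENT_CONNECTED: "connected",
        zmq.EVENT_DISCONNECTED: "disconnected",
    }
    ctx = zmq.Context.instance()
    sub = ctx.socket(zmq.SUB)
    sub.setsockopt(zmq.SUBSCRIBE, b"")
    # Ask libzmq to report only connect/disconnect events for this socket.
    monitor = sub.get_monitor_socket(zmq.EVENT_CONNECTED | zmq.EVENT_DISCONNECTED)
    sub.connect(endpoint)
    try:
        while monitor.poll(timeout_ms):
            event = recv_monitor_message(monitor)
            name = names.get(event["event"])
            if name is not None:
                yield name
    finally:
        sub.disable_monitor()
        monitor.close()
        sub.close(linger=0)
```

For example, iterating over `watch_publisher("tcp://mom.example:4505", timeout_ms=60000)` while restarting the MoM should yield a `disconnected` followed by a `connected`, even though nothing at the Salt level reacts to either.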
Root Cause
Four interrelated bugs:
- ZMQ transport silent reconnection: PublishClient does not fire connect_callback/disconnect_callback after initial connection. ZMQ's internal reconnect goes unnoticed by Salt.
- SyndicManager ignores __master_disconnected: The regular Minion class handles __master_disconnected events to trigger reconnection, but SyndicManager._process_event() does not.
- Syndic.reconnect() skips auth invalidation: Unlike Minion.connect_master(), Syndic.reconnect() does not call auth.invalidate() before reconnecting, causing stale auth tokens to be reused.
- _call_syndic() ignores False returns: _fire_master() returns False on SaltReqTimeoutError, but _call_syndic() does not check the return value, so timeout failures are silently treated as successes.
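The last item can be illustrated in isolation. This is a simplified sketch, not Salt's implementation: the exception class is a local stand-in for the real `SaltReqTimeoutError`, and `fire_master`, `call_syndic_unchecked`, `call_syndic_checked`, and the `on_failure` callback are hypothetical names that only mirror the pattern described above.

```python
class SaltReqTimeoutError(Exception):
    """Local stand-in for Salt's exception of the same name."""


def fire_master(send):
    """Return False instead of raising when the request times out."""
    try:
        send()
    except SaltReqTimeoutError:
        return False
    return True


def call_syndic_unchecked(fire):
    """Buggy pattern: the return value is dropped, so a timeout looks fine."""
    fire()
    return "ok"


def call_syndic_checked(fire, on_failure):
    """Checked pattern: a False return is treated as a failed call."""
    if fire() is False:
        on_failure()  # e.g. mark the master as dead and schedule a reconnect
        return "failed"
    return "ok"


def timing_out_send():
    raise SaltReqTimeoutError("no response from master")
```

Calling `call_syndic_unchecked(lambda: fire_master(timing_out_send))` reports `"ok"` despite the timeout, whereas the checked variant invokes `on_failure` and reports `"failed"`.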
Versions Affected
Type of salt install
Official deb
Major version
3006.x
What supported OS are you seeing the problem on? Can select multiple. (If bug appears on an unsupported OS, please open a GitHub Discussion instead)
ubuntu-24.04
salt --versions-report output
Salt Version:
Salt: 3006.18
Python Version:
Python: 3.10.19 (main, Dec 16 2025, 10:12:17) [GCC 11.2.0]
Dependency Versions:
cffi: 2.0.0
cherrypy: 18.10.0
cryptography: 42.0.5
dateutil: 2.8.1
docker-py: 7.1.0
gitdb: Not Installed
gitpython: Not Installed
Jinja2: 3.1.6
libgit2: 1.6.4
looseversion: 1.0.2
M2Crypto: 0.39.0
Mako: Not Installed
msgpack: 1.0.2
msgpack-pure: Not Installed
mysql-python: Not Installed
packaging: 24.0
pycparser: 2.21
pycrypto: Not Installed
pycryptodome: 3.19.1
pygit2: 1.12.2
python-gnupg: 0.4.8
PyYAML: 6.0.1
PyZMQ: 23.2.0
relenv: 0.22.1
smmap: Not Installed
timelib: 0.3.0
Tornado: 4.5.3
ZMQ: 4.3.4
Salt Extensions:
saltext.prometheus: 2.2.0
System Versions:
dist: ubuntu 24.04.2 noble
locale: utf-8
machine: x86_64
release: 6.11.0-19-generic
system: Linux
version: Ubuntu 24.04.2 noble