DO NOT REPORT VULNERABILITIES HERE!
Is there an existing issue for this?
What happened?
Hello!
I believe I've identified an issue where multiple scheduler components appear to be running at the same time within a long-running (> 18 days) primary node. I'm not sure how this is happening, so I'm reporting it in the hope that you can identify the cause, or tell me what other information would help us figure this out.
The good news is that restarting the long-running primary makes the symptoms below go away. Note that we run Cronicle as non-root, but I doubt that's related to this issue.
There are at least 4 symptoms:
- Duplicate listings of jobs in the completed log.
- "Maximum of 1 job already running for event" errors for a multiplex "Primary Group" job. The Primary was trying to run 3 instances of the scheduled job on itself, which doesn't allow Concurrency (It is a singleton event). Note that we only see this behaviour for scheduled jobs, not when it is manually invoked.
- Triplicate "Advancing time up to..." log lines, seen in extract at the end of this report.
- I'm fairly sure this issue caused the primary node to crash yesterday: it tried to fetch logs for the same job from the same worker 3 times, causing an unhandled exception when scheduler components 2 and 3 couldn't find the now-deleted log file.
[1769609632.078][2026-01-28 15:13:52][old_primary.example][2129][Cronicle][debug][5][Job log was fetched successfully][https://a-worker.example:8443/api/app/fetch_delete_job_log?path=%2Fopt%2Fcronicle%2Flogs%2Fjobs%2Fjmkucjcl140.log&auth=1c3cb0xxxxxxxxxxxxxxxxxxxxx]
[1769609632.078][2026-01-28 15:13:52][old_primary.example][2129][Cronicle][debug][5][Job log was fetched successfully][https://a-worker.example:8443/api/app/fetch_delete_job_log?path=%2Fopt%2Fcronicle%2Flogs%2Fjobs%2Fjmkucjcl140.log&auth=1c3cb0xxxxxxxxxxxxxxxxxxxxx]
[1769609632.084][2026-01-28 15:13:52][old_primary.example][2129][Cronicle][debug][5][Job log was fetched successfully][https://a-worker.example:8443/api/app/fetch_delete_job_log?path=%2Fopt%2Fcronicle%2Flogs%2Fjobs%2Fjmkucjcl140.log&auth=1c3cb0xxxxxxxxxxxxxxxxxxxxx]
[1769609632.138][2026-01-28 15:13:52][old_primary.example][2129][Cronicle][debug][1][Uncaught Exception: Error: ENOENT: no such file or directory, open 'logs/jobs/jmkucjcl140.log.gz'][Error: ENOENT: no such file or directory, open 'logs/jobs/jmkucjcl140.log.gz']
[1769609632.138][2026-01-28 15:13:52][old_primary.example][2129][Cronicle][debug][1][Emergency Shutdown: Error: ENOENT: no such file or directory, open 'logs/jobs/jmkucjcl140.log.gz'][]
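For what it's worth, the triplicate ticks look consistent with the minute timer having been registered three times. A minimal sketch of how that could happen in a Node.js service (the class and method names here are purely illustrative, not Cronicle's actual code) would be a scheduler whose start-up path can run more than once, e.g. after losing and regaining primary status, without clearing the previous interval:

```javascript
// Illustrative sketch only: a scheduler whose start() can be called more
// than once without a guard, stacking interval timers. Each stacked timer
// then fires its own "Minute Tick", producing the duplicate log lines.
class Scheduler {
  constructor() {
    this.timers = []; // one entry per registered interval
    this.ticks = 0;
  }
  start() {
    // BUG: no guard and no clearInterval, so a second call to start()
    // (e.g. on a failover/re-election) adds a second minute timer
    this.timers.push(setInterval(() => this.tick(), 60 * 1000));
  }
  stop() {
    // Clearing all registered timers on shutdown avoids leaking intervals
    this.timers.forEach(clearInterval);
    this.timers = [];
  }
  tick() {
    this.ticks++;
    console.log("Scheduler Minute Tick: Advancing time up to ...");
  }
}

const s = new Scheduler();
s.start();
s.start();
s.start(); // three registrations -> three ticks per minute
console.log(s.timers.length); // 3 stacked timers
s.stop();
```

If something like this is the cause, it would also explain the triplicate log fetches: three timer callbacks each try to fetch-and-delete the same job log, and the second and third hit ENOENT.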
Hopefully the above info is useful in identifying a cause, but let me know if you require anything further.
Thanks,
Matt
Operating System
RHEL 8
Node.js Version
v18.20.8
Cronicle Version
v0.9.99
Server Setup
Multi-Primary with Workers
Storage Setup
NFS Filesystem
Relevant log output
[1769561700.051][2026-01-28 01:55:00][old_primary.example][2129][Cronicle][debug][4][Scheduler Minute Tick: Advancing time up to: 2026/01/28 01:55:00][]
[1769561700.052][2026-01-28 01:55:00][old_primary.example][2129][Cronicle][debug][4][Scheduler Minute Tick: Advancing time up to: 2026/01/28 01:55:00][]
[1769561700.052][2026-01-28 01:55:00][old_primary.example][2129][Cronicle][debug][4][Scheduler Minute Tick: Advancing time up to: 2026/01/28 01:55:00][]
[1769561760.003][2026-01-28 01:56:00][old_primary.example][2129][Cronicle][debug][4][Scheduler Minute Tick: Advancing time up to: 2026/01/28 01:56:00][]
[1769561760.003][2026-01-28 01:56:00][old_primary.example][2129][Cronicle][debug][4][Scheduler Minute Tick: Advancing time up to: 2026/01/28 01:56:00][]
[1769561760.004][2026-01-28 01:56:00][old_primary.example][2129][Cronicle][debug][4][Scheduler Minute Tick: Advancing time up to: 2026/01/28 01:56:00][]
[1769561820.086][2026-01-28 01:57:00][old_primary.example][2129][Cronicle][debug][4][Scheduler Minute Tick: Advancing time up to: 2026/01/28 01:57:00][]
[1769561820.087][2026-01-28 01:57:00][old_primary.example][2129][Cronicle][debug][4][Scheduler Minute Tick: Advancing time up to: 2026/01/28 01:57:00][]
[1769561820.087][2026-01-28 01:57:00][old_primary.example][2129][Cronicle][debug][4][Scheduler Minute Tick: Advancing time up to: 2026/01/28 01:57:00][]
[1769561880.087][2026-01-28 01:58:00][old_primary.example][2129][Cronicle][debug][4][Scheduler Minute Tick: Advancing time up to: 2026/01/28 01:58:00][]
[1769561880.087][2026-01-28 01:58:00][old_primary.example][2129][Cronicle][debug][4][Scheduler Minute Tick: Advancing time up to: 2026/01/28 01:58:00][]
[1769561880.087][2026-01-28 01:58:00][old_primary.example][2129][Cronicle][debug][4][Scheduler Minute Tick: Advancing time up to: 2026/01/28 01:58:00][]
[1769561940.072][2026-01-28 01:59:00][old_primary.example][2129][Cronicle][debug][4][Scheduler Minute Tick: Advancing time up to: 2026/01/28 01:59:00][]
[1769561940.073][2026-01-28 01:59:00][old_primary.example][2129][Cronicle][debug][4][Scheduler Minute Tick: Advancing time up to: 2026/01/28 01:59:00][]
[1769561940.073][2026-01-28 01:59:00][old_primary.example][2129][Cronicle][debug][4][Scheduler Minute Tick: Advancing time up to: 2026/01/28 01:59:00][]
[1769562000.023][2026-01-28 02:00:00][old_primary.example][2129][Cronicle][debug][4][Scheduler Minute Tick: Advancing time up to: 2026/01/28 02:00:00][]
[1769562000.023][2026-01-28 02:00:00][old_primary.example][2129][Cronicle][debug][4][Scheduler Minute Tick: Advancing time up to: 2026/01/28 02:00:00][]
[1769562000.023][2026-01-28 02:00:00][old_primary.example][2129][Cronicle][debug][4][Scheduler Minute Tick: Advancing time up to: 2026/01/28 02:00:00][]
Code of Conduct