Worker jobs fail intermittently, apparently because of transient request errors, but we currently have no logging or other visibility into worker behaviour.
Some metrics to collect per worker:
- Number of succeeded jobs
- Number of failed jobs
- Mean time to job completion (possibly segmented by succeeded or failed)
Some things to log:
- Incremental update progress (pages loaded, etc.)
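
As a rough illustration of the metrics and progress logging listed above, a minimal Python sketch might look like the following. The names `WorkerMetrics`, `run_job`, and `load_pages` are hypothetical, and the sketch assumes a job can be expressed as a generator that yields the running count of pages loaded; the actual worker code may be structured quite differently.

```python
import logging
import time
from dataclasses import dataclass, field

logger = logging.getLogger("worker")


@dataclass
class WorkerMetrics:
    """Per-worker job counts and durations (hypothetical helper)."""
    counts: dict = field(default_factory=lambda: {"succeeded": 0, "failed": 0})
    durations: dict = field(default_factory=lambda: {"succeeded": [], "failed": []})

    def record(self, ok: bool, seconds: float) -> None:
        # Count the job and keep its duration, segmented by outcome.
        key = "succeeded" if ok else "failed"
        self.counts[key] += 1
        self.durations[key].append(seconds)

    def mean_duration(self, outcome: str):
        # Mean time to completion for one outcome, or None if no jobs yet.
        values = self.durations[outcome]
        return sum(values) / len(values) if values else None


def run_job(metrics: WorkerMetrics, job_id: str, load_pages) -> bool:
    """Run one job, logging incremental progress and recording the outcome.

    `load_pages` is assumed to be a callable returning an iterator that
    yields the number of pages loaded so far.
    """
    start = time.monotonic()
    try:
        for pages_loaded in load_pages():
            logger.info("job %s: %d pages loaded", job_id, pages_loaded)
    except Exception:
        # Log the traceback so transient request errors become visible.
        metrics.record(False, time.monotonic() - start)
        logger.exception("job %s failed", job_id)
        return False
    metrics.record(True, time.monotonic() - start)
    return True
```

Keeping durations segmented by outcome makes it possible to report either an overall mean or separate means for succeeded and failed jobs, whichever turns out to be more useful.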