Additional metrics exported from Celery workers#3463

Closed
rbagd wants to merge 4 commits intoopen-telemetry:mainfrom
rbagd:include_more_celery_metrics

Conversation

@rbagd
Contributor

@rbagd rbagd commented May 5, 2025

Description

This PR enhances the metrics exported from Celery workers by implementing the following measurements:

  • Prefetched task up-down counter (number of tasks queued locally in the worker) and time spent in prefetch mode
  • Active task up-down counter and task processing duration

Currently, only one metric is exported, `flower.task.runtime.seconds`; this PR renames it to follow semantic conventions (noted in the changelog).

Changes in this PR also fix a memory leak present in the current instrumentation.

Fixes #3458
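Conceptually, the two up-down counters track tasks moving through the worker's local queue: a task is counted as prefetched when it arrives, then moves to active when execution begins. A minimal sketch with a stand-in counter class (hypothetical names; the actual instrumentation uses the OpenTelemetry metrics API):

```python
class UpDownCounter:
    """Stand-in for an OpenTelemetry up-down counter (illustration only)."""

    def __init__(self):
        self.value = 0

    def add(self, amount):
        self.value += amount


prefetched = UpDownCounter()  # tasks queued locally in the worker
active = UpDownCounter()      # tasks currently executing


def on_task_received(task_id):
    prefetched.add(1)          # task enters the worker's prefetch buffer


def on_task_started(task_id):
    prefetched.add(-1)         # task leaves the prefetch buffer...
    active.add(1)              # ...and starts executing


def on_task_completed(task_id):
    active.add(-1)


on_task_received("t1")
on_task_received("t2")
on_task_started("t1")
# at this point: prefetched.value == 1, active.value == 1
```

The matching duration histograms record the time a task spends in each of these two states.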

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

How Has This Been Tested?

  • Unit tests added for all newly included and updated metrics

Does This PR Require a Core Repo Change?

  • No.

Checklist:

  • Followed the style guidelines of this project
  • Changelogs have been updated
  • Unit tests have been added

@rbagd rbagd requested a review from a team as a code owner May 5, 2025 10:31
@rbagd rbagd force-pushed the include_more_celery_metrics branch 4 times, most recently from 3755a64 to 8867346 Compare May 5, 2025 11:40
@rbagd rbagd force-pushed the include_more_celery_metrics branch 2 times, most recently from 09038ae to 2d192ca Compare May 13, 2025 09:40
@rbagd rbagd force-pushed the include_more_celery_metrics branch from 2d192ca to 63b559b Compare May 22, 2025 10:02
@rbagd
Contributor Author

rbagd commented May 22, 2025

@xrmx @emdneto May I ask one of you to give this a first look? I understand nobody owns the Celery component, so I'm afraid this will drift into oblivion.

@rbagd rbagd force-pushed the include_more_celery_metrics branch from 2221552 to e9e80e1 Compare May 26, 2025 07:05
@bschoenmaeckers
Contributor

bschoenmaeckers commented May 28, 2025

It would also be nice to have task result counters, like the following

  • total finished tasks
  • total successful tasks
  • total failed tasks
  • total retried tasks

@rbagd rbagd force-pushed the include_more_celery_metrics branch from 99db9da to a417078 Compare June 11, 2025 11:08
@rbagd
Contributor Author

rbagd commented Jun 23, 2025

It would also be nice to have task result counters, like the following

• total finished tasks
• total successful tasks
• total failed tasks
• total retried tasks

Agreed that it'd be nice to have. I don't want to increase the scope of this PR, as it's already quite large, but I can look into this in the near future.
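The suggested result counters could be sketched like this (purely hypothetical follow-up, not part of this PR; real instrumentation would use monotonic OTel counters with an outcome attribute):

```python
from collections import Counter

# Hypothetical per-outcome task counters, keyed by task result.
task_results = Counter()


def on_task_done(outcome):
    # outcome is one of "success", "failure", "retry"
    task_results[outcome] += 1
    task_results["finished"] += 1


for outcome in ("success", "success", "failure", "retry"):
    on_task_done(outcome)
# task_results["finished"] == 4, task_results["success"] == 2
```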

@rbagd rbagd force-pushed the include_more_celery_metrics branch from e3a2c0f to c42bbb8 Compare June 23, 2025 12:24
@xrmx xrmx moved this to Ready for review in Python PR digest Jun 23, 2025
_TASK_NAME_KEY = "celery.task_name"

# Metric names
_TASK_COUNT_ACTIVE = "messaging.client.active_tasks"
Contributor


These 3 new metrics are not in semantic conventions, right? I don't think we should add them to the same namespace as the others, if we add unspecified metrics at all.

Contributor Author


Just to summarize my viewpoint

# Already in semconv
* _TASK_PROCESSING_TIME = messaging_metrics.MESSAGING_PROCESS_DURATION

# Probably should be in semconv
* _TASK_COUNT_ACTIVE = "messaging.client.active_tasks"

# Debatable, as these are more Celery-specific
* _TASK_COUNT_PREFETCHED = "messaging.client.prefetched_tasks"
* _TASK_PREFETCH_TIME = "messaging.prefetch.duration"

By "different namespace", do you mean replacing `messaging` with something like `celery` for the last two (or three) of these?


### Fixed

- `opentelemetry-instrumentation-celery` Fix a memory leak where a reference to a task identifier is kept indefinitely
Contributor


Could you please extract this change in another PR so we can get this reviewed and hopefully merged before next release?

Member


I would say the same. The memory leak fix seems more feasible to review and merge.

Contributor Author


I opened this small PR to just fix the leak.


def create_celery_metrics(self, meter) -> None:
    self.metrics = {
        "flower.task.runtime.seconds": meter.create_histogram(
Member


We need to discuss whether we can keep this alongside the new metric to avoid breakage.

Contributor Author


It's fine of course to keep this for backwards compatibility, although I'm not sure it's worth it in this case. The metric name has nothing to do with OTel or semantic conventions, and it references an unrelated project. :/

@rbagd
Contributor Author

rbagd commented Aug 13, 2025

Sorry for delays - summer time and all that. I'll go through the review comments.

@rbagd rbagd force-pushed the include_more_celery_metrics branch from 0ef3755 to d9f3fee Compare August 13, 2025 07:36
@rbagd rbagd force-pushed the include_more_celery_metrics branch from d9f3fee to 1df9e8d Compare August 13, 2025 07:39
@diogosilva30
Contributor

What's the status of this PR, @rbagd? Need help?

@github-actions

This PR has been automatically marked as stale because it has not had any activity for 14 days. It will be closed if no further activity occurs within 14 days of this comment.
If you're still working on this, please add a comment or push new commits.

@github-actions github-actions bot added the Stale label Mar 13, 2026
@github-actions

This PR has been closed due to inactivity. Please reopen if you would like to continue working on it.

@github-actions github-actions bot closed this Mar 27, 2026
@github-project-automation github-project-automation bot moved this from Ready for review to Done in Python PR digest Mar 27, 2026

Successfully merging this pull request may close these issues.

Improve metrics in Celery instrumentation

5 participants