
feat(topics): batch reindex on topic_elements_delete #3423

Open
abulte wants to merge 6 commits into opendatateam:main from ecolabdata:batch-reindex-elements-delete

Conversation

abulte (Contributor) commented Sep 11, 2025

This aims to prevent the error below when deleting all elements from a huge Topic (10k+ elements). Instead of launching one task per element, it reindexes elements in batches of 500.

Traceback (most recent call last):
  File "/srv/demo/lib/python3.11/site-packages/flask/app.py", line 1523, in full_dispatch_request
  File "/srv/demo/lib/python3.11/site-packages/flask/app.py", line 1509, in dispatch_request
  File "/srv/demo/lib/python3.11/site-packages/udata/api/__init__.py", line 117, in wrapper
  File "/srv/demo/lib/python3.11/site-packages/flask_restx/api.py", line 402, in wrapper
  File "/srv/demo/lib/python3.11/site-packages/flask/views.py", line 84, in view
    return current_app.ensure_sync(self.dispatch_request)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/srv/demo/lib/python3.11/site-packages/flask_restx/resource.py", line 41, in dispatch_request
    resp = meth(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/srv/demo/lib/python3.11/site-packages/udata/api/__init__.py", line 91, in wrapper
  File "/srv/demo/lib/python3.11/site-packages/udata/core/topic/apiv2.py", line 160, in delete
    topic.elements.delete()
  File "/srv/demo/lib/python3.11/site-packages/mongoengine/queryset/base.py", line 474, in delete
  File "/srv/demo/lib/python3.11/site-packages/mongoengine/document.py", line 693, in delete
  File "/srv/demo/lib/python3.11/site-packages/blinker/base.py", line 263, in send
  File "/srv/demo/lib/python3.11/site-packages/blinker/base.py", line 263, in <listcomp>
  File "/srv/demo/lib/python3.11/site-packages/udata/core/topic/models.py", line 44, in post_delete
  File "/srv/demo/lib/python3.11/site-packages/celery/app/task.py", line 444, in delay
  File "/srv/demo/lib/python3.11/site-packages/sentry_sdk/integrations/celery/__init__.py", line 289, in apply_async
  File "/srv/demo/lib/python3.11/site-packages/celery/app/task.py", line 594, in apply_async
  File "/srv/demo/lib/python3.11/site-packages/sentry_sdk/integrations/celery/__init__.py", line 289, in apply_async
  File "/srv/demo/lib/python3.11/site-packages/celery/app/base.py", line 801, in send_task
  File "/srv/demo/lib/python3.11/site-packages/celery/app/amqp.py", line 518, in send_task_message
  File "/srv/demo/lib/python3.11/site-packages/sentry_sdk/utils.py", line 1783, in runner
  File "/srv/demo/lib/python3.11/site-packages/sentry_sdk/integrations/celery/__init__.py", line 526, in sentry_publish
  File "/srv/demo/lib/python3.11/site-packages/kombu/messaging.py", line 190, in publish
  File "/srv/demo/lib/python3.11/site-packages/kombu/connection.py", line 556, in _ensured
  File "/srv/demo/lib/python3.11/site-packages/kombu/messaging.py", line 214, in _publish
  File "/srv/demo/lib/python3.11/site-packages/kombu/transport/virtual/base.py", line 610, in basic_publish
  File "/srv/demo/lib/python3.11/site-packages/kombu/transport/virtual/exchange.py", line 105, in deliver
  File "/srv/demo/lib/python3.11/site-packages/kombu/transport/virtual/base.py", line 722, in _lookup
  File "/srv/demo/lib/python3.11/site-packages/kombu/transport/redis.py", line 1071, in get_table
MemoryError
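The batching idea can be sketched in plain Python. This is a simplified stand-in, not the PR's actual code: `enqueue` takes the place of `batch_reindex.delay`, and the helper names are illustrative; only `DELETE_REINDEX_BATCH_SIZE` mirrors the PR.

```python
# Sketch of the batching approach: instead of enqueuing one reindex task
# per deleted element, enqueue one task per batch of element ids.
# `enqueue` stands in for the Celery task's `.delay`; names are illustrative.

DELETE_REINDEX_BATCH_SIZE = 500


def batch_ids(ids, size=DELETE_REINDEX_BATCH_SIZE):
    """Yield successive slices of `ids`, each at most `size` long."""
    for i in range(0, len(ids), size):
        yield ids[i : i + size]


def enqueue_reindex(ids, enqueue):
    """Enqueue one reindex task per batch instead of one per element."""
    for batch in batch_ids(ids):
        enqueue(batch)
```

With 10,500 element ids this enqueues 21 tasks instead of 10,500, which is what keeps the broker (and the Redis-backed kombu transport in the traceback) from blowing up on task volume.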

ThibaudDauce (Contributor) left a comment

Sorry for the late review, I read the code a few days ago but I'm not a big fan of the solution. I don't have another idea though, so… :-D Maybe @maudetes has an opinion/idea?

Wouldn't it work to delay the task so it's handled by the workers and doesn't run into memory issues? Or maybe the memory error happens before that point?

It's weird, but ChatGPT says that TopicElement.objects(topic=topic).delete() shouldn't trigger post_delete signals anyway? So maybe the post_delete.disconnect is not useful? If that's the case, I'm OK with the implementation without touching the signal handling.

Comment thread on udata/core/topic/apiv2.py, lines +171 to 182:
# Temporarily disconnect post_delete signal to avoid individual reindex tasks
post_delete.disconnect(TopicElement.post_delete, sender=TopicElement)
try:
    TopicElement.objects(topic=topic).delete()
    # Process reindexing in batches
    for i in range(0, len(elements_to_reindex), DELETE_REINDEX_BATCH_SIZE):
        batch = elements_to_reindex[i : i + DELETE_REINDEX_BATCH_SIZE]
        batch_reindex.delay(batch)
finally:
    # Always reconnect the signal
    post_delete.connect(TopicElement.post_delete, sender=TopicElement)

Contributor:

Not a big fan of this way of doing it. As I understand it, in the Python server this global state is shared between clients? So what happens if someone deletes an element from another topic during the reindex of this one? Does the signal trigger?

abulte (Contributor, Author):

Yeah, true, it's process-wide AFAIK. This could create some unwanted side effects.
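The concern above can be illustrated with a toy signal registry. This is a simplified sketch loosely modeled on blinker, not udata's or blinker's actual API; all names are illustrative.

```python
# Toy illustration of the race discussed above: disconnecting a signal
# receiver is process-wide, so while request A has the receiver
# disconnected for its bulk delete, a concurrent request B deleting an
# element from ANOTHER topic silently skips reindexing.

class Signal:
    """Minimal signal registry, loosely modeled on blinker (illustrative)."""

    def __init__(self):
        self.receivers = []

    def connect(self, receiver):
        self.receivers.append(receiver)

    def disconnect(self, receiver):
        self.receivers.remove(receiver)

    def send(self, doc):
        for receiver in self.receivers:
            receiver(doc)


post_delete = Signal()
reindexed = []


def reindex(doc):
    reindexed.append(doc)


post_delete.connect(reindex)

# Request A: bulk delete with the signal temporarily disconnected.
post_delete.disconnect(reindex)
try:
    # Meanwhile request B, in the same process, deletes an element
    # from an unrelated topic: the receiver never fires for it.
    post_delete.send("element-from-other-topic")
finally:
    post_delete.connect(reindex)

assert reindexed == []  # request B's element was never reindexed
```

Once the signal is reconnected, later deletes fire the receiver again, which is exactly why the window in between is the problem.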

abulte (Contributor, Author) commented Sep 15, 2025

> It's weird but ChatGPT says that TopicElement.objects(topic=topic).delete() shouldn't trigger post_delete signals anyway?

I had an LLM tell me that too :p But I don't believe it, and an "are you sure?" made it change its mind 😬 I did not test it, though.

ThibaudDauce (Contributor) commented Sep 15, 2025

> It's weird but ChatGPT says that TopicElement.objects(topic=topic).delete() shouldn't trigger post_delete signals anyway?
>
> I had an LLM tell me that too :p But I don't believe it and a "are you sure" made it change its mind 😬 I did not test though.

In other frameworks I know, that is the case: "DB"-level operations often do not trigger model listeners, since the models are not loaded into memory for these kinds of operations.

abulte (Contributor, Author) commented Sep 15, 2025

I agree this solution is not great (and limited to some very specific Topics). Maybe there's a deeper fix to be found, such as allowing batch reindexing all the way (up to the search service).

maudetes (Contributor):

Agreed, I wasn't very hyped by this fix, and I think we should investigate the issue more, since we've found a workaround for topics and it's no longer that urgent.

ThibaudDauce (Contributor):

> I agree this solution is not great (and limited to some very specific Topics). Maybe there's a deeper fix to be found, such as allowing batch reindexing all the way (up to the search service).

Your test is working without the disconnect :-)

abulte (Contributor, Author) commented Sep 16, 2025

> I agree this solution is not great (and limited to some very specific Topics). Maybe there's a deeper fix to be found, such as allowing batch reindexing all the way (up to the search service).
>
> Your test is working without the disconnect :-)

The patch path in the test was wrong, my bad. With 44267ac, the test now fails without the disconnect.

ThibaudDauce changed the title from "feat: batch reindex on topic_elements_delete" to "feat(topics): batch reindex on topic_elements_delete" on Oct 2, 2025
ThibaudDauce (Contributor):

@abulte Where are we on this problem?

abulte (Contributor, Author) commented Nov 27, 2025

@ThibaudDauce nowhere :-) We haven't reindexed a huge topic lately, so the problem is only latent.
