feat(topics): batch reindex on topic_elements_delete#3423
feat(topics): batch reindex on topic_elements_delete#3423abulte wants to merge 6 commits intoopendatateam:mainfrom
Conversation
ThibaudDauce
left a comment
There was a problem hiding this comment.
Sorry for the late review, I read the code a few days ago but I'm not a big fan of the solution. But I don't have another idea, so… :-D Maybe @maudetes have an opinion/idea?
It doesn't work to delay the task so it's managed by the workers and don't run into memory issues? But maybe the memory errer is before that?
It's weird but ChatGPT says that TopicElement.objects(topic=topic).delete() shouldn't trigger post_delete signals anyway? So maybe the post_delete.disconnect is not useful? If it's the case I'm ok with the implementation without touching the signal handling.
| # Temporarily disconnect post_delete signal to avoid individual reindex tasks | ||
| post_delete.disconnect(TopicElement.post_delete, sender=TopicElement) | ||
| try: | ||
| TopicElement.objects(topic=topic).delete() | ||
| # Process reindexing in batches | ||
| for i in range(0, len(elements_to_reindex), DELETE_REINDEX_BATCH_SIZE): | ||
| batch = elements_to_reindex[i : i + DELETE_REINDEX_BATCH_SIZE] | ||
| batch_reindex.delay(batch) | ||
| finally: | ||
| # Always reconnect the signal | ||
| post_delete.connect(TopicElement.post_delete, sender=TopicElement) | ||
|
|
There was a problem hiding this comment.
Not a big fan of this way of doing. As I understand the Python server, the global state is shared between client? So what happen if someone delete an element from another topic during the reindex of this one? Does the signal triggers?
There was a problem hiding this comment.
Yeah, true, it's process-wide AFAIK. This could create some unwanted side effects.
I had an LLM tell me that too :p But I don't believe it and a "are you sure" made it change its mind 😬 I did not test though. |
In other frameworks I know it's the case, the "DB" operations often do not trigger listeners in models (since models are not in memory for these kinds of operations) |
|
I agree this solution is not great (and limited to some very specific Topics). Maybe there's a deeper fix to be found, such as allowing batch reindexing all the way (up to the search service). |
|
Agreed, I wasn't very hyped by this fix and I think we should investigate the issue more since we've found a way around for topics and it's no more that urgent. |
Your test is working without the disconnect :-) |
The patch path in the test was wrong, my bad. With 44267ac, test now fails when no disconnect. |
|
@abulte Where are we on this problem? |
|
@ThibaudDauce nowhere :-) we didn't reindex a huge topic lately, so the problem is only latent. |
This aims at preventing the error below when deleting all elements from a huge Topic (10k+). It reindexes elements with 500 items batches instead of launching one task per element.