2357 deletions #2367

Draft
TobiasNx wants to merge 8 commits into master from 2357-deletions

Conversation

@TobiasNx

Contributor

@dr0i would it be enough to add a prefix to the title in order to delete the records?

@TobiasNx TobiasNx requested a review from dr0i June 18, 2026 10:19
@dr0i

dr0i commented Jun 18, 2026

Member

A prefix in a title field looks like a dirty workaround. If you can guarantee that this prefix will not be a part of a regular title, it would be ok, though. Also it comes with a bit of an extra resource effort (filtering 30 M docs using a simple regex).
Could you delete all fields besides the id from that resource and have not just a prefix but a sole title like "DELETED from lobid-resources"? But then the schema would not validate, I suppose :/.
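
To make the trade-off concrete, here is a minimal sketch comparing the two ways of marking a record. The record layout, the regex, and both function names are assumptions for illustration and are not taken from the lobid-resources code; only the sentinel title string comes from the comment above.

```python
import re

# Sentinel title quoted in the comment above.
DELETED_TITLE = "DELETED from lobid-resources"
# Hypothetical prefix pattern; the real prefix was not specified in the thread.
DELETED_PREFIX = re.compile(r"^DELETED\b")


def is_deleted_by_prefix(record: dict) -> bool:
    """Prefix approach: any title starting with the marker counts as deleted."""
    return bool(DELETED_PREFIX.match(record.get("title", "")))


def is_deleted_by_sentinel(record: dict) -> bool:
    """Sentinel approach: exact title and no fields other than id and title."""
    return record.get("title") == DELETED_TITLE and set(record) <= {"id", "title"}
```

A regular title such as "DELETED scenes of early cinema" would be caught by the prefix check but not by the sentinel check, which is the guarantee the prefix approach needs and cannot give by itself.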

@dr0i dr0i assigned TobiasNx and unassigned dr0i Jun 18, 2026

@dr0i dr0i left a comment

Member

I think this is ok. It's also good to not put too much effort into this issue, so 👍

@blackwinter

Member

As long as you don't switch the index alias until the deletions have been removed, I guess... Unfortunately, org.metafacture.elasticsearch.JsonToElasticsearchBulk doesn't support the delete action yet, which would certainly be the cleaner option.
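
For reference, a rough sketch of what the cleaner option could look like at the level of the Elasticsearch bulk format, where a `delete` action is a single metadata line with no document line after it. This is not how `JsonToElasticsearchBulk` works today; the function name, the `deleted` flag, and the index name in the usage note are hypothetical.

```python
import json


def to_bulk_lines(record: dict, index: str, deleted: bool) -> list:
    """Return the NDJSON lines for one record in an Elasticsearch bulk request."""
    meta = {"_index": index, "_id": record["id"]}
    if deleted:
        # A delete action consists of the action line only.
        return [json.dumps({"delete": meta})]
    # An index action is followed by the document source on the next line.
    return [json.dumps({"index": meta}), json.dumps(record)]
```

Calling `to_bulk_lines({"id": "a1"}, "resources", deleted=True)` yields one line, so the deleted record would never need a placeholder document in the index.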

@TobiasNx

TobiasNx commented Jun 18, 2026

Contributor Author

@dr0i I adjusted as you suggested. It was a quick adjustment. What do you think about @blackwinter's comment?

@TobiasNx TobiasNx assigned dr0i and unassigned TobiasNx Jun 18, 2026
@dr0i

dr0i commented Jun 18, 2026

Member

> As long as you don't switch the index alias until the deletions have been removed,

does it really hurt if for some minutes there would be titles with the "DELETED"-Prefix in production index @blackwinter ?

@blackwinter

Member

> does it really hurt if for some minutes there would be titles with the "DELETED"-Prefix in production index @blackwinter ?

Yes, I would consider it a serious issue. If, however, the records only contain minimal data (such as in c432b9d), it's less relevant because they won't come up in a casual search. Another option would be to include an internal field which can also be queried efficiently. I'm only worried about the visible effects in the union catalogue.

(And I only noticed after my previous comment that these modified records are immediately live during update indexing; the index switch only happens for the baseline indexing, of course.)
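
A sketch of the internal-field option, assuming a boolean field named `deleted` (a hypothetical name, not part of the current schema). A `term` query on such a field avoids the regex scan over titles. The first body could be sent to the `_delete_by_query` endpoint; the second wraps a user query so that flagged records stay invisible until they are removed.

```python
def delete_flagged_body(flag_field: str = "deleted") -> dict:
    """Request body selecting all records flagged for deletion."""
    return {"query": {"term": {flag_field: True}}}


def hide_flagged(user_query: dict, flag_field: str = "deleted") -> dict:
    """Wrap a user query so that flagged records are excluded from the hits."""
    return {
        "query": {
            "bool": {
                "must": [user_query],
                "must_not": [{"term": {flag_field: True}}],
            }
        }
    }
```

Whether the union catalogue frontend can be made to wrap its queries like this is an open question that the thread does not answer.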

@TobiasNx

Contributor Author

@dr0i you could continue with an approach to get rid of these records

@dr0i

dr0i commented Jun 18, 2026

Member

So @TobiasNx, if it's easily possible to just leave out as many fields as possible (i.e. also the title), we could do a must_not: exists (id) query (if it's even possible to leave out the id field).
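
A minimal sketch of such a query body, using the Elasticsearch `exists` query inside `bool.must_not`. The field is a parameter because the thread leaves open which field can actually be omitted; `lacks_field` is a hypothetical local helper that mirrors the query for top-level fields only.

```python
def missing_field_query(field: str) -> dict:
    """Request body matching all documents that have no value for `field`."""
    return {"query": {"bool": {"must_not": [{"exists": {"field": field}}]}}}


def lacks_field(doc: dict, field: str) -> bool:
    """Local approximation: Elasticsearch treats null and [] as non-existent."""
    value = doc.get(field)
    return value is None or value == []
```

For example, `missing_field_query("title")` would select every record that was reduced so far that it no longer carries a title.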

@dr0i dr0i assigned TobiasNx and unassigned dr0i Jun 18, 2026
@TobiasNx

TobiasNx commented Jun 18, 2026

Contributor Author

I reduced the number of elements as far as the schema allows (@context, id, type) and as far as the workflow allows (almaMmsId was necessary; without it the test fails with:

o.l.r.AlmaMarc21XmlToLobidJsonMetafixTest - Errored when transforming 
java.lang.NullPointerException: Cannot invoke "Object.toString()" because "value" is null
        at org.lobid.resources.JsonToElasticsearchBulkMap.findId(JsonToElasticsearchBulkMap.java:129)

)

Additionally I added the delete title.
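
The reduction described above can be sketched as follows. This is an illustration in Python, not the actual Metafix transformation; the function name is hypothetical, and the title string is assumed to be the one suggested earlier in the thread.

```python
# Fields the schema and the workflow require, as listed in the comment above.
REQUIRED_FIELDS = ("@context", "id", "type", "almaMmsId")
# Assumed title; the thread only says that a delete title was added.
DELETED_TITLE = "DELETED from lobid-resources"


def to_tombstone(record: dict) -> dict:
    """Reduce a record to the required fields plus the delete title."""
    missing = [key for key in REQUIRED_FIELDS if record.get(key) is None]
    if missing:
        # Fail with a readable message instead of a null dereference later on.
        raise ValueError(f"cannot build tombstone, missing: {', '.join(missing)}")
    tombstone = {key: record[key] for key in REQUIRED_FIELDS}
    tombstone["title"] = DELETED_TITLE
    return tombstone
```

Checking for the required fields up front is one way to turn the NullPointerException shown above into an explicit error.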

@TobiasNx TobiasNx assigned dr0i and unassigned TobiasNx Jun 18, 2026
@TobiasNx TobiasNx linked an issue Jun 19, 2026 that may be closed by this pull request
dr0i added a commit that referenced this pull request Jun 25, 2026

Development

Successfully merging this pull request may close these issues.

Include daily deletions?

3 participants