
Continue validation and count invalid records #2370 #2371

Merged
dr0i merged 6 commits into master from 2370-continueValidationProcess on Jun 29, 2026

@TobiasNx

Contributor

The validation should continue and count the number of invalid records.

@TobiasNx TobiasNx requested a review from dr0i June 18, 2026 14:58
Comment thread web/scripts/getUpdatesDumpJsonl.sh Outdated
@dr0i dr0i assigned TobiasNx and unassigned dr0i Jun 19, 2026
@TobiasNx TobiasNx assigned dr0i and unassigned TobiasNx Jun 19, 2026
@TobiasNx TobiasNx requested a review from dr0i June 19, 2026 08:52
Comment thread web/scripts/getUpdatesDumpJsonl.sh Outdated
@dr0i dr0i assigned TobiasNx and unassigned dr0i Jun 19, 2026
@dr0i dr0i removed their assignment Jun 19, 2026
@TobiasNx TobiasNx assigned dr0i and unassigned TobiasNx Jun 19, 2026
@dr0i dr0i self-requested a review June 19, 2026 11:48
@dr0i dr0i assigned TobiasNx and unassigned dr0i Jun 19, 2026
Comment thread web/scripts/getUpdatesDumpJsonl.sh Outdated
Comment thread web/scripts/getUpdatesDumpJsonl.sh Outdated
As suggested by @blackwinter

Co-authored-by: Jens Wille <jens.wille@hbz-nrw.de>
@TobiasNx

Contributor Author

@dr0i do you want to have another look?

@dr0i

dr0i commented Jun 23, 2026

Member

I would test and apply @blackwinter's suggestions if it were my code in the first place. Or am I mistaken?

Co-authored-by: Jens Wille <jens.wille@hbz-nrw.de>
@TobiasNx

Contributor Author

@dr0i Thanks, I missed the suggestion from @blackwinter. But how can I test this locally?

@TobiasNx TobiasNx assigned dr0i and unassigned TobiasNx Jun 24, 2026
@TobiasNx TobiasNx linked an issue Jun 24, 2026 that may be closed by this pull request
@dr0i

dr0i commented Jun 29, 2026

Member

Test it by adapting the script like this:

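UPDATES_FNAME=2026-06-15_to_2026-06-16_lobid-resources-updates.jsonl.gz

# validate schema (the relative file name means the dump must be in this directory)
cd ~/git/lobid-resources/src/test/resources/schemas

# --continue is meant to keep validating after the first invalid record; stdout and stderr go to the log
jsonschema validate resource.json "${UPDATES_FNAME}" --continue > /tmp/jsonschemaValidationOutput.log 2>&1

# a non-empty log means the validator reported something
if [ -s /tmp/jsonschemaValidationOutput.log ]; then
        # count the lines reporting a failure, then show the start of the log
        NUMBER_OF_INVALID_RECORDS=$(grep -c "fail:" /tmp/jsonschemaValidationOutput.log)
        echo "$NUMBER_OF_INVALID_RECORDS"
        head -n5 /tmp/jsonschemaValidationOutput.log
fi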
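The counting step of the script can also be tried without the dump or the validator. A minimal sketch, assuming the script might run under "set -e" (not confirmed for getUpdatesDumpJsonl.sh): count_invalid is a hypothetical helper, and the log lines in the demo are invented placeholders, not the real output format of any jsonschema CLI. The "|| true" matters because grep -c prints 0 but exits with status 1 when nothing matches, which would otherwise abort a "set -e" script exactly when there are no failures.

```shell
#!/usr/bin/env bash
# Sketch: the counting step from the script above, wrapped in a function so it
# can be tried on any log file without the dump or the validator.
set -euo pipefail

# count_invalid LOGFILE -> prints the number of lines containing "fail:"
# (hypothetical helper; the pattern is the one used in the script above)
count_invalid() {
  local log=$1
  if [ -s "$log" ]; then
    # grep -c prints 0 but exits with status 1 when nothing matches;
    # "|| true" keeps that from aborting the script under "set -e".
    grep -c "fail:" "$log" || true
  else
    echo 0  # empty or missing log: nothing was reported
  fi
}

# Demo with an INVENTED log; these lines are placeholders, not the real
# output format of any jsonschema CLI.
demo_log=$(mktemp)
printf '%s\n' \
  'fail: first invalid record (placeholder message)' \
  '  "id": "example"' \
  'fail: second invalid record (placeholder message)' > "$demo_log"
count_invalid "$demo_log"   # prints 2
rm -f "$demo_log"
```

Note that a non-empty log does not by itself mean invalid records: a warning on stderr (such as a deprecation notice) also ends up in the log, which is why the count relies on the "fail:" lines.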

I used a dump that had been reported as flawed (see the mails) and downloaded it from https://lobid.org/download/dumps/lobid-resources/.
(I probably have an old version of jsonschema installed locally, because it outputs:

/usr/bin/jsonschema:5: DeprecationWarning: The jsonschema CLI is deprecated and will be removed in a future version. Please use check-jsonschema instead, which can be installed from https://pypi.org/project/check-jsonschema/ )

I then tested this on the server.
Remarkably, it found only one error (I don't know whether this is a coincidence or whether it just bails out after the first error and thus doesn't work as expected).
However, we may just deploy the script and see what happens in the days to come.
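One way to tell whether a single reported error is a coincidence or an early bail-out is to validate a small fixture with a known number of broken records. A minimal sketch, assuming that an empty object "{}" violates resource.json (for example because of required fields; this is not verified here): make_fixture and all file names are hypothetical. Because the dump is JSONL (one record per line), taking the first N lines takes whole records.

```shell
#!/usr/bin/env bash
# Sketch: build a small gzipped JSONL fixture with a KNOWN number of broken
# records, to check whether the validator really continues after a failure.
# ASSUMPTION: "{}" is invalid against resource.json; check the schema first.
set -eu

# make_fixture SRC.jsonl.gz OUT.jsonl.gz N K  (hypothetical helper)
#   writes the first N records of SRC, followed by K broken records "{}"
make_fixture() {
  local src=$1 out=$2 n=$3 k=$4 i
  [ -r "$src" ] || { echo "make_fixture: cannot read $src" >&2; return 1; }
  {
    gzip -dc "$src" | head -n "$n"        # JSONL: one record per line
    for ((i = 0; i < k; i++)); do echo '{}'; done
  } | gzip > "$out"
}

# Demo with a tiny stand-in dump (placeholder records, not real lobid data).
tmp=$(mktemp -d)
printf '{"id":"a"}\n{"id":"b"}\n{"id":"c"}\n' | gzip > "$tmp/src.jsonl.gz"
make_fixture "$tmp/src.jsonl.gz" "$tmp/fixture.jsonl.gz" 2 3
gzip -dc "$tmp/fixture.jsonl.gz" | wc -l   # 5 lines: 2 copied + 3 broken
rm -rf "$tmp"
```

Running the same jsonschema command as in the script above on such a fixture (e.g. built from the real dump with K = 3) should then yield at least 3 "fail:" lines if validation continues; a count of 1 would point to bailing out.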

Note also that head -n5 yields only a tiny bit of information: the resource is not dumped on one line but pretty-printed, so we get "one field, one line". Thus I will increase this to 333 and, as usual, adapt it once we see results.
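Besides raising the line count, the log could also be excerpted by failure rather than by line. A minimal sketch with two hypothetical helpers: excerpt_head keeps the proposed behaviour (the first 333 lines by default), while excerpt_failures prints only the first N lines containing "fail:" via grep -m, assuming the failure lines themselves identify the record well enough.

```shell
#!/usr/bin/env bash
# Sketch: two ways to excerpt the validation log when each record is
# pretty-printed over many lines ("one field, one line").
set -eu

# 1) As proposed: the first N lines of the log (default 333).
excerpt_head() { head -n "${2:-333}" "$1"; }

# 2) Hypothetical alternative: only the first N lines containing "fail:"
#    (default 5); "|| true" because grep exits 1 when nothing matches.
excerpt_failures() { grep -m "${2:-5}" "fail:" "$1" || true; }

# Demo on the log path used in the script above, if it exists.
LOG=${LOG:-/tmp/jsonschemaValidationOutput.log}
if [ -r "$LOG" ]; then
  excerpt_failures "$LOG" 5
fi
```

The first variant shows complete pretty-printed context but may cover only a few records; the second covers more failures per line of output but drops the surrounding fields.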

@dr0i dr0i merged commit bf6f748 into master Jun 29, 2026
1 check passed


Development

Successfully merging this pull request may close these issues.

Log all validation errors from updates

3 participants