feat: try .zip if .gz missing or mismatches expected checksum for NVD (#5673) #5679

Open
QSchulz wants to merge 2 commits into ossf:main from QSchulz:dev/nvd-zip

QSchulz commented Apr 3, 2026

This adds support for downloading and using local .zip NVD year
databases instead of being limited to .gz.

The mirror has at times lacked the .gz file for a year while having the
.zip version available. Because we never tried the .zip file,
cve-bin-tool simply failed when it could not find the .gz file.

This adds a fallback to the .zip file for both local and remote files.
Judging by the zip files currently in the mirror, they are zlib
compressed. The .zip is attempted if the .gz file is missing, or if it
is corrupted or does not match the expected checksum.
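
The fallback order described above could be sketched roughly as follows. This is only an illustration, not the code in this PR: `load_feed`, `read_gz`, `read_zip` and the file names are hypothetical, and it assumes the expected checksum is a SHA-256 of the decompressed content and that each zip holds a single member.

```python
import gzip
import hashlib
import zipfile
import zlib
from pathlib import Path


def read_gz(path: Path) -> bytes:
    """Decompress a whole .gz file."""
    with gzip.open(path, "rb") as f:
        return f.read()


def read_zip(path: Path) -> bytes:
    """Decompress the first member of a .zip (assumed to be the only one)."""
    with zipfile.ZipFile(path) as archive:
        return archive.read(archive.namelist()[0])


def load_feed(stem: Path, expected_sha256: str) -> bytes:
    """Return the decompressed feed, trying .gz first and .zip second."""
    for suffix, reader in ((".gz", read_gz), (".zip", read_zip)):
        candidate = stem.with_name(stem.name + suffix)
        try:
            data = reader(candidate)
        except (OSError, EOFError, zlib.error, zipfile.BadZipFile):
            continue  # missing or unreadable: try the next format
        # Assumption: the checksum covers the decompressed bytes.
        if hashlib.sha256(data).hexdigest() == expected_sha256:
            return data
    raise FileNotFoundError(f"no usable archive for {stem}")
```

A missing .gz, a corrupted .gz and a checksum mismatch all take the same path here: the loop simply moves on to the .zip candidate.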

The zipfile.ZipFile interface is quite awkward compared to gzip's, as
you need to create an archive first, then write content to a file within
that archive by providing the path within the archive and the content.
Similarly, ZipFile.read() does not read a given amount of data, but the
whole content of the member whose path is passed as argument. So,
buffered reading is left for later, if need be.
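
To make the interface difference concrete, here is a small standalone comparison using in-memory buffers. The member name `feed.json` and the payload are made up for the example; only standard-library calls are used.

```python
import gzip
import io
import zipfile

payload = b'{"cve": "example"}'

# gzip: a single compressed stream with a file-like read/write interface.
gz_buf = io.BytesIO()
with gzip.GzipFile(fileobj=gz_buf, mode="wb") as gz:
    gz.write(payload)
gz_buf.seek(0)
with gzip.GzipFile(fileobj=gz_buf, mode="rb") as gz:
    first_chunk = gz.read(5)  # read() accepts a size
    rest = gz.read()

# zipfile: create the archive, then write a named member into it.
zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("feed.json", payload)

with zipfile.ZipFile(zip_buf) as archive:
    # ZipFile.read() takes a member name and returns its whole content.
    whole = archive.read("feed.json")
    # ZipFile.open() returns a file-like object that accepts a size.
    with archive.open("feed.json") as member:
        member_chunk = member.read(5)
```

If buffered reading is wanted later, `ZipFile.open()` looks like the natural starting point, since the object it returns can be read in chunks.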

Fixes #5673
cc @mmind

I have some doubts about test/test_json.py. I'm quite confused by how it works: is it making a request to NVD? If so, does anything need adapting now that a .zip may be downloaded?

The diff looks a bit ugly, but looking at it with git log --ignore-all-space helps a bit :)

MR check list:

  1. Have I run the tests locally on at least one version of Python?
    Yes, they don't work even without my patches. I've run the tests I believe were impacted by changes (EXTERNAL_SYSTEM=1 LONG_TESTS=1 python -m pytest -v test/test_database_defaults.py test/test_source_nvd.py). I'll check what the CI says.
  2. Have I run the code linters and fixed any issues they found?
    Yes, they also don't pass without my patches. Whatever was automatically modified by pre-commit was squashed in the patches.
  3. Have I added any tests I need to prove that my code works?
    Yes. Though I would have liked to fake a corrupted gzip we "download" but couldn't figure out how to mock aiohttp.ClientSession (it's an async context manager for which we want a side-effect for the first call but not the second).
  4. Have I added or updated any documentation if I changed or added a feature?
    No, does this need to be documented? I'm not sure where the appropriate place would be, any hint?
  5. Have I used Conventional Commits to format the title of my pull request?
    I hope I've done it correctly :)
  6. If I closed a bug, have I linked it using one of GitHub's keywords? (e.g. include the text fixed )
    Yup, it's there in both PR description and commit log.
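
Regarding item 3, one possible way to get a different result on the first and second call is a list `side_effect` on a plain `MagicMock` whose return values act as async context managers. This is an untested-against-this-codebase sketch: `fetch`, `fake_response` and the URLs are hypothetical, and it assumes the code under test uses the `async with session.get(url) as resp` pattern.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


def fake_response(body: bytes) -> MagicMock:
    """Return an object usable as `async with session.get(url) as resp`."""
    resp = MagicMock()
    resp.read = AsyncMock(return_value=body)
    ctx = MagicMock()
    # MagicMock preconfigures __aenter__/__aexit__ as async mocks (Python 3.8+).
    ctx.__aenter__.return_value = resp
    return ctx


async def fetch(session, url: str) -> bytes:
    """Hypothetical code under test: download one file and return its bytes."""
    async with session.get(url) as resp:
        return await resp.read()


async def demo() -> list:
    session = MagicMock()
    # A list side_effect hands out one item per call, in order.
    session.get.side_effect = [
        fake_response(b"corrupted"),  # first call: the "bad" .gz
        fake_response(b"good"),       # second call: the .zip fallback
    ]
    return [
        await fetch(session, "https://mirror.invalid/feed.json.gz"),
        await fetch(session, "https://mirror.invalid/feed.json.zip"),
    ]


bodies = asyncio.run(demo())
```

Note that `session.get` is a regular `MagicMock` rather than an `AsyncMock`, because the call itself is not awaited; only entering the returned context manager is.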

QSchulz added 2 commits April 3, 2026 19:22
This removes the need to clean up after yourself in an NVD test. We
did not make use of a shared cachedir in this test class anyway, as
test_01_cache_update was the only test using the cache. Since it is the
only test populating and using the cache, this changes nothing in
behavior for now (aside from creating an additional NVD_Source() object
and its associated temporary directory).

This will be useful in the next commit, where more tests are added and
files from one test share the same name as files in other tests. It
avoids test poisoning.

Signed-off-by: Quentin Schulz <quentin.schulz@cherry.de>
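
The per-test cache directory idea from this commit can be illustrated with a generic sketch. It is not the project's test code: `FakeSource` is a hypothetical stand-in for an object that owns a cache directory, and `unittest` is used here only to keep the example self-contained.

```python
import tempfile
import unittest
from pathlib import Path


class FakeSource:
    """Hypothetical stand-in for a source object that owns a cache directory."""

    def __init__(self, cachedir: str):
        self.cachedir = Path(cachedir)


class TestIsolatedCache(unittest.TestCase):
    def setUp(self):
        # A fresh directory per test, removed automatically afterwards.
        tmp = tempfile.TemporaryDirectory()
        self.addCleanup(tmp.cleanup)
        self.source = FakeSource(tmp.name)

    def test_writes_feed(self):
        (self.source.cachedir / "feed.json.gz").write_bytes(b"data")
        self.assertEqual(len(list(self.source.cachedir.iterdir())), 1)

    def test_starts_empty(self):
        # Files written by other tests are never visible here.
        self.assertEqual(list(self.source.cachedir.iterdir()), [])
```

Because each test gets its own directory, two tests can write a file with the same name without affecting each other, whatever order they run in.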
…ossf#5673)

Fixes ossf#5673.
Signed-off-by: Quentin Schulz <quentin.schulz@cherry.de>

Development

Successfully merging this pull request may close these issues.

feat: use alternative sources for individual downloads when NVD-mirror is missing files?
