feat: JVM release file parsing#723

Merged
kateeselius merged 3 commits into main from CN-444-support-jvm-vulnerability-detection
Apr 7, 2026
@kateeselius kateeselius commented Nov 5, 2025

Goal: Expand support for JVM vuln scanning by parsing the release file.

Jira Ticket:

Discovered it is possible to parse the release file generated when Java is loaded in an image.

An example release file:

IMPLEMENTOR="Eclipse Adoptium"
IMPLEMENTOR_VERSION="Temurin-11.0.28+6"
JAVA_RUNTIME_VERSION="11.0.28+6"
JAVA_VERSION="11.0.28"
JAVA_VERSION_DATE="2025-07-15"
LIBC="gnu"
MODULES="java.base java.compiler java.datatransfer java.xml java.prefs ... etc"
..... other fields ...... 

This PR extracts the Java version from the release file and sends the version to registry to be scanned for vulns.
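
As a rough illustration of the extraction step, here is a minimal sketch of reading `JAVA_VERSION` out of release file contents like the example above. The function names `parseReleaseFile` and `extractJavaVersion` are hypothetical and are not taken from this PR; the sketch assumes every relevant line has the form `KEY="value"`.

```typescript
// Hypothetical helper (not from the PR): collects KEY="value" pairs from the
// text of a JVM release file. Lines without "=" are ignored.
function parseReleaseFile(content: string): Map<string, string> {
  const fields = new Map<string, string>();
  for (const rawLine of content.split(/\r?\n/)) {
    const line = rawLine.trim();
    const eq = line.indexOf("=");
    if (eq <= 0) {
      continue; // blank line, or not a KEY=value line
    }
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    // Strip one pair of surrounding double quotes, if present.
    if (value.length >= 2 && value.startsWith('"') && value.endsWith('"')) {
      value = value.slice(1, -1);
    }
    fields.set(key, value);
  }
  return fields;
}

// Hypothetical helper: returns the JAVA_VERSION field, or null when the
// content is empty or does not contain that field.
function extractJavaVersion(content: string): string | null {
  const version = parseReleaseFile(content).get("JAVA_VERSION");
  return version !== undefined && version.length > 0 ? version : null;
}
```

Looking up the exact key `JAVA_VERSION` keeps `JAVA_RUNTIME_VERSION` and `JAVA_VERSION_DATE` from being picked up by accident.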

A separate scan result, keyBinariesHashes, discovers Java binaries within the image and sends a hash of each binary to registry. Registry then attempts to resolve each hash to a package (named 'openjdk-jre') and version by calling hash-lookup. A depGraph is generated from the matched Java packages, with the package manager set to "upstream". These naming conventions are chosen so that vulns can be matched against our db.

In registry, the extracted Java version is added to the keyBinariesHashes flow so that it ends up in the depGraph and is scanned for vulnerabilities. Since package names in the depGraph need to be unique, there can only be one 'openjdk-jre' package within the tree. This constraint means that we only add the Java version parsed from the release file if hash-lookup did not successfully match any binaries. In other words, this feature supports scanning only one version of Java.
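
The fallback rule could be sketched roughly as follows. This is not the registry code: `ResolvedPackage` and `selectJavaPackages` are hypothetical names, and only the 'openjdk-jre' package name comes from the description above.

```typescript
// Hypothetical shape of a package resolved for the depGraph.
interface ResolvedPackage {
  name: string;
  version: string;
}

// Package name used for Java in the description above.
const JRE_PACKAGE_NAME = "openjdk-jre";

// Hypothetical sketch of the rule: packages matched via hash-lookup win;
// the release file version is used only when hash-lookup matched nothing.
function selectJavaPackages(
  hashLookupMatches: ResolvedPackage[],
  releaseFileVersion: string | null,
): ResolvedPackage[] {
  if (hashLookupMatches.length > 0) {
    return hashLookupMatches; // release file version is ignored
  }
  if (releaseFileVersion === null) {
    return []; // nothing known about Java in this image
  }
  return [{ name: JRE_PACKAGE_NAME, version: releaseFileVersion }];
}
```

Keeping the decision in one function makes the "one Java version per scan" limitation explicit and easy to revisit later.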

Overview of Changes Within Snyk-docker-plugin:

  1. Structure definitions - Add JavaRuntimeMetadata.
  2. Extraction Layer - Create a new extract action that looks for the release file when scanning the image.
  3. Analysis Layer - Parse the contents of the release file to extract the Java version.
  4. Response Building - Add the javaRuntimeMetadata fact to the first scanResult (handled not as a separate application scan, but as part of the OS-level scan alongside keyBinariesHashes).
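
For the response-building step, a minimal sketch of attaching the fact to the first scan result might look like this. The `Fact`, `ScanResult`, and `JavaRuntimeMetadata` shapes below are simplified assumptions for illustration and may not match the plugin's real types; only the fact name javaRuntimeMetadata comes from the list above.

```typescript
// Simplified, assumed shapes; the plugin's real types may differ.
interface Fact {
  type: string;
  data: unknown;
}

interface ScanResult {
  facts: Fact[];
}

interface JavaRuntimeMetadata {
  type: string;
  version: string;
}

// Hypothetical helper: adds the fact to the first scan result only, so it
// travels with the OS-level scan rather than as a separate application scan.
function attachJavaRuntimeFact(
  scanResults: ScanResult[],
  metadata: JavaRuntimeMetadata | null,
): void {
  const first = scanResults[0];
  if (metadata === null || first === undefined) {
    return; // no release file was parsed, or there is nothing to attach to
  }
  first.facts.push({ type: "javaRuntimeMetadata", data: metadata });
}
```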

@kateeselius kateeselius changed the title from "WIP - feat: POC support from JVM release file parsing" to "feat: JVM release file parsing" Nov 16, 2025
Contributor

@bdemeo12 bdemeo12 left a comment

Looks great!

One tiny general nit: there are a lot of comments in this PR. You could move the final design, and the reasons why you chose to do xyz, into a Confluence doc and clean up a lot of these comments!

@kateeselius kateeselius marked this pull request as ready for review January 8, 2026 20:37
@kateeselius kateeselius requested a review from a team as a code owner January 8, 2026 20:37
@kateeselius kateeselius requested review from adrobuta, bdemeo12, bgardiner and tyler-catlin and removed request for tyler-catlin January 8, 2026 20:37
adrobuta previously approved these changes Jan 23, 2026
Contributor

@bgardiner bgardiner left a comment

looks good. a few questions within about the implementation

if (!content || content.trim().length === 0) {
return null;
}
if (content.length > MAX_CONTENT_LENGTH) {
Contributor

How did you determine MAX_CONTENT_LENGTH and MAX_LINE_COUNT?

Is the objective of these checks to avoid processing large files that happen not to be release files? If so, would it be better to use getContentAsBuffer within getJavaRuntimeReleaseContent instead of getContentAsString (i.e., only load up to MAX_CONTENT_LENGTH into memory)? Maybe not... just trying to understand your reasoning.

Contributor Author

Yes, my original intention was to avoid reading in really long files that are not actually release files. However, we don't do that for the other parsers in the repo, so it might be best to leave it out. I agree it would make sense to enforce this while reading the file into a buffer rather than as a post-processing check. But I'm leaning towards removing these constraints altogether, since the Java modules list can be quite long. It would be hard to estimate reasonable values for MAX_CONTENT_LENGTH and MAX_LINE_COUNT without making too many assumptions about what customers might have in the release file. 🤔

Contributor

Interestingly, streamToString has a streamSize property, but it appears to be completely ignored:

  streamSize?: number,

Ideally that function would respect it, similar to the size limit in streamToJson:

  export async function streamToJson<T>(stream: Readable): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      const chunks: string[] = [];
      let bytes = 0;
      stream.on("end", () => {
        try {
          resolve(JSON.parse(chunks.join("")));
        } catch (error) {
          reject(error);
        }
      });
      stream.on("error", (error) => reject(error));
      stream.on("data", (chunk) => {
        bytes += chunk.length;
        if (bytes <= 2 * MEGABYTE) {
          chunks.push(chunk.toString("utf8"));
        } else {
          reject(new Error("The stream is too large to parse as JSON"));
        }
      });
    });
  }

Ideally we would be more defensive here and not read potentially unbounded file contents into memory, but it's not unprecedented in the repo for other static file analysis (like the OS package databases), so we can just keep it for now. However, I think it would be good to create a ticket to track this improvement (not just for release file parsing but for other instances of file parsing too).
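
For the follow-up ticket, one possible shape for a size-bounded string reader is sketched below, modeled loosely on the streamToJson limit quoted above. Everything here is an assumption for illustration: `BoundedCollector`, `streamToStringBounded`, `ChunkStream`, and the 1 MiB default are hypothetical names and values, not code from the repo.

```typescript
// Hypothetical limit; the thread does not settle on a value.
const MAX_RELEASE_FILE_BYTES = 1024 * 1024;

// Minimal structural types so the sketch does not depend on Node typings.
// Node Buffers and plain strings both satisfy Chunk. For Buffers, length is
// a byte count; for strings it is a count of UTF-16 code units.
type Chunk = { length: number; toString(): string };

interface ChunkStream {
  on(event: string, listener: (...args: any[]) => void): unknown;
}

// Accumulates chunks and throws as soon as the running total exceeds the cap.
class BoundedCollector {
  private readonly chunks: string[] = [];
  private readonly maxBytes: number;
  private bytes = 0;

  constructor(maxBytes: number) {
    this.maxBytes = maxBytes;
  }

  push(chunk: Chunk): void {
    this.bytes += chunk.length;
    if (this.bytes > this.maxBytes) {
      throw new Error(`The stream exceeds the ${this.maxBytes} byte limit`);
    }
    this.chunks.push(chunk.toString());
  }

  result(): string {
    return this.chunks.join("");
  }
}

// Resolves with the full content, or rejects once the limit is exceeded.
function streamToStringBounded(
  stream: ChunkStream,
  maxBytes: number = MAX_RELEASE_FILE_BYTES,
): Promise<string> {
  return new Promise<string>((resolve, reject) => {
    const collector = new BoundedCollector(maxBytes);
    let failed = false;
    stream.on("data", (chunk: Chunk) => {
      if (failed) {
        return; // already rejected; stop buffering further chunks
      }
      try {
        collector.push(chunk);
      } catch (error) {
        failed = true;
        reject(error);
      }
    });
    stream.on("end", () => {
      if (!failed) {
        resolve(collector.result());
      }
    });
    stream.on("error", (error: Error) => reject(error));
  });
}
```

Unlike a post-read length check, this stops buffering as soon as the cap is crossed, so an oversized file never sits fully in memory.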

@kateeselius
Contributor Author

TODO: update test snapshots as well

@kateeselius kateeselius force-pushed the CN-444-support-jvm-vulnerability-detection branch from 92c7007 to e816533 Compare February 4, 2026 21:08
@snyk-pr-review-bot

PR Reviewer Guide 🔍

🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Restrictive Hardcoded Path

The implementation strictly looks for the release file at /opt/java/openjdk/release. This path is common for some distributions (e.g., Eclipse Adoptium), but many others place the release file in /usr/lib/jvm/... or /usr/java/.... Consider supporting a broader list of standard JVM paths or a fallback mechanism (similar to how os-release detection works) to ensure wider compatibility with different base images.

filePathMatches: (filePath) =>
  filePath === normalizePath("/opt/java/openjdk/release"),
Potential Memory Issue

The getJavaRuntimeReleaseAction uses streamToString as a callback, which buffers the entire file content into memory. If a malicious or corrupted image contains a very large file at the target path, this could lead to an Out-Of-Memory (OOM) condition. Consider implementing a size limit check (e.g., matching the pattern used in streamToJson or checking streamSize if reliable) to safely handle potential large files.

callback: streamToString,
📚 Repository Context Analyzed

This review considered 36 relevant code sections from 15 files (average relevance: 0.98)

@kateeselius kateeselius requested a review from bgardiner February 4, 2026 21:31
bgardiner previously approved these changes Feb 4, 2026
@kateeselius kateeselius force-pushed the CN-444-support-jvm-vulnerability-detection branch from e816533 to f3d9e76 Compare March 4, 2026 20:37
}

export interface BaseRuntime {
type: string;
Contributor

Since the only current value is "java", would narrowing this to a literal type (or a discriminated union) provide compile-time safety? For example:

  export interface BaseRuntime {
    type: "java";
    version: string;
  }
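
To show what the discriminated-union variant might buy us, here is a small sketch. `JavaRuntime`, `NodeRuntime`, `Runtime`, and `describeRuntime` are hypothetical; in particular, the "node" member does not exist in this PR and is included only to show how a second runtime would be handled.

```typescript
interface JavaRuntime {
  type: "java";
  version: string;
}

// Hypothetical second member, added only to illustrate the union.
interface NodeRuntime {
  type: "node";
  version: string;
}

type Runtime = JavaRuntime | NodeRuntime;

function describeRuntime(runtime: Runtime): string {
  switch (runtime.type) {
    case "java":
      return `Java ${runtime.version}`;
    case "node":
      return `Node.js ${runtime.version}`;
    default: {
      // If a new member is added to Runtime without a case above, this
      // assignment stops compiling, which is the compile-time safety.
      const unreachable: never = runtime;
      return unreachable;
    }
  }
}
```

With `type: string`, a typo such as "jvaa" would compile; with the literal type it is rejected by the compiler.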

export const getJavaRuntimeReleaseAction: ExtractAction = {
actionName: "java-runtime-release",
filePathMatches: (filePath) =>
filePath === normalizePath("/opt/java/openjdk/release"),
Contributor

Claude suggests that:

/opt/java/openjdk/release is the Eclipse Adoptium/Temurin convention. Many widely-used JVM base images use different paths:

  • eclipse-temurin → /opt/java/openjdk/release ✓
  • openjdk (official Docker) → /usr/local/openjdk-/release
  • Debian/Ubuntu default-jdk → /usr/lib/jvm/java--openjdk-/release
  • Oracle → /usr/java//release

Should we at least expand to a few other common release file locations? For example:

  filePathMatches: (filePath) =>
    filePath === normalizePath("/opt/java/openjdk/release") ||
    filePath.startsWith(normalizePath("/usr/local/openjdk-")) ...
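
A slightly fuller sketch of such a matcher is below. It assumes the path has already been normalized to POSIX form (the real code uses normalizePath, which is left out here), and `isJavaReleaseFile` and the two constant lists are hypothetical names. It also assumes the release file sits exactly one directory below each prefix, which should be verified against real images before relying on it.

```typescript
// Exact location used by the current implementation.
const EXACT_RELEASE_PATHS = ["/opt/java/openjdk/release"];

// Prefixes taken from the locations suggested in this thread.
const RELEASE_PATH_PREFIXES = ["/usr/local/openjdk-", "/usr/lib/jvm/", "/usr/java/"];

const RELEASE_SUFFIX = "/release";

// Hypothetical matcher: accepts the exact path, or <prefix><one segment>/release.
function isJavaReleaseFile(filePath: string): boolean {
  if (EXACT_RELEASE_PATHS.indexOf(filePath) !== -1) {
    return true;
  }
  if (!filePath.endsWith(RELEASE_SUFFIX)) {
    return false;
  }
  return RELEASE_PATH_PREFIXES.some((prefix) => {
    if (!filePath.startsWith(prefix)) {
      return false;
    }
    // Text between the prefix and "/release"; it must be a single, non-empty
    // path segment so that unrelated nested files named "release" are skipped.
    const middle = filePath.slice(prefix.length, filePath.length - RELEASE_SUFFIX.length);
    return middle.length > 0 && middle.indexOf("/") === -1;
  });
}
```

Requiring the "/release" suffix matters because a bare startsWith check would match every file under those directories, not just the release file.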

@kateeselius kateeselius merged commit 79a0114 into main Apr 7, 2026
13 checks passed
@kateeselius kateeselius deleted the CN-444-support-jvm-vulnerability-detection branch April 7, 2026 20:21
4 participants