Skip to content

fix: hash skill binaries#255

Open
mmilanta wants to merge 1 commit intomainfrom
fix/hash-binaries-in-skills
Open

fix: hash skill binaries#255
mmilanta wants to merge 1 commit intomainfrom
fix/hash-binaries-in-skills

Conversation

@mmilanta
Copy link
Copy Markdown
Contributor

@mmilanta mmilanta commented Apr 2, 2026

No description provided.

@mmilanta mmilanta requested a review from a team as a code owner April 2, 2026 10:26
@qodo-merge-etso
Copy link
Copy Markdown

Review Summary by Qodo

Hash binary skill files with SHA256

✨ Enhancement

Grey Divider

Walkthroughs

Description
• Add SHA256 hashing for binary skill files
• Replace generic message with hash-based identification
• Enable binary file content verification and comparison
Diagram
flowchart LR
  A["Binary File Read Error"] -->|"Generate SHA256 hash"| B["Hash-based Content"]
  B -->|"Replace generic message"| C["Binary file. Hash: {hexdigest}"]
Loading

Grey Divider

File Changes

1. src/agent_scan/skill_client.py ✨ Enhancement +4/-1

Add SHA256 hashing for binary skill files

• Import hashlib module for SHA256 hashing
• Catch UnicodeDecodeError and compute SHA256 hash of binary files
• Replace generic "Binary file. No content available." message with hash-based identifier
• Enable binary file content verification through hash comparison

src/agent_scan/skill_client.py


Grey Divider

Qodo Logo

@qodo-merge-etso
Copy link
Copy Markdown

qodo-merge-etso bot commented Apr 2, 2026

Code Review by Qodo

🐞 Bugs (2) 📘 Rule violations (2) 📎 Requirement gaps (0) 🎨 UX Issues (0)

Grey Divider


Action required

1. traverse_skill_tree() too long 📘 Rule violation ⚙ Maintainability
Description
traverse_skill_tree() is ~60 lines long, exceeding the ≤40 SLOC limit, making it harder to
maintain and safely extend. This PR modifies the function further without extracting helpers.
Code

src/agent_scan/skill_client.py[R123-131]

                    content = f.read()
            except UnicodeDecodeError:
                logger.exception(f"Error reading file: {file}. The file is not a bianry")
-                content = "Binary file. No content available."
+                with open(os.path.expanduser(full_path), "rb") as f:
+                    content_hash = hashlib.sha256(f.read()).hexdigest()
+                content = f"Binary file. Hash: {content_hash}"
            resources.append(
                Resource(
                    name=file,
Evidence
Compliance requires functions to be ≤40 lines; traverse_skill_tree() spans roughly lines 78-137
(~60 lines) and was modified in this PR in the binary-file handling block.

Rule 45: Limit function length to ≤ 40 lines (SLOC)
src/agent_scan/skill_client.py[78-137]
src/agent_scan/skill_client.py[121-135]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`traverse_skill_tree()` exceeds the 40-line function length limit and was further expanded/modified in this PR.

## Issue Context
Large functions are harder to test and maintain; this function currently handles traversal, file-type branching, text reading, binary handling, and object construction.

## Fix Focus Areas
- src/agent_scan/skill_client.py[78-137]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


2. No regression test for hash 📘 Rule violation ☼ Reliability
Description
The PR changes binary resource descriptions to include a SHA-256 hash, but no test asserts this new
behavior. Without an assertion, the change can silently regress.
Code

src/agent_scan/skill_client.py[R124-128]

            except UnicodeDecodeError:
                logger.exception(f"Error reading file: {file}. The file is not a bianry")
-                content = "Binary file. No content available."
+                with open(os.path.expanduser(full_path), "rb") as f:
+                    content_hash = hashlib.sha256(f.read()).hexdigest()
+                content = f"Binary file. Hash: {content_hash}"
Evidence
Compliance requires tests for changed code paths; the PR changes the binary-file path to compute and
embed a hash, while the existing test only calls inspect_skill(...) without assertions about the
binary description output.

Rule 4: Every change must include automated tests; bug fixes add a regression test
src/agent_scan/skill_client.py[124-128]
tests/v4compatibility/test_inspect.py[119-121]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Add a regression/unit test that asserts binary files are represented with a deterministic SHA-256 hash in `Resource.description`.

## Issue Context
The code now sets `content = f"Binary file. Hash: {content_hash}"` for binary files; existing tests exercise skill inspection but do not assert this new output.

## Fix Focus Areas
- src/agent_scan/skill_client.py[121-128]
- tests/v4compatibility/test_inspect.py[119-121]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


3. Binary hash errors unhandled 🐞 Bug ☼ Reliability
Description
In traverse_skill_tree(), the fallback path for binary files opens and reads the file in binary mode
without any error handling, so PermissionError/ENOENT/etc. will bubble up and fail the entire skill
scan. Previously the UnicodeDecodeError fallback always produced a placeholder string and continued.
Code

src/agent_scan/skill_client.py[R126-128]

+                with open(os.path.expanduser(full_path), "rb") as f:
+                    content_hash = hashlib.sha256(f.read()).hexdigest()
+                content = f"Binary file. Hash: {content_hash}"
Evidence
The try/except only guards the UTF-8 open/read; the subsequent binary open+hash is outside any try,
so any I/O failure during hashing will abort traversal. This traversal runs as part of
inspect_skill() when scanning a skill.

src/agent_scan/skill_client.py[120-135]
src/agent_scan/skill_client.py[30-74]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`traverse_skill_tree()` catches `UnicodeDecodeError` from the UTF-8 read, but then performs a second `open(..., "rb")` + read/hash without a protective `try/except`. Any `OSError` (e.g., `PermissionError`, `FileNotFoundError` due to concurrent changes) will propagate and fail the scan.

### Issue Context
This code runs during skill scanning (`inspect_skill()` calls `traverse_skill_tree()`), so a single unreadable binary should not abort the whole scan.

### Fix Focus Areas
- src/agent_scan/skill_client.py[120-135]

### Expected fix
- Wrap the binary open/hash in a `try/except (OSError, Exception)`.
- On failure, log (prefer `logger.warning`/`logger.exception` with clear message) and set `content` to a safe placeholder like `"Binary file. Hash unavailable."` so traversal continues.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Remediation recommended

4. Unbounded binary file reads 🐞 Bug ➹ Performance
Description
Binary hashing uses hashlib.sha256(f.read()), which reads the entire file into memory at once and
can significantly slow scans or cause high memory usage when skills include large binaries. This
happens for every non-.md and non-script asset in the skill directory.
Code

src/agent_scan/skill_client.py[R126-127]

+                with open(os.path.expanduser(full_path), "rb") as f:
+                    content_hash = hashlib.sha256(f.read()).hexdigest()
Evidence
The traversal iterates all files and, for non-text assets, computes a SHA-256 by calling f.read()
with no size bound, which reads the entire binary into RAM in one shot.

src/agent_scan/skill_client.py[85-135]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The binary hashing implementation does `hashlib.sha256(f.read()).hexdigest()`, which loads the whole file into memory. Large files can cause memory spikes and slow scans.

### Issue Context
This code is in `traverse_skill_tree()` and may process many assets during a scan.

### Fix Focus Areas
- src/agent_scan/skill_client.py[124-128]

### Expected fix
- Replace `hashlib.sha256(f.read())` with a streaming hash update loop, e.g.:
 - `h = hashlib.sha256()`
 - `for chunk in iter(lambda: f.read(1024 * 1024), b""):`
 - `h.update(chunk)`
 - `content_hash = h.hexdigest()`
- (Optional) Consider an upper bound / early-exit policy if hashing very large binaries is not required for the scan.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

ⓘ The new review experience is currently in Beta. Learn more

Grey Divider

Qodo Logo

Comment on lines 123 to 131
content = f.read()
except UnicodeDecodeError:
logger.exception(f"Error reading file: {file}. The file is not a bianry")
content = "Binary file. No content available."
with open(os.path.expanduser(full_path), "rb") as f:
content_hash = hashlib.sha256(f.read()).hexdigest()
content = f"Binary file. Hash: {content_hash}"
resources.append(
Resource(
name=file,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action required

1. traverse_skill_tree() too long 📘 Rule violation ⚙ Maintainability

traverse_skill_tree() is ~60 lines long, exceeding the ≤40 SLOC limit, making it harder to
maintain and safely extend. This PR modifies the function further without extracting helpers.
Agent Prompt
## Issue description
`traverse_skill_tree()` exceeds the 40-line function length limit and was further expanded/modified in this PR.

## Issue Context
Large functions are harder to test and maintain; this function currently handles traversal, file-type branching, text reading, binary handling, and object construction.

## Fix Focus Areas
- src/agent_scan/skill_client.py[78-137]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

Comment on lines 124 to +128
except UnicodeDecodeError:
logger.exception(f"Error reading file: {file}. The file is not a bianry")
content = "Binary file. No content available."
with open(os.path.expanduser(full_path), "rb") as f:
content_hash = hashlib.sha256(f.read()).hexdigest()
content = f"Binary file. Hash: {content_hash}"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action required

2. No regression test for hash 📘 Rule violation ☼ Reliability

The PR changes binary resource descriptions to include a SHA-256 hash, but no test asserts this new
behavior. Without an assertion, the change can silently regress.
Agent Prompt
## Issue description
Add a regression/unit test that asserts binary files are represented with a deterministic SHA-256 hash in `Resource.description`.

## Issue Context
The code now sets `content = f"Binary file. Hash: {content_hash}"` for binary files; existing tests exercise skill inspection but do not assert this new output.

## Fix Focus Areas
- src/agent_scan/skill_client.py[121-128]
- tests/v4compatibility/test_inspect.py[119-121]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

Comment on lines +126 to +128
with open(os.path.expanduser(full_path), "rb") as f:
content_hash = hashlib.sha256(f.read()).hexdigest()
content = f"Binary file. Hash: {content_hash}"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action required

3. Binary hash errors unhandled 🐞 Bug ☼ Reliability

In traverse_skill_tree(), the fallback path for binary files opens and reads the file in binary mode
without any error handling, so PermissionError/ENOENT/etc. will bubble up and fail the entire skill
scan. Previously the UnicodeDecodeError fallback always produced a placeholder string and continued.
Agent Prompt
### Issue description
`traverse_skill_tree()` catches `UnicodeDecodeError` from the UTF-8 read, but then performs a second `open(..., "rb")` + read/hash without a protective `try/except`. Any `OSError` (e.g., `PermissionError`, `FileNotFoundError` due to concurrent changes) will propagate and fail the scan.

### Issue Context
This code runs during skill scanning (`inspect_skill()` calls `traverse_skill_tree()`), so a single unreadable binary should not abort the whole scan.

### Fix Focus Areas
- src/agent_scan/skill_client.py[120-135]

### Expected fix
- Wrap the binary open/hash in a `try/except (OSError, Exception)`.
- On failure, log (prefer `logger.warning`/`logger.exception` with clear message) and set `content` to a safe placeholder like `"Binary file. Hash unavailable."` so traversal continues.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant