Skip to content

fix: auto-correct dash-to-underscore filename mismatches in path validation (#48)#160

Closed
williamp44 wants to merge 2 commits into
bhauman:mainfrom
williamp44:fix/read-file-dash-underscore
Closed

fix: auto-correct dash-to-underscore filename mismatches in path validation (#48)#160
williamp44 wants to merge 2 commits into
bhauman:mainfrom
williamp44:fix/read-file-dash-underscore

Conversation

@williamp44

@williamp44 williamp44 commented Mar 14, 2026

Copy link
Copy Markdown

Summary

  • Fixes Readfile should accept filenames with - and return file content along with corrected filename. #48 — LLMs commonly request Clojure files using namespace-style dashes (core-stuff.clj) when the filesystem uses underscores (core_stuff.clj)
  • Adds transparent dash-to-underscore correction in validate-path-with-client so all tools (read_file, edit, grep, glob, etc.) get the fix for free
  • Only corrects .clj, .cljs, .cljc files — extensions that follow the Clojure namespace-to-filename convention
  • Only corrects the filename portion, preserving dashes in directory components
  • Corrected path is re-validated for defense-in-depth (symlink protection)

Approach

Per your review feedback: correction lives in validate-path-with-client in valid_paths.clj, not in the read_file tool. After validate-path returns, if the file doesn't exist and has a Clojure source extension, tries the underscore version. The tool already returns the path it read, so the LLM sees the corrected filename without needing an explicit notice.

Changes

  • utils/valid_paths.cljclojure-source-extensions constant (shared with clojure-file?), clojure-source-ext? predicate, try-dash-to-underscore-correction helper, updated validate-path-with-client
  • utils/valid_paths_test.clj — 12 unit test scenarios
  • tools/unified_read_file/tool_test.clj — 7 integration tests through full tool pipeline + recursive fixture cleanup

Test plan

  • 314 tests, 2273 assertions, 0 failures (rebased on v0.3.0)
  • .clj, .cljs, .cljc correction works
  • .bb, .edn, .lpy, .java, .py files are NOT corrected
  • Directory dashes preserved (only filename corrected)
  • Reverse direction (underscore→dash) does NOT correct
  • Mixed dashes/underscores corrected
  • Case-insensitive extension matching
  • Non-existent files return original path (caller handles error)
  • Full pipeline integration test via make-tool-tester

Summary by CodeRabbit

  • New Features

    • Automatic file lookup now tries dash-to-underscore filename conversion for Clojure sources when the requested file isn't found — applies to .clj/.cljs/.cljc, preserves directory dashes, and does not alter non-Clojure or Babashka files.
  • Tests

    • Added comprehensive integration and unit tests covering conversion behavior, edge cases, error handling, and cleanup of test artifacts.

@coderabbitai

coderabbitai Bot commented Mar 14, 2026

Copy link
Copy Markdown
📝 Walkthrough

Walkthrough

Attempts filename-only dash→underscore correction for missing Clojure source files: validate-path-with-client checks existence, and if absent and the extension is a Clojure source, it tries replacing dashes with underscores in the filename (not directories), re-validates, and returns the corrected path when found. Tests and integration coverage added.

Changes

Cohort / File(s) Summary
Path validation logic
src/clojure_mcp/utils/valid_paths.clj
Add private set clojure-source-extensions and helper clojure-source-ext?; implement try-dash-to-underscore-correction and update validate-path-with-client to attempt filename-only dash→underscore correction and re-validate when initial lookup fails.
Unit tests: valid_paths
test/clojure_mcp/utils/valid_paths_test.clj
Add dash-to-underscore-correction-test covering underscore fallback for .clj/.cljs/.cljc (and case variants), non-Clojure exclusions (.java/.py/.bb), directory-component dash preservation, mixed dash/underscore names, reverse-direction checks, and cleanup.
Integration tests: read-file tool
test/clojure_mcp/tools/unified_read_file/tool_test.clj
Require clojure.string, switch to recursive cleanup, and append integration tests exercising dash→underscore correction through the read-file tool pipeline (content return, correction notices, error cases, .bb exclusion).

Sequence Diagram

sequenceDiagram
    participant Client
    participant Validator as validate-path-with-client
    participant FS as FileSystem
    participant Helper as CorrectionHelper

    Client->>Validator: validate-path-with-client(path, nrepl-client)
    Validator->>FS: stat(path)
    FS-->>Validator: not found
    Validator->>Helper: try-dash-to-underscore-correction(path)
    Helper->>Helper: check extension ∈ clojure-source-extensions
    alt Clojure source
        Helper->>Helper: replace '-' → '_' in filename only
        Helper->>FS: stat(corrected-path)
        FS-->>Helper: found / not found
        Helper-->>Validator: corrected-path or nil
    else Non-Clojure or no correction
        Helper-->>Validator: nil
    end
    alt corrected-path
        Validator->>FS: stat(corrected-path)
        FS-->>Validator: exists
        Validator-->>Client: return corrected-path
    else fallback
        Validator-->>Client: original validation result (error)
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 I found a dash that lost its way,
I nudged it gently into underscore play,
Tests hopped along,
Files sang a new song—
A tidy path and a carrot to stay. 🥕

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely describes the main change: auto-correcting dash-to-underscore filename mismatches in path validation, which is the core objective of the PR.
Linked Issues check ✅ Passed The PR successfully implements all coding requirements from issue #48: auto-corrects dash-to-underscore mismatches for Clojure files, applies correction only to .clj/.cljs/.cljc extensions, preserves directory dashes, returns corrected path, and includes comprehensive tests.
Out of Scope Changes check ✅ Passed All changes are directly related to implementing dash-to-underscore correction in path validation: modifications to valid_paths.clj, comprehensive unit tests in valid_paths_test.clj, and integration tests in tool_test.clj.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@bhauman

bhauman commented Mar 14, 2026

Copy link
Copy Markdown
Owner

Thanks for this! The dash-to-underscore correction is a real pain point with LLMs.

I think we can simplify this significantly by moving the correction into validate-path-with-client in valid_paths.clj instead of handling it in the read_file tool. Something like: after validate-path returns, if the file doesn't exist and it's a .clj/.cljs/.cljc file, try the dash-to-underscore version of the filename.

That way every tool that validates paths (read_file, the edit tools, etc.) gets the correction for free, and we avoid the re-indentation of execute-tool and threading correction-notice through the pipeline. The tool already returns the path it read, so the LLM will see the corrected filename without needing an explicit notice.

Would you be open to reworking it with that approach?

@williamp44

Copy link
Copy Markdown
Author

yes, of course. will review and work through it.

@williamp44 williamp44 force-pushed the fix/read-file-dash-underscore branch from fef9282 to a69383b Compare March 14, 2026 12:15

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
src/clojure_mcp/utils/valid_paths.clj (1)

61-69: Deduplicate source-extension matching to prevent drift.

clojure-source-ext? introduces a second extension list that overlaps with clojure-file?. Consider centralizing the .clj/.cljs/.cljc list in one private constant/predicate and reusing it in both places.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/clojure_mcp/utils/valid_paths.clj` around lines 61 - 69, Introduce a
single private constant or predicate (e.g., def ^:private clojure-extensions or
a private function clojure-ext? ) that lists/checks the Clojure source
extensions (.clj, .cljs, .cljc) and replace the duplicate logic in
clojure-source-ext? and clojure-file? to call that shared symbol; update
clojure-source-ext? to delegate to the new predicate/constant (preserving the
nil check and lower-case handling) so extension matching is centralized and
cannot drift.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/clojure_mcp/utils/valid_paths.clj`:
- Around line 61-69: Introduce a single private constant or predicate (e.g., def
^:private clojure-extensions or a private function clojure-ext? ) that
lists/checks the Clojure source extensions (.clj, .cljs, .cljc) and replace the
duplicate logic in clojure-source-ext? and clojure-file? to call that shared
symbol; update clojure-source-ext? to delegate to the new predicate/constant
(preserving the nil check and lower-case handling) so extension matching is
centralized and cannot drift.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: df08ff63-bedd-4f14-8289-4818f791e369

📥 Commits

Reviewing files that changed from the base of the PR and between fef9282 and a69383b.

📒 Files selected for processing (3)
  • src/clojure_mcp/utils/valid_paths.clj
  • test/clojure_mcp/tools/unified_read_file/tool_test.clj
  • test/clojure_mcp/utils/valid_paths_test.clj

@williamp44 williamp44 changed the title fix: auto-correct dash-to-underscore filename mismatches in read_file fix: auto-correct dash-to-underscore filename mismatches in path validation (#48) Mar 14, 2026
…dation (bhauman#48)

LLMs frequently request Clojure files using dashes (core-stuff.clj) when the
filesystem uses underscores (core_stuff.clj). This adds transparent correction
in validate-path-with-client so all tools (read_file, edit, grep, etc.) get
the fix for free.

Only corrects .clj/.cljs/.cljc files. Directory components are preserved.
Corrected path is re-validated for defense-in-depth (symlink protection).
@williamp44 williamp44 force-pushed the fix/read-file-dash-underscore branch from a69383b to 63ea7ac Compare March 14, 2026 12:24
@williamp44

williamp44 commented Mar 14, 2026

Copy link
Copy Markdown
Author

Bruce, not sure what i did wrong, but there was a merge conflict in the github UI. i merged the change i expected, but not sure how this happened.

UPDATES:
i see test errors- is ClojureDart a requirement?
FAIL in (clojure-file?-test) (valid_paths_test.clj:108)
Detect ClojureDart extension
expected: (valid-paths/clojure-file? "test.cljd")
actual: (not (valid-paths/clojure-file? "test.cljd"))

FAIL in (clojure-file?-test) (valid_paths_test.clj:109)
Detect ClojureDart extension
expected: (valid-paths/clojure-file? "/path/to/file.cljd")
actual: (not (valid-paths/clojure-file? "/path/to/file.cljd"))

please discard this PR and i will review locally to see what went wrong.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/clojure_mcp/utils/valid_paths.clj (1)

152-156: ⚠️ Potential issue | 🔴 Critical

Restore .cljd detection in clojure-file? to fix regression.

The refactor dropped .cljd recognition, which is now failing CI (Detect ClojureDart extension assertions). Keep correction scoped to .clj/.cljs/.cljc, but preserve .cljd classification in clojure-file?.

💡 Suggested fix
 (defn clojure-file?
@@
   (when file-path
     (let [lower-path (str/lower-case file-path)]
       (or (clojure-source-ext? file-path)
+          (str/ends-with? lower-path ".cljd")
           (str/ends-with? lower-path ".bb")
           (str/ends-with? lower-path ".lpy")
           (str/ends-with? lower-path ".edn")
           (babashka-shebang? file-path)))))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/clojure_mcp/utils/valid_paths.clj` around lines 152 - 156, The
clojure-file? predicate lost recognition for the .cljd extension; update the OR
branch in clojure-file? to include (str/ends-with? lower-path ".cljd") alongside
the existing checks (e.g., .clj, .cljs, .cljc, .bb, .lpy, .edn) so files with
.cljd are classified as Clojure; locate the clojure-file? function (uses
file-path and lower-path and babashka-shebang?) and add the .cljd ends-with
check in the same style as the other extension checks.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@src/clojure_mcp/utils/valid_paths.clj`:
- Around line 152-156: The clojure-file? predicate lost recognition for the
.cljd extension; update the OR branch in clojure-file? to include
(str/ends-with? lower-path ".cljd") alongside the existing checks (e.g., .clj,
.cljs, .cljc, .bb, .lpy, .edn) so files with .cljd are classified as Clojure;
locate the clojure-file? function (uses file-path and lower-path and
babashka-shebang?) and add the .cljd ends-with check in the same style as the
other extension checks.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a5a1cb31-f6c6-4fc7-bc33-42916deb5373

📥 Commits

Reviewing files that changed from the base of the PR and between 63ea7ac and 63015aa.

📒 Files selected for processing (2)
  • src/clojure_mcp/utils/valid_paths.clj
  • test/clojure_mcp/utils/valid_paths_test.clj

@bhauman

bhauman commented Mar 14, 2026

Copy link
Copy Markdown
Owner

Thanks for working on this and for being proactive about the test failures — appreciate the quick communication. Go ahead and sort it out locally and open a new PR when it's ready. Looking forward to seeing the reworked approach!

@bhauman bhauman closed this Mar 14, 2026
@bhauman

bhauman commented Mar 14, 2026

Copy link
Copy Markdown
Owner

Also, I'm sorry — I realize you did a bunch of work on this today and it sounds like it was your only day to get things done. I really do appreciate the effort, and the approach is solid. Looking forward to the follow-up PR.

@williamp44

Copy link
Copy Markdown
Author

Bruce, I found the issue was a collision on the ClojureDart .cljd changes that my PR did not account for. Re-doing the PR against the main branch where support for .cljd files was added.

please note i did not find any official site or documentation on how to handle dash/underscore for cljd files. here is what i found on google. please confirm this PR should apply to cljd file types

image image

@bhauman

bhauman commented Mar 16, 2026

Copy link
Copy Markdown
Owner

Yes, the dash→underscore correction should apply to .cljd files. Confirmed by looking at the ClojureDart compiler source — ns-to-paths in compiler.cljc does the exact same mapping:

(defn ns-to-paths [ns-name]
  (let [base (replace-all (name ns-name) #"[.-]" {"." "/" "-" "_"})]
    [(str base ".cljd") (str base ".cljc")]))

So my-app.cool-module maps to src/my_app/cool_module.cljd — same convention as Clojure/JVM. Include .cljd in the correction.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Readfile should accept filenames with - and return file content along with corrected filename.

2 participants