fix(sandbox): resolve paths in read_file/write_file content for LocalSandbox#1935
fix(sandbox): resolve paths in read_file/write_file content for LocalSandbox#1935JasonOA888 wants to merge 4 commits intobytedance:mainfrom
Conversation
…Sandbox In LocalSandbox mode, read_file and write_file now transform container paths in file content, matching the path handling behavior of bash tool. - write_file: resolves virtual paths in content to system paths before writing, so scripts with /mnt/user-data paths work when executed - read_file: reverse-resolves system paths back to virtual paths in returned content for consistency This fixes scenarios where agents write Python scripts with virtual paths, then execute them via bash tool expecting the paths to work. Fixes bytedance#1778
There was a problem hiding this comment.
Pull request overview
This PR fixes inconsistent path handling in LocalSandbox by making read_file/write_file transform virtual container paths inside file contents, aligning their behavior with execute_command’s path resolution so scripts written with /mnt/user-data/... paths can run correctly on the host.
Changes:
write_file: resolves container-path strings insidecontentto host-local paths before writing.read_file: reverse-resolves host-local paths back to container paths in returned file content.
| # Resolve container paths in content to local paths | ||
| # so that written files use correct system paths when executed | ||
| resolved_content = self._resolve_paths_in_command(content) | ||
| mode = "a" if append else "w" | ||
| with open(resolved_path, mode, encoding="utf-8") as f: | ||
| f.write(content) | ||
| f.write(resolved_content) |
There was a problem hiding this comment.
On Windows hosts, self._resolve_path() will typically return paths with backslashes (because LocalSandboxProvider stores local_path as Path(...).resolve()), so resolved_content can inject sequences like C:\Users\... into source files. In common cases (e.g. Python scripts), those backslashes inside string literals can create invalid escape sequences (e.g. \U...) and break execution. Consider normalizing resolved paths inserted into file content to a safer representation (e.g. forward slashes) and ensure read_file’s reverse-resolver also recognizes that normalized form so agents don’t see leaked host paths.
| # Resolve container paths in content to local paths | ||
| # so that written files use correct system paths when executed | ||
| resolved_content = self._resolve_paths_in_command(content) | ||
| mode = "a" if append else "w" |
There was a problem hiding this comment.
_resolve_paths_in_command() is documented and tuned for shell command parsing (boundary chars, quoting, etc.), but it’s now being reused to transform arbitrary file contents. That coupling is easy to miss and makes future tweaks to command parsing potentially break file IO semantics. Consider extracting a dedicated helper for “resolve paths in text content” (or renaming/generalizing the helper) so the intent and constraints are explicit.
| @@ -280,7 +280,10 @@ def read_file(self, path: str) -> str: | |||
| resolved_path = self._resolve_path(path) | |||
| try: | |||
| with open(resolved_path, encoding="utf-8") as f: | |||
| return f.read() | |||
| content = f.read() | |||
| # Reverse resolve local paths back to container paths in content | |||
| # for consistency with bash tool output path handling | |||
| return self._reverse_resolve_paths_in_output(content) | |||
There was a problem hiding this comment.
This change alters core LocalSandbox behavior by rewriting paths inside written file content and rewriting them back when reading. There are existing unit tests for path mapping / command replacement in test_local_sandbox_provider_mounts.py, but nothing asserting the new read/write content transformations. Adding focused tests for write_file (content path resolution) and read_file (reverse resolution) would help prevent regressions, especially across different mount mappings.
|
@JasonOA888 thanks for the contribution. Please take some time to address the Copilot review comments. |
…orward-slash safety + tests - Extract _resolve_paths_in_content() separate from _resolve_paths_in_command() to decouple file-content path resolution from shell-command parsing - Normalize resolved paths to forward slashes to avoid Windows backslash escape issues in source files (e.g. \U in Python string literals) - Add 4 focused tests: write resolves content, forward-slash guarantee, read reverse-resolves content, and write→read roundtrip
|
@WillemJiang Thanks for the nudge! I have addressed all three Copilot review points in commit 19c3f0d:
Please take another look when you get a chance. |
|
@JasonOA888, please run the following command to fix the lint error. |
|
@JasonOA888 read_file now unconditionally applies _reverse_resolve_paths_in_output to everything it reads. This means:
Example: A user uploads a config containing /Users/jiangning/.../user-data/outputs — if that local path maps to /mnt/user-data, read_file would corrupt it to /mnt/user-data/outputs. This is different from bash output, which is ephemeral. File content is persistent and read for many reasons (searching, parsing, forwarding to users). Suggestion: Consider only reverse-resolving in read_file if the file was previously written through write_file (e.g., track a set of "agent-written" paths). Or accept the risk and document it clearly. |
read_file previously applied _reverse_resolve_paths_in_output to ALL file content, which could silently rewrite paths in user uploads and external tool output (Willem Jiang review on bytedance#1935). Now tracks files written through write_file in _agent_written_paths. Only those files get reverse-resolved on read. Non-agent files are returned as-is.
|
@WillemJiang Great catch. You're right — unconditionally reverse-resolving in Pushed a fix (beb4e20) that addresses this:
Added a test that verifies: writing a file directly to the filesystem (simulating a user upload) with a local path → This preserves the roundtrip behavior for agent-authored content while protecting everything else from silent path corruption. |
Fixes #1778
Problem
In LocalSandbox mode,
read_fileandwrite_filedo not transform container paths in file content. This causes issues when:/mnt/user-data/...pathsThe bash tool already resolves paths in commands and reverse-resolves in output, but file tools do not.
Solution
write_file: resolve virtual paths in content to system paths before writingread_file: reverse-resolve system paths back to virtual paths in returned contentThis makes path handling consistent across all LocalSandbox tools.
Example
Before:
After:
Notes
execute_commandusesread_filereverses the transformation so agents see consistent virtual paths