Don't canonicalise file keys#1250
Conversation
0e6a188 to
eb2c844
Compare
LOL OUCH |
DavisVaughan
left a comment
There was a problem hiding this comment.
I simultaneously:
- am sad we dont have a canonical uri everywhere, but get it, especially with the deleted file example
- am happy we dont have to do IO for uri normalization
| /// What we normalise: drive-letter casing on Windows; percent-encoding | ||
| /// of `:` (decoded via `Url -> PathBuf -> Url` round-trip). No I/O: | ||
| /// `std::fs::canonicalize()`, no symlink resolution. The same input URI |
There was a problem hiding this comment.
dont we also normalize ..?
There was a problem hiding this comment.
you mention it in the pr, but i dont see it anywhere here, so thats why im asking
We only do lexical normalisation: resolve . and ...
There was a problem hiding this comment.
would be nice to see a test in this file either way!
There was a problem hiding this comment.
I can't find it either! Must have made a rebase mistake along the way or other... I distinctly remember component parsing code.
Claude says "Url::parse already collapses dot segments for file: URLs, but Url::from_file_path doesn't". Maybe I missed that the agent decided to remove that code at some point because of the parse behaviour 😬
| /// only. Lookups try the primary first (cheap, no fs touch), then | ||
| /// canonicalise the incoming URI and consult the secondary. DAP wire | ||
| /// output (`debugInfo` source paths, breakpoint events) round-trips the | ||
| /// primary key unchanged, so the frontend always sees the URI it sent. |
There was a problem hiding this comment.
Yea IIUC this isn't actually what is happening
In handle_set_breakpoints() we do this
let uri = match UrlId::from_file_path(path)
and we store that uri in by_url here.
So I'm not entirely sure "the frontend sees the uri it sent". It might be lexically normalized?
There was a problem hiding this comment.
Which is either fine, and needs a doc comment update, or isn't fine, and we need a third map that gets us back to the original path: String the DAP provided us 😬
There was a problem hiding this comment.
And either way, a test could be good
There was a problem hiding this comment.
I think we should be consistent and send the verbatim path back here, thanks for catching this
There was a problem hiding this comment.
Instead of a third map, I've just included the verbatim path in the first map. I wonder if I should store the verbatim in the path variant of FilePath, in the next PR.
8f13f9c to
5f3ecf2
Compare
eb2c844 to
85cdf4f
Compare
ec1435d to
a2787c2
Compare
Only canonicalise when comparing against paths from external sources (e.g. normalised by R). Canonicalisation may corrupt the identity of paths sent by frontends, and may prevent matching files at all when they are deleted (at which point a non-canonicalised path sent by the frontend can't be matched against canonicalised paths we've stored as keys)
85cdf4f to
9463329
Compare
Branched from #1245
Progress towards #1212
While I was working on breakpoints I ran into difficulties matching files from different locations: frontend URIs via DAP requests and source references from R whose paths went through
normalizePath(). The latter does file canonicalisation, which means that in addition to resolving.and.., it also follows symlinks. The frontend URIs are not canonicalised, which caused CI failures when running tests in projects storedc in temporary folders, with mismatches between/tmpand/private/tmp. The fix I implemented was to canonicalise everywhere and encode that canonicalisation happened with theUrlIdfile, which we stored as file keys in our maps tracking state (e.g. breakpoints) by file.The
UrlIdnormalisation also dealt with other issues that surfaced over time, such as windows letter drive casing because to fix issues with lowercase on one end and uppercase on the other.The same sort of issue came up with the LSP: we get URIs from the frontend, but we also encounter
source()calls that may be relative and we need to resolve them. So I reached for the sameUrlIdapproach to make the different file paths agree with each other.It turns out that doing any sort of normalisation that requires I/O is a red flag. If a file gets deleted on disk, and we get an event about that file, there is literally no way to match it back to what we know. For the LSP that's a real issue because we get file deletion events after the file is deleted.
The other thing to watch out for: Always report back the unormalised URI/path to the frontend, because that's what they use to map files on their end. Any changes might cause the frontend to treat the file as different.
With this PR:
.and... That matches r-a and ty.