Refactor UrlId into FilePath enum#1251
Conversation
0e6a188 to
eb2c844
Compare
4ac9528 to
e807fd2
Compare
There was a problem hiding this comment.
I really like the idea of FilePath, but we are currently in a state where A LOT of existing code is still using url, url_id, or something similar, and I think it results in a net negative in terms of code readability. I have marked a few cases, but have definitely not captured them all. I'd like to advocate for a close review of all uses of FilePath before merging to ensure that the corresponding binding is also called path or file_path.
|
|
||
| /// Build a [`FilePath::File`] from a filesystem path. Errors if | ||
| /// the path can't be expressed as a UTF-8 absolute path. | ||
| pub fn from_file_path(path: impl AsRef<Path>) -> anyhow::Result<Self> { |
There was a problem hiding this comment.
I think this should just be from_path() now, because I saw FilePath::from_file_path() elsewhere and was pretty confused by that naming
Or actually, probably from_path_buf() to mimic the AbsPathBuf change
|
|
||
| /// Build from a UTF-8 path. Returns `None` if the path is not | ||
| /// absolute. | ||
| pub fn from_utf8(path: Utf8PathBuf) -> Option<Self> { |
There was a problem hiding this comment.
Like Utf8PathBuf::from_path_buf and to_path_buf(), I'd prefer the unambiguous from_utf8_path_buf()
| pub fn from_path(path: &Path) -> Option<Self> { | ||
| let utf8 = Utf8PathBuf::from_path_buf(path.to_path_buf()) | ||
| .map_err(|p| anyhow::anyhow!("Path is not valid UTF-8: {}", p.display())) | ||
| .warn_on_err()?; | ||
| Self::from_utf8(utf8) |
There was a problem hiding this comment.
| pub fn from_path(path: &Path) -> Option<Self> { | |
| let utf8 = Utf8PathBuf::from_path_buf(path.to_path_buf()) | |
| .map_err(|p| anyhow::anyhow!("Path is not valid UTF-8: {}", p.display())) | |
| .warn_on_err()?; | |
| Self::from_utf8(utf8) | |
| pub fn from_path_buf(path: PathBuf) -> Option<Self> { | |
| let utf8 = Utf8PathBuf::from_path_buf(path) | |
| .map_err(|p| anyhow::anyhow!("Path is not valid UTF-8: {}", p.display())) | |
| .warn_on_err()?; | |
| Self::from_utf8(utf8) |
I think this will make from_url() a bit simpler too
Also, I generally treat from_ APIs as consuming the input
I also like the general idea of not hiding clones from the caller, i.e. this forces the to_path_buf() caller up onto the caller, making it obvious at the call site that the clone occurs. And if they happen to have a PathBuf already, they aren't forced into a second clone.
| fn canonical_path(url: &FilePath) -> Option<PathBuf> { | ||
| if !url.is_file() { | ||
| return None; | ||
| } | ||
| let path = url.to_file_path().ok()?; | ||
| let path = url.to_path_buf()?; | ||
| std::fs::canonicalize(path).ok() | ||
| } |
There was a problem hiding this comment.
| fn canonical_path(url: &FilePath) -> Option<PathBuf> { | |
| if !url.is_file() { | |
| return None; | |
| } | |
| let path = url.to_file_path().ok()?; | |
| let path = url.to_path_buf()?; | |
| std::fs::canonicalize(path).ok() | |
| } | |
| fn canonical_path(url: &FilePath) -> Option<PathBuf> { | |
| match url { | |
| FilePath::File(path) => std::fs::canonicalize(path.as_path()).ok(), | |
| FilePath::Virtual(_) => None, | |
| } | |
| } |
Hmm. I'd like to challenge the addition of the to_path_buf() method.
I understand the desire to have a way to go from FilePath -> PathBuf for compatibility, but I have a strong feeling that it would be better to stay in UTF8PathBuf form as long as possible.
Like, I think this rewrite of canonical_path() is more obvious and takes advantage of UTF8PathBuf features
Can we avoid adding to_path_buf() entirely in favor of rewrites likes this one? And should canonical_path() itself also return another UTF8PathBuf?
| #[cfg(test)] | ||
| mod tests; |
| pub struct FileEvent { | ||
| pub kind: FileEventKind, | ||
| pub url: UrlId, | ||
| pub url: FilePath, |
There was a problem hiding this comment.
Also a pass looking for url: FilePath to change to path: FilePath
| let Some(path) = url.to_path_buf() else { | ||
| log::warn!("Skipping add_watched_file: URL is not a file path"); | ||
| return; | ||
| }; |
There was a problem hiding this comment.
| let Some(path) = url.to_path_buf() else { | |
| log::warn!("Skipping add_watched_file: URL is not a file path"); | |
| return; | |
| }; | |
| let Some(path) = url.as_file() else { | |
| log::warn!("Skipping add_watched_file: URL is not a file path"); | |
| return; | |
| }; | |
| let Some(placement) = classify(db, path.as_path().as_std_path()) else { | |
| // Either the URL falls outside every workspace, or it lives | |
| // inside a package subdir we don't track (tests/, inst/, ...). | |
| return; | |
| }; |
i would have written as this to avoid the expensive to_path_buf(). no need to make a clone here.
more evidence to me that to_path_buf() is a possible anti pattern
There was a problem hiding this comment.
and really we'd probably make classify() take &Utf8PathBuf
| /// Workspace-vs-library is then `root.kind(db)`. | ||
| #[returns(ref)] | ||
| pub description_url: UrlId, | ||
| pub description_url: FilePath, |
There was a problem hiding this comment.
| pub description_url: FilePath, | |
| pub description_path: FilePath, |
| /// on the first hit, so callers depend only on the index maps | ||
| /// actually visited. | ||
| fn file_by_url(&self, url: &UrlId) -> Option<File>; | ||
| fn file_by_url(&self, url: &FilePath) -> Option<File>; |
There was a problem hiding this comment.
| fn file_by_url(&self, url: &FilePath) -> Option<File>; | |
| fn file_by_path(&self, url: &FilePath) -> Option<File>; |
and argument name update
There was a problem hiding this comment.
same with package_by_url
There was a problem hiding this comment.
and many other functions in this file
| pub fn url_id_from_code_location(loc: &CodeLocation) -> UrlId { | ||
| UrlId::from_url(loc.uri.clone()) | ||
| /// Extract a canonical [`FilePath`] from a [`CodeLocation`]. | ||
| pub fn url_id_from_code_location(loc: &CodeLocation) -> FilePath { |
There was a problem hiding this comment.
| pub fn url_id_from_code_location(loc: &CodeLocation) -> FilePath { | |
| pub fn file_path_from_code_location(loc: &CodeLocation) -> FilePath { |
Only canonicalise when comparing against paths from external sources (e.g. normalised by R). Canonicalisation may corrupt the identity of paths sent by frontends, and may prevent matching files at all when they are deleted (at which point a non-canonicalised path sent by the frontend can't be matched against canonicalised paths we've stored as keys)
eb2c844 to
85cdf4f
Compare
e807fd2 to
10deebe
Compare
85cdf4f to
9463329
Compare
Branched from #1250 (see that PR first for context)
Progress towards #1212
UrlIdtoFilePathUrlenum. Variants:File(lexically normalised) andVirtual(wrap anUrie.g. for untitled files). We could addVendoredlater for vendored base R sources, mirroring ty.This refactor better reflects that the main representation for our files are on-disk paths.