Skip to content

Refactor UrlId into FilePath enum#1251

Open
lionel- wants to merge 10 commits into
oak/salsa-14-canonicalisationfrom
oak/salsa-15-file-path-refactor
Open

Refactor UrlId into FilePath enum#1251
lionel- wants to merge 10 commits into
oak/salsa-14-canonicalisationfrom
oak/salsa-15-file-path-refactor

Conversation

@lionel-

@lionel- lionel- commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Branched from #1250 (see that PR first for context)
Progress towards #1212

  • Rename UrlId to FilePath
  • Make it our own custom enum instead of piggybacking on the Url enum. Variants: File (lexically normalised) and Virtual (wrap an Uri e.g. for untitled files). We could add Vendored later for vendored base R sources, mirroring ty.

This refactor better reflects that the main representation for our files are on-disk paths.

@lionel- lionel- force-pushed the oak/salsa-14-canonicalisation branch from 0e6a188 to eb2c844 Compare June 4, 2026 14:19
@lionel- lionel- force-pushed the oak/salsa-15-file-path-refactor branch from 4ac9528 to e807fd2 Compare June 4, 2026 14:19

@DavisVaughan DavisVaughan left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the idea of FilePath, but we are currently in a state where A LOT of existing code is still using url, url_id, or something similar, and I think it results in a net negative in terms of code readability. I have marked a few cases, but have definitely not captured them all. I'd like to advocate for a close review of all uses of FilePath before merging to ensure that the corresponding binding is also called path or file_path.


/// Build a [`FilePath::File`] from a filesystem path. Errors if
/// the path can't be expressed as a UTF-8 absolute path.
pub fn from_file_path(path: impl AsRef<Path>) -> anyhow::Result<Self> {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should just be from_path() now, because I saw FilePath::from_file_path() elsewhere and was pretty confused by that naming

Or actually, probably from_path_buf() to mimic the AbsPathBuf change


/// Build from a UTF-8 path. Returns `None` if the path is not
/// absolute.
pub fn from_utf8(path: Utf8PathBuf) -> Option<Self> {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like Utf8PathBuf::from_path_buf and to_path_buf(), I'd prefer the unambiguous from_utf8_path_buf()

Comment on lines +154 to +158
pub fn from_path(path: &Path) -> Option<Self> {
let utf8 = Utf8PathBuf::from_path_buf(path.to_path_buf())
.map_err(|p| anyhow::anyhow!("Path is not valid UTF-8: {}", p.display()))
.warn_on_err()?;
Self::from_utf8(utf8)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pub fn from_path(path: &Path) -> Option<Self> {
let utf8 = Utf8PathBuf::from_path_buf(path.to_path_buf())
.map_err(|p| anyhow::anyhow!("Path is not valid UTF-8: {}", p.display()))
.warn_on_err()?;
Self::from_utf8(utf8)
pub fn from_path_buf(path: PathBuf) -> Option<Self> {
let utf8 = Utf8PathBuf::from_path_buf(path)
.map_err(|p| anyhow::anyhow!("Path is not valid UTF-8: {}", p.display()))
.warn_on_err()?;
Self::from_utf8(utf8)

I think this will make from_url() a bit simpler too

Also, I generally treat from_ APIs as consuming the input

I also like the general idea of not hiding clones from the caller, i.e. this forces the to_path_buf() caller up onto the caller, making it obvious at the call site that the clone occurs. And if they happen to have a PathBuf already, they aren't forced into a second clone.

Comment on lines +302 to 308
fn canonical_path(url: &FilePath) -> Option<PathBuf> {
if !url.is_file() {
return None;
}
let path = url.to_file_path().ok()?;
let path = url.to_path_buf()?;
std::fs::canonicalize(path).ok()
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fn canonical_path(url: &FilePath) -> Option<PathBuf> {
if !url.is_file() {
return None;
}
let path = url.to_file_path().ok()?;
let path = url.to_path_buf()?;
std::fs::canonicalize(path).ok()
}
fn canonical_path(url: &FilePath) -> Option<PathBuf> {
match url {
FilePath::File(path) => std::fs::canonicalize(path.as_path()).ok(),
FilePath::Virtual(_) => None,
}
}

Hmm. I'd like to challenge the addition of the to_path_buf() method.

I understand the desire to have a way to go from FilePath -> PathBuf for compatibility, but I have a strong feeling that it would be better to stay in UTF8PathBuf form as long as possible.

Like, I think this rewrite of canonical_path() is more obvious and takes advantage of UTF8PathBuf features

Can we avoid adding to_path_buf() entirely in favor of rewrites likes this one? And should canonical_path() itself also return another UTF8PathBuf?

Comment on lines +265 to +266
#[cfg(test)]
mod tests;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put this in lib.rs?

pub struct FileEvent {
pub kind: FileEventKind,
pub url: UrlId,
pub url: FilePath,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also a pass looking for url: FilePath to change to path: FilePath

Comment on lines +59 to 62
let Some(path) = url.to_path_buf() else {
log::warn!("Skipping add_watched_file: URL is not a file path");
return;
};

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let Some(path) = url.to_path_buf() else {
log::warn!("Skipping add_watched_file: URL is not a file path");
return;
};
let Some(path) = url.as_file() else {
log::warn!("Skipping add_watched_file: URL is not a file path");
return;
};
let Some(placement) = classify(db, path.as_path().as_std_path()) else {
// Either the URL falls outside every workspace, or it lives
// inside a package subdir we don't track (tests/, inst/, ...).
return;
};

i would have written as this to avoid the expensive to_path_buf(). no need to make a clone here.

more evidence to me that to_path_buf() is a possible anti pattern

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and really we'd probably make classify() take &Utf8PathBuf

/// Workspace-vs-library is then `root.kind(db)`.
#[returns(ref)]
pub description_url: UrlId,
pub description_url: FilePath,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pub description_url: FilePath,
pub description_path: FilePath,

Comment thread crates/oak_db/src/db.rs
/// on the first hit, so callers depend only on the index maps
/// actually visited.
fn file_by_url(&self, url: &UrlId) -> Option<File>;
fn file_by_url(&self, url: &FilePath) -> Option<File>;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fn file_by_url(&self, url: &FilePath) -> Option<File>;
fn file_by_path(&self, url: &FilePath) -> Option<File>;

and argument name update

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same with package_by_url

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and many other functions in this file

Comment thread crates/ark/src/url.rs
pub fn url_id_from_code_location(loc: &CodeLocation) -> UrlId {
UrlId::from_url(loc.uri.clone())
/// Extract a canonical [`FilePath`] from a [`CodeLocation`].
pub fn url_id_from_code_location(loc: &CodeLocation) -> FilePath {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pub fn url_id_from_code_location(loc: &CodeLocation) -> FilePath {
pub fn file_path_from_code_location(loc: &CodeLocation) -> FilePath {

lionel- added 10 commits June 10, 2026 16:09
Only canonicalise when comparing against paths from external sources
(e.g. normalised by R). Canonicalisation may corrupt the identity of
paths sent by frontends, and may prevent matching files at all when they
are deleted (at which point a non-canonicalised path sent by the
frontend can't be matched against canonicalised paths we've stored as
keys)
@lionel- lionel- force-pushed the oak/salsa-14-canonicalisation branch from eb2c844 to 85cdf4f Compare June 10, 2026 14:31
@lionel- lionel- force-pushed the oak/salsa-15-file-path-refactor branch from e807fd2 to 10deebe Compare June 10, 2026 14:32
@lionel- lionel- force-pushed the oak/salsa-14-canonicalisation branch from 85cdf4f to 9463329 Compare June 11, 2026 06:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants