Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions AGENTS.md
Original file line number Diff line number Diff line change
Expand Up @@ -67,6 +67,10 @@ just test <test_name>
just test -p ark
```

### Placing Integration Tests

Put integration tests under `crates/<crate>/tests/integration/`, with a `main.rs` that declares each test file as a module (`mod library;`). Don't add loose `*.rs` files directly under `tests/`. Each top-level file there compiles as its own test binary, so consolidating them into one `integration` binary keeps build and link times down and saves on disk space.

### Running Clippy

```sh
Expand Down
17 changes: 17 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -79,6 +79,7 @@ oak_ide = { path = "crates/oak_ide" }
oak_index_vec = { path = "crates/oak_index_vec" }
oak_package_metadata = { path = "crates/oak_package_metadata" }
oak_r_process = { path = "crates/oak_r_process" }
oak_scan = { path = "crates/oak_scan" }
oak_semantic = { path = "crates/oak_semantic" }
oak_sources = { path = "crates/oak_sources" }
once_cell = "1.21.4"
Expand Down
7 changes: 7 additions & 0 deletions crates/oak_core/src/file.rs
Original file line number Diff line number Diff line change
@@ -1,6 +1,13 @@
use std::path::Path;
use std::path::PathBuf;

/// Does this path's name look like an R file (`.R` / `.r` extension)?
///
/// Pure name test, no I/O. It doesn't touch the filesystem, so it says
/// nothing about whether the path exists or is a regular file. A
/// directory named `foo.R` passes, and so does a path that isn't on disk
/// at all. Callers that walk a real directory and want to skip such cases
/// must check `path.is_file()` themselves.
pub fn is_r_file(path: &Path) -> bool {
path.extension()
.and_then(|e| e.to_str())
Expand Down
2 changes: 2 additions & 0 deletions crates/oak_core/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -3,3 +3,5 @@ pub mod file;
pub mod identifier;
pub mod range;
pub mod syntax_ext;

pub use file::is_r_file;
118 changes: 99 additions & 19 deletions crates/oak_db/src/db.rs
Original file line number Diff line number Diff line change
Expand Up @@ -3,9 +3,11 @@ use rustc_hash::FxHashMap;

use crate::File;
use crate::LibraryRoots;
use crate::LiveRoot;
use crate::OrphanRoot;
use crate::Package;
use crate::Root;
use crate::StaleRoot;
use crate::WorkspaceRoots;

/// Concrete-input surface of the salsa database. Each impl
Expand All @@ -26,6 +28,10 @@ pub trait DbInputs: salsa::Database {

/// Files not yet anchored to any workspace or library root.
fn orphan_root(&self) -> OrphanRoot;

/// Files and packages from roots that have been removed. Holding
/// pen for entity reuse on re-add (see [`StaleRoot`]).
fn stale_root(&self) -> StaleRoot;
}

/// Salsa database trait used throughout `oak_db`. Tracked queries take `&dyn
Expand Down Expand Up @@ -53,6 +59,47 @@ pub trait Db: DbInputs {
/// - Installed packages in an earlier root shadow later ones
/// (mirroring `.libPaths()`).
fn package_by_name(&self, name: &str) -> Option<Package>;

/// Resolve the live `Root` that contains `pkg`, if any.
///
/// Returns `None` when the package is only in [`StaleRoot`] (its live
/// container was previously evicted).
///
/// **Nested roots.** Two roots can claim the same package when one is
/// nested inside the other, e.g. the frontend opens both `/proj` and
/// `/proj/sub-pkg` as workspace folders and both scans walk into
/// `sub-pkg/DESCRIPTION`. Both scans hand the same `Package` entity to
/// their respective root's `packages` vec; the longest-path root wins
/// the ownership query here. The shorter root's vec still transiently
/// lists the package, but it self-heals on its next scan since
/// `set_packages` replaces the vec wholesale.
fn root_by_package(&self, pkg: Package) -> Option<Root>;

/// All live roots in lookup-precedence order: workspace folders first, then
/// library paths (mirroring R's `.libPaths()`), then the orphan bucket.
/// Stale roots are not included. Salsa-cached and invalidates when one of
/// `workspace_roots` / `library_roots` / `orphan_root` changes.
fn live_roots(&self) -> &[LiveRoot];
Comment on lines +80 to +82

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also accept Dead root instead of Stale root since we have terminology like:

  • live roots
  • "resurrect"ing of dead/stale roots

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That'd be fine, just a bit grim to my taste 😄

}

#[salsa::tracked(returns(ref))]
pub(crate) fn live_roots_query(db: &dyn Db) -> Vec<LiveRoot> {
let mut roots: Vec<LiveRoot> = db
.workspace_roots()
.roots(db)
.iter()
.map(|&r| LiveRoot::Workspace(r))
.collect();

roots.extend(
db.library_roots()
.roots(db)
.iter()
.map(|&r| LiveRoot::Library(r)),
);

roots.push(LiveRoot::Orphan(db.orphan_root()));
roots
}

/// Implementation of [`Db::file_by_url`]. Walks the per-root indices.
Expand All @@ -61,34 +108,67 @@ pub trait Db: DbInputs {
/// entity), but every step is: each [`root_url_index`] call returns a
/// cached map, so adding a file to one root invalidates only that
/// root's index.
pub fn file_by_url_query(db: &dyn Db, url: &UrlId) -> Option<File> {
for root in db.workspace_roots().roots(db) {
if let Some(&file) = root_url_index(db, *root).get(url) {
return Some(file);
pub(crate) fn file_by_url_query(db: &dyn Db, url: &UrlId) -> Option<File> {
for &root in db.live_roots() {
let hit = match root {
LiveRoot::Workspace(r) | LiveRoot::Library(r) => {
root_url_index(db, r).get(url).copied()
},
LiveRoot::Orphan(_) => orphan_url_index(db).get(url).copied(),
};
if hit.is_some() {
return hit;
}
}
for root in db.library_roots().roots(db) {
if let Some(&file) = root_url_index(db, *root).get(url) {
return Some(file);
}
}
orphan_url_index(db).get(url).copied()
None
}

/// Implementation of [`Db::package_by_name`]. Same shape as
/// [`file_by_url_query`].
pub fn package_by_name_query(db: &dyn Db, name: &str) -> Option<Package> {
for root in db.workspace_roots().roots(db) {
if let Some(&pkg) = root_package_index(db, *root).get(name) {
return Some(pkg);
/// [`file_by_url_query`]; orphan has no packages, so it contributes
/// nothing to the walk.
pub(crate) fn package_by_name_query(db: &dyn Db, name: &str) -> Option<Package> {
for &root in db.live_roots() {
if let LiveRoot::Workspace(r) | LiveRoot::Library(r) = root {
if let Some(&pkg) = root_package_index(db, r).get(name) {
return Some(pkg);
}
}
}
for root in db.library_roots().roots(db) {
if let Some(&pkg) = root_package_index(db, *root).get(name) {
return Some(pkg);
None
}

/// Implementation of [`Db::root_by_package`]. Walks all live roots looking for
/// `pkg` in their `packages` vec, picking the longest-path root on ties.
pub(crate) fn root_by_package_query(db: &dyn Db, pkg: Package) -> Option<Root> {
let mut best: Option<(Root, usize)> = None;
for &root in db.live_roots() {
let (LiveRoot::Workspace(r) | LiveRoot::Library(r)) = root else {
continue;
};
if r.packages(db).contains(&pkg) {
let depth = root_depth(db, r);
if best.is_none_or(|(_, d)| depth > d) {
best = Some((r, depth));
}
}
}
None
best.map(|(root, _)| root)
}

/// Number of path segments in a root's URL. Used as the tiebreaker by
/// [`root_by_package_query`] when nested roots both claim the same package.
///
/// Counts URL segments directly rather than going through `to_file_path()`.
/// `to_file_path()` errors on Windows for non-OS-style URLs (no drive
/// letter), which would silently collapse all depths to zero and degrade
/// the tiebreaker into "first found wins". Depth is a structural property
/// of the URL hierarchy, so the URL itself is the right source.
fn root_depth(db: &dyn Db, root: Root) -> usize {
root.path(db)
.as_url()
.path_segments()
.map(|s| s.filter(|seg| !seg.is_empty()).count())
.unwrap_or(0)
}

/// Per-root URL -> File index. Salsa caches one map per `Root`;
Expand Down
17 changes: 12 additions & 5 deletions crates/oak_db/src/file.rs
Original file line number Diff line number Diff line change
Expand Up @@ -50,6 +50,8 @@ pub struct File {
pub url: UrlId,
#[returns(ref)]
pub contents: String,
/// **Placement invariant.** Call this setter only through
/// `oak_scan`'s helpers; see the type-level doc above.
pub package: Option<Package>,
}

Expand Down Expand Up @@ -144,17 +146,22 @@ impl File {

/// The root containing this file, if any.
///
/// If the file has a registered [`Package`], dispatches through
/// `Package.root`. Otherwise falls back to a URL-prefix lookup
/// against [`WorkspaceRoots`] (orphan files live under a workspace
/// root or nowhere; library files always have a package).
/// If the file has a registered [`Package`], asks the db which live
/// root holds it via [`Db::root_by_package`]. Otherwise falls back to a
/// URL-prefix lookup against [`WorkspaceRoots`] (orphan files live
/// under a workspace root or nowhere). Library files normally have
/// a package; the `root_by_package` branch covers them too.
///
/// Returns `None` if the file's package was evicted to
/// [`StaleRoot`] (no live root contains it), or if the file is in
/// orphan and the URL falls outside every workspace folder.
///
/// Callers that need to distinguish workspace from library roots
/// inspect `root.kind(db)`.
#[salsa::tracked]
pub fn root(self, db: &dyn Db) -> Option<Root> {
if let Some(pkg) = self.package(db) {
return Some(pkg.root(db));
return db.root_by_package(pkg);
}
root_by_url(db, self.url(db))
}
Expand Down
100 changes: 95 additions & 5 deletions crates/oak_db/src/inputs.rs
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,12 @@ pub struct Root {
/// Top-level R scripts directly under this root. Each entry is a
/// `File` with `package(db) == None`. Always empty for `Library`
/// roots.
///
/// **Placement invariant.** A file present here must have
/// `package(db) == None`, and a file with `package == None` must
/// live here, in another `Root.scripts`, or in
/// `OrphanRoot.files`. Call this setter only through `oak_scan`'s
/// helpers, which keep the back-pointer and the container in sync.
#[returns(ref)]
pub scripts: Vec<File>,
/// Packages discovered under this root (workspace packages for
Expand All @@ -39,6 +45,31 @@ pub enum RootKind {
Library,
}

/// A live root container that participates in analysis lookups.
///
/// Bundles the three salsa inputs that hold files / packages the user is
/// actively working with: workspace [`Root`]s, library [`Root`]s, and the
/// [`OrphanRoot`] that catches unanchored buffers. Stale entities in
/// [`StaleRoot`] aren't included -- they have separate access patterns
/// (scanner upsert only, never analysis), so they stay as their own input.
///
/// `Db::live_roots()` yields these in lookup precedence (workspace first, then
/// library, then orphan).
///
/// TODO(salsa): this enum carries the workspace-vs-library distinction in its
/// variant tag, which makes the `Root.kind` field redundant. Drop the field
/// once callers route through `LiveRoot` everywhere instead of reading
/// `root.kind(db)` directly. Further out, splitting `Root` into separate
/// `WorkspaceRoot` and `LibraryRoot` salsa inputs (each with the fields that
/// actually apply to its kind: scripts only on workspace, etc.) frees up
/// the name `Root` to be this enum.
#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash)]
pub enum LiveRoot {
Workspace(Root),
Library(Root),
Orphan(OrphanRoot),
}

/// The set of workspace folders the user has open.
///
/// Populated by the LSP layer from `initialize.workspaceFolders` and
Expand Down Expand Up @@ -88,8 +119,11 @@ impl LibraryRoots {
/// Singleton: there is one `OrphanRoot` per concrete database, lazily
/// initialised by the implementation. The `files` field is what
/// [`crate::Db::file_by_url`] consults to find unanchored files.
#[salsa::input]
#[salsa::input(debug)]
pub struct OrphanRoot {
/// **Placement invariant.** Files here must have `package(db) ==
/// None`. Call this setter only through `oak_scan`'s helpers,
/// which keep the back-pointer and the container in sync.
#[returns(ref)]
pub files: Vec<File>,
}
Expand All @@ -100,12 +134,63 @@ impl OrphanRoot {
}
}

/// Files and packages from workspace or library roots that were removed
/// during a `set_*_paths` call.
///
/// Salsa doesn't garbage-collect entities, so dropping a `Root` doesn't
/// free its `File` and `Package` entities. They'd just leak. Instead we
/// move them here and consult this bucket on the next `set_*_paths`,
/// reusing entities by URL when their paths come back. This matters for
/// agent / multi-repo workflows where the same workspace folder gets
/// added and removed repeatedly across a session.
///
/// **Not consulted by analysis.** `Db::file_by_url` and
/// `Db::package_by_name` walk workspace / library roots and (for files)
/// `OrphanRoot` only. Entities in `StaleRoot` are invisible to
/// completions, goto-def, etc. — they correspond to folders the user
/// has explicitly removed.
///
/// **Consulted by scanners.** The scanner's package-by-URL lookup walks
/// live roots then falls back to stale. Scanner upsert helpers do the same
/// for files. On reuse, the entity is moved out of stale back into a live
/// container.
///
/// Singleton like `OrphanRoot`. The `files` and `packages` fields are
/// independent: a stale file's `package` may still point at a stale
/// package, and that's fine. Both are invisible to analysis until one
/// of them gets pulled back into a live container.
#[salsa::input]
pub struct StaleRoot {
#[returns(ref)]
pub files: Vec<File>,
#[returns(ref)]
pub packages: Vec<Package>,
}

impl StaleRoot {
pub fn empty(db: &dyn Db) -> Self {
Self::new(db, vec![], vec![])
}
}

#[salsa::input(debug)]
pub struct Package {
/// The `Root` this package belongs to. Workspace packages live under
/// a [`RootKind::Workspace`] root, installed packages live under a
/// [`RootKind::Library`] root. Read `root.kind(db)` to distinguish.
pub root: Root,
/// URL of the package's `DESCRIPTION` file. Stable identity across
/// rescans and workspace / library churn: scanners look up an
/// existing `Package` by this URL before creating a new one. Two
/// packages with the same `Package:` name can coexist on disk and the
/// URL distinguishes them.
///
/// The package's owning [`Root`] is not stored as a field. It is
/// derived from live-graph containment via [`Db::root_by_package`]: a
/// package belongs to whichever `Root.packages` currently holds it.
/// Workspace-vs-library is then `root.kind(db)`.
#[returns(ref)]
pub description_url: UrlId,
// TODO(salsa): Expose a tracked `name_interned(db) -> Name<'db>`
// method so `db.package_by_name()` and other lookups key on the
// interned id rather than the string. Can't store `Name<'db>` on
// `Package` directly because salsa inputs are lifetime-free.
#[returns(ref)]
pub name: String,
/// Installed-package version (from `DESCRIPTION`). `None` for
Expand All @@ -118,6 +203,11 @@ pub struct Package {
/// Per-package granularity: adding or removing a file in one
/// package doesn't invalidate tracked queries reading another
/// package's files.
///
/// **Placement invariant.** A file present here must have
/// `package(db) == Some(self)`. Call this setter only through
/// `oak_scan`'s helpers, which keep the back-pointer and the
/// container in sync.
#[returns(ref)]
pub files: Vec<File>,
/// The basename ordering from `DESCRIPTION`'s `Collate` field, if
Expand Down
Loading
Loading