Add oak_scan crate with initial library scanning#1243
Conversation
| fn test_scan_multiple_library_paths_preserve_order() { | ||
| let tmp1 = tempfile::tempdir().unwrap(); | ||
| let tmp2 = tempfile::tempdir().unwrap(); | ||
| write_package(&tmp1.path().join("pkg1"), "pkg1", &[]); | ||
| write_package(&tmp2.path().join("pkg2"), "pkg2", &[]); | ||
| let mut db = OakDatabase::new(); | ||
|
|
||
| let paths: Vec<PathBuf> = vec![tmp1.path().to_path_buf(), tmp2.path().to_path_buf()]; | ||
| db.set_library_paths(&paths); | ||
|
|
||
| let roots = db.library_roots().roots(&db).clone(); | ||
| assert_eq!(roots.len(), 2); | ||
| assert_eq!(roots[0].packages(&db)[0].name(&db), "pkg1"); |
There was a problem hiding this comment.
Random thought
We have quite a few things in oak that are ordering dependent, but I always forget which is and is not order dependent
It feels like it would be cool to have SortedVec be returned here, or something kind of like it, where its clear that the result is sorted (then, of course, the question becomes "sorted by what"?).
But maybe LibraryRoots suffices, since its docs state that the order invariant is upheld there
There was a problem hiding this comment.
The roots are not sorted lexicographically though, or against a well defined sort key that can be easily used from elsewhere, so SortedVec would be a bit weird. It's more like they are ordered than sorted. I agree LibraryRoots suffices as the upholder of the invariant.
There was a problem hiding this comment.
Lets teach claude to always do tests/integration/ for all integration tests!
| /// Description of one R file the scanner wants to register. | ||
| /// | ||
| /// `contents` is the on-disk snapshot at scan time. It's used as the | ||
| /// initial content whenever the helper mints a new `File` entity, i.e. | ||
| /// the first time a URL is seen, whether at the initial scan or on a | ||
| /// later rescan that discovers a newly-created file. | ||
| /// | ||
| /// If a `File` already exists at this URL (scanner-created from an | ||
| /// earlier scan, or VFS-created via `didOpen`), the helpers reuse that | ||
| /// entity and leave its content alone. `set_contents` (driven by the | ||
| /// VFS) is the authoritative way to update content. | ||
| #[derive(Clone, Debug)] | ||
| pub struct FileEntry { |
There was a problem hiding this comment.
I had the thought "is it always an R file?"
It seems to be, so it might be nice if the name reflected this, maybe just RFile? Or RFileEntry.
But the generic FileEntry left me wondering if it was also used for other things besides just R files
There was a problem hiding this comment.
On the other hand there is nothing constraining the type to an R file, it's just that we only use it for R ones.
|
|
||
| /// Extension methods on the database for scanner orchestration and | ||
| /// placement-aware updates that don't have a natural `Root` receiver. | ||
| pub trait DbExt: Db + DbInputs { |
There was a problem hiding this comment.
I would like to advocate for a more informative trait name than DbExt
Perhaps DbLibraryPaths or DbScan
There was a problem hiding this comment.
Good idea, I like DbScan after the crate name.
| for &pkg in &packages { | ||
| pkg.set_files(db).to(Vec::new()); |
There was a problem hiding this comment.
| for &pkg in &packages { | |
| pkg.set_files(db).to(Vec::new()); | |
| for &package in &packages { | |
| package.set_files(db).to(Vec::new()); |
The naming inconsistency hurts me 😆
There was a problem hiding this comment.
yep I know, sorry. Using short names in small scopes and longer names in larger ones is my writing style (https://www.lysator.liu.se/c/pikestyle.html). Not sure how to resolve our difference on this :-)
| /// dropped on a previous `set_*_paths` call, so the entity gets | ||
| /// reused on re-add. Analysis paths should not call this — they use | ||
| /// [`Db::package_by_name`] which is stale-blind. | ||
| fn package_by_url(&self, url: &UrlId) -> Option<Package>; |
There was a problem hiding this comment.
I have no idea if I should use this vs package_by_name based on the names alone 😢
I wonder if it can be package_by_url_including_stale() or something to really discourage usage of it
Or separate into package_by_url() and stale_package_by_url() for use by the scanner
I wonder if we are going to have to expose some ugly internals like this by having the scanner and db in separate crates :/
There was a problem hiding this comment.
hmm good point.
I moved the by-url lookup to oak_scan, which is the only one that should be concerned with stale roots.
If an analysis crate has a URL for a package, they can look up the name on disk and use package_by_name(), which resolves the effective package for that name.
| /// Stale roots are not included. Salsa-cached and invalidates when one of | ||
| /// `workspace_roots` / `library_roots` / `orphan_root` changes. | ||
| fn live_roots(&self) -> &[LiveRoot]; |
There was a problem hiding this comment.
I would also accept Dead root instead of Stale root since we have terminology like:
- live roots
- "resurrect"ing of dead/stale roots
There was a problem hiding this comment.
That'd be fine, just a bit grim to my taste 😄
| pub fn package_by_url_query(db: &dyn Db, url: &UrlId) -> Option<Package> { | ||
| for &root in db.live_roots() { | ||
| if let LiveRoot::Workspace(r) | LiveRoot::Library(r) = root { | ||
| if let Some(&pkg) = root_package_url_index(db, r).get(url) { | ||
| return Some(pkg); | ||
| } | ||
| } | ||
| } | ||
| for root in db.library_roots().roots(db) { | ||
| if let Some(&pkg) = root_package_index(db, *root).get(name) { | ||
| return Some(pkg); | ||
| stale_package_url_index(db).get(url).copied() |
There was a problem hiding this comment.
yea i would be very happy to see a pair of functions like
package_by_url_query()/stale_package_by_url_query()live_package_by_url_query()/dead_package_by_url_query()
that feels less confusing to me in the long run, i feel like its a bit hard to tell exactly which set of roots are checked just from the function name. like, is the stale package set checked or not?
There was a problem hiding this comment.
I think moving the lookup to oak_scan mostly resolves the confusion.
d50f230 to
76a82b0
Compare
Branched from #1226
Progress towards #1212
Introduces
oak_scanwith a library scanner. The LSP layer calls (in the next PR) the scanner on startup with.libPaths(). Each libpath is set as a library root inoak_db.oak_scanis the only writer of Salsa inputs inoak_db, and is in charge of maintaining invariants, such as making sure backpointers are up-to-date.Library roots are static for now. Watching them for changes (package removed or installed) would require our own filesystem watcher. Since we don't need one for the workspace (we'll use LSP file events), the machinery would be a bit too heavy for what we would gain for it at this stage. For now the LSP can be restarted to register changes to libpaths.