Skip to content

source origins#1

Open
jld-adriano wants to merge 7 commits intomasterfrom
devin/1773720898-source-origins
Open

source origins#1
jld-adriano wants to merge 7 commits intomasterfrom
devin/1773720898-source-origins

Conversation

@jld-adriano
Copy link
Copy Markdown

@jld-adriano jld-adriano commented Mar 18, 2026

jld-adriano and others added 4 commits March 17, 2026 04:40
Maps inputSrc store paths back to original filesystem source paths.

During evaluation, nix copies source paths into the store via
copyPathToStore(). This commit:

1. Adds a reverse mapping (storeToSrc) to EvalState that is populated
   alongside the existing srcToStore cache.

2. Exposes getSourceOrigin(StorePath) and getSourceOrigins() accessors
   on EvalState to query this mapping after evaluation.

3. Adds a new 'nix derivation source-origins' command that evaluates
   installables and, for each resulting derivation, prints a JSON
   mapping of each inputSrc store path to its original source path.

This enables build-system tooling to determine which working-tree
directories contributed to a derivation's build inputs - useful for
monorepo CI to know which parts of the repo are affected by a change.

Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
- Add sourceStoreToOriginalPath map to EvalState for tracking
  store path → original filesystem path mappings
- Set originalRootPath on accessors in git.cc (git root) and path.cc
  (flake dir) input schemes so mountInput() can record mappings
- Implement recordPathOrigin() with 3-strategy resolution for
  builtins.path/cleanSourceWith sources that bypass storeToSrc
- Modify addPath() to call recordPathOrigin() in both cached and
  new store path branches
- Auto-disable eval cache in source-origins command
- Use git+file:// scheme for correct monorepo root resolution

Tested on diverse monorepo flakes: rust (atlas-ops, atlas, queue-urls),
python (atlas), typescript (clank-stank, exa-deploy), go (readability_parser),
and flakes/graph. All correctly resolve local source paths.

Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
…e-level precision

When a store path was created by cleanSourceWith with a broad source root
(e.g. src = ../../../.), the sourcePath would report the monorepo root,
making every file change trigger a match. Now we enumerate the actual
store path contents (BFS) and report individual files as sourceFiles[].

The store path IS the filtered result — its contents are exactly what
passed the cleanSourceWith filter. Each file is mapped back to its
original source location using the sourcePath as the base.

This gives file-level precision for all cleanSourceWith/builtins.path
cases, making dependency graph filtering dramatically more precise.

Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
…:// examples, and implementation details

Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Copy link
Copy Markdown

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 2 potential issues.

View 5 additional findings in Devin Review.

Open in Devin Review

Comment thread src/nix/derivation-source-origins.cc Outdated
Comment thread src/libfetchers/git.cc

// Record the git repo root as the original filesystem path so that
// source-origins can map store paths back to their original locations.
accessor->originalRootPath = repoPath;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 originalRootPath lost when git repo has submodules

originalRootPath is set on the inner git workdir accessor at src/libfetchers/git.cc:994, but when the repo has submodules, accessor is reassigned to a new MountedSourceAccessor at line 1025. The mounted accessor does not inherit originalRootPath from the inner accessor. Downstream in mountInput() (src/libexpr/paths.cc:42), the check accessor->originalRootPath will be false, falling through to the input.getSourcePath() fallback which may return a different path (the URL path rather than the resolved repo root) or nullopt for non-file:// URLs.

Prompt for agents
In src/libfetchers/git.cc, in getAccessorFromWorkdir(), the originalRootPath is set on the inner accessor at line 994, but after the submodule block (lines 999-1026) when accessor is replaced with a mounted accessor, the originalRootPath is lost. After line 1025 (accessor = makeMountedSourceAccessor(...)), add: accessor->originalRootPath = repoPath; to propagate the original root path to the new mounted accessor.
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

devin-ai-integration bot added a commit that referenced this pull request Mar 26, 2026
Profiled with callgrind on real nixpkgs evaluation (hello.drvPath) and
implemented seven targeted optimizations:

1. Bindings::get(): linear scan for small chunks (≤8 elements)
   Avoids binary search overhead for the common case of small attribute
   sets. 10% instruction reduction in Bindings::get.

2. forceValue(): [[unlikely]] annotations on thunk/app/failed branches
   A value is forced at most once but read many times; marking the
   forcing branches unlikely lets the compiler optimize the fast
   fall-through for already-evaluated values.

3. callFunction(): cache profiler hooks check
   Avoid re-querying profiler.getNeededHooks() in the Finally destructor
   on every function return.

4. eqValues(): early pointer-equality check
   If both references alias the same Value, skip the second forceValue
   and return true immediately. Common in set equality and uniqList.

5. ExprConcatStrings::eval(): stack-allocated small vector
   Use boost::container::small_vector<BackedStringView, 8> instead of
   std::vector to avoid a heap allocation on every string interpolation.

6. Bindings::iterator: two-way merge fast path
   The most common layered case (a single // merge) now uses a simple
   two-pointer merge instead of heap-based k-way merge, eliminating all
   heap operations. This was the #1 eval-specific hotspot at ~5% of
   total instructions.

7. Bindings::iterator::push(): use push_heap instead of make_heap
   For the remaining multi-way (3+ layer) case, use O(log n) push_heap
   instead of O(n) make_heap, and build the heap once in the constructor.

Callgrind results (nixpkgs hello.drvPath):
  ExprOpUpdate::eval (// operator): 22.8M → 18.9M instr (−17%)
  Bindings::get:                    13.2M → 11.9M instr (−10%)
  __adjust_heap (heap ops):         25.9M →  8.0M instr (−69%)

All outputs match between baseline and optimized on nixpkgs hello.drvPath,
builtins.attrNames count (25417), and synthetic benchmarks.

Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
devin-ai-integration bot and others added 3 commits March 27, 2026 23:29
source-origins now traces inputDrvs → inputSrcs through the entire
build closure, not just the top-level derivation. This lets
nix-dep-graph discover dependencies like uv path deps that are
inputSrcs of derivations deep in the transitive closure.

The old non-recursive behavior is available via --no-recursive.

Fixes GIGA-2286.

Co-Authored-By: Michael Fine <mfine15@gmail.com>
When recursing through the full derivation closure, most transitive
derivations have no source origin mapping (they come from substituters
or previous builds). The JSON value() call was throwing because the
key exists with a null type, not a string. Use is_string() check
instead.

Co-Authored-By: Michael Fine <mfine15@gmail.com>
…ecursive

[source-origins]: recurse through full derivation closure by default
Copy link
Copy Markdown

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 2 new potential issues.

View 7 additional findings in Devin Review.

Open in Devin Review

Comment on lines +198 to +199
if (de.is_directory()) {
dirs.push(de.path());
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 BFS directory traversal follows symlinks, producing incorrect sourceFiles entries

In derivation-source-origins.cc:198, de.is_directory() follows symlinks. If a store path contains a symlink pointing to a directory (e.g. lib -> /nix/store/other-pkg/lib), the BFS treats it as a real directory and recurses into it, enumerating files from the symlink target rather than from the original source tree. This causes two issues: (1) the symlink itself is never listed in sourceFiles because the else branch is skipped, and (2) files from the symlink target are incorrectly listed as source files with wrong paths mapped through sourceRoot / relPath. The fix is to check de.is_symlink() before checking is_directory(), or use de.symlink_status() to avoid following symlinks.

Suggested change
if (de.is_directory()) {
dirs.push(de.path());
for (auto & de : fs::directory_iterator(dir)) {
if (!de.is_symlink() && de.is_directory()) {
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Comment thread src/libexpr/eval.cc
Comment on lines +2658 to +2661
if (sourceStoreToOriginalPath.size() == 1) {
auto & [srcStorePath, origPath] = *sourceStoreToOriginalPath.begin();
sourceStoreToOriginalPath.try_emplace(storePath, origPath);
return;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Strategy 3 heuristic silently maps unrelated store paths to wrong original path

In recordPathOrigin at src/libexpr/eval.cc:2658-2661, when srcPath.path.isRoot() and sourceStoreToOriginalPath has exactly one entry, the code unconditionally maps the new storePath to that entry's original path without verifying any relationship between them. This means if there's one flake source mounted and any other builtins.path call with a root source path, the resulting store path gets silently mapped to the flake's source directory. The getOriginalPath() query at src/libexpr/eval.cc:2683-2688 then returns a wrong filesystem path, which propagates to the sourcePath field in the command output.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant