Commit 5c97f01

Merge pull request #255025 from tweag/fileset.union
`lib.fileset.union`, `lib.fileset.unions`: init
2 parents f35534c + 94e103e commit 5c97f01

6 files changed

Lines changed: 611 additions & 176 deletions

doc/functions/fileset.section.md

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ File sets are easy and safe to use, providing obvious and composable semantics w
 These sections apply to the entire library.
 See the [function reference](#sec-functions-library-fileset) for function-specific documentation.
 
-The file set library is currently very limited but is being expanded to include more functions over time.
+The file set library is currently somewhat limited but is being expanded to include more functions over time.
 
 ## Implicit coercion from paths to file sets {#sec-fileset-path-coercion}
 

lib/fileset/README.md

Lines changed: 13 additions & 11 deletions
@@ -41,13 +41,21 @@ An attribute set with these values:
 - `_type` (constant string `"fileset"`):
   Tag to indicate this value is a file set.
 
-- `_internalVersion` (constant string equal to the current version):
-  Version of the representation
+- `_internalVersion` (constant `2`, the current version):
+  Version of the representation.
 
 - `_internalBase` (path):
   Any files outside of this path cannot influence the set of files.
   This is always a directory.
 
+- `_internalBaseRoot` (path):
+  The filesystem root of `_internalBase`, same as `(lib.path.splitRoot _internalBase).root`.
+  This is here because this needs to be computed anyway, and this computation shouldn't be duplicated.
+
+- `_internalBaseComponents` (list of strings):
+  The path components of `_internalBase`, same as `lib.path.subpath.components (lib.path.splitRoot _internalBase).subpath`.
+  This is here because this needs to be computed anyway, and this computation shouldn't be duplicated.
+
 - `_internalTree` ([filesetTree](#filesettree)):
   A tree representation of all included files under `_internalBase`.
 
@@ -59,8 +67,8 @@ An attribute set with these values:
 One of the following:
 
 - `{ <name> = filesetTree; }`:
-  A directory with a nested `filesetTree` value for every directory entry.
-  Even entries that aren't included are present as `null` because it improves laziness and allows using this as a sort of `builtins.readDir` cache.
+  A directory with a nested `filesetTree` value for directory entries.
+  Entries not included may either be omitted or set to `null`, as necessary to improve efficiency or laziness.
 
 - `"directory"`:
   A directory with all its files included recursively, allowing early cutoff for some operations.
@@ -169,15 +177,9 @@ Arguments:
 ## To update in the future
 
 Here's a list of places in the library that need to be updated in the future:
-- > The file set library is currently very limited but is being expanded to include more functions over time.
+- > The file set library is currently somewhat limited but is being expanded to include more functions over time.
 
   in [the manual](../../doc/functions/fileset.section.md)
-- > Currently the only way to construct file sets is using implicit coercion from paths.
-
-  in [the `toSource` reference](./default.nix)
-- > For now filesets are always paths
-
-  in [the `toSource` implementation](./default.nix), also update the variable name there
 - Once a tracing function exists, `__noEval` in [internal.nix](./internal.nix) should mention it
 - If/Once a function to convert `lib.sources` values into file sets exists, the `_coerce` and `toSource` functions should be updated to mention that function in the error when such a value is passed
 - If/Once a function exists that can optionally include a path depending on whether it exists, the error message for the path not existing in `_coerce` should mention the new function

lib/fileset/benchmark.sh

Lines changed: 93 additions & 47 deletions
@@ -1,4 +1,6 @@
-#!/usr/bin/env bash
+#!/usr/bin/env nix-shell
+#!nix-shell -i bash -p sta jq bc nix -I nixpkgs=../..
+# shellcheck disable=SC2016
 
 # Benchmarks lib.fileset
 # Run:
@@ -28,38 +30,6 @@ work="$tmp/work"
 mkdir "$work"
 cd "$work"
 
-# Create a fairly populated tree
-touch f{0..5}
-mkdir d{0..5}
-mkdir e{0..5}
-touch d{0..5}/f{0..5}
-mkdir -p d{0..5}/d{0..5}
-mkdir -p e{0..5}/e{0..5}
-touch d{0..5}/d{0..5}/f{0..5}
-mkdir -p d{0..5}/d{0..5}/d{0..5}
-mkdir -p e{0..5}/e{0..5}/e{0..5}
-touch d{0..5}/d{0..5}/d{0..5}/f{0..5}
-mkdir -p d{0..5}/d{0..5}/d{0..5}/d{0..5}
-mkdir -p e{0..5}/e{0..5}/e{0..5}/e{0..5}
-touch d{0..5}/d{0..5}/d{0..5}/d{0..5}/f{0..5}
-
-bench() {
-    NIX_PATH=nixpkgs=$1 NIX_SHOW_STATS=1 NIX_SHOW_STATS_PATH=$tmp/stats.json \
-        nix-instantiate --eval --strict --show-trace >/dev/null \
-        --expr '(import <nixpkgs/lib>).fileset.toSource { root = ./.; fileset = ./.; }'
-    cat "$tmp/stats.json"
-}
-
-echo "Running benchmark on index" >&2
-bench "$nixpkgs" > "$tmp/new.json"
-(
-    echo "Checking out $compareTo" >&2
-    git -C "$nixpkgs" worktree add --quiet "$tmp/worktree" "$compareTo"
-    trap 'git -C "$nixpkgs" worktree remove "$tmp/worktree"' EXIT
-    echo "Running benchmark on $compareTo" >&2
-    bench "$tmp/worktree" > "$tmp/old.json"
-)
-
 declare -a stats=(
     ".envs.elements"
     ".envs.number"
@@ -77,18 +47,94 @@ declare -a stats=(
     ".values.number"
 )
 
-different=0
-for stat in "${stats[@]}"; do
-    oldValue=$(jq "$stat" "$tmp/old.json")
-    newValue=$(jq "$stat" "$tmp/new.json")
-    if (( oldValue != newValue )); then
-        percent=$(bc <<< "scale=100; result = 100/$oldValue*$newValue; scale=4; result / 1")
-        if (( oldValue < newValue )); then
-            echo -e "Statistic $stat ($newValue) is \e[0;31m$percent% (+$(( newValue - oldValue )))\e[0m of the old value $oldValue" >&2
-        else
-            echo -e "Statistic $stat ($newValue) is \e[0;32m$percent% (-$(( oldValue - newValue )))\e[0m of the old value $oldValue" >&2
+runs=10
+
+run() {
+    # Empty the file
+    : > cpuTimes
+
+    for i in $(seq 0 "$runs"); do
+        NIX_PATH=nixpkgs=$1 NIX_SHOW_STATS=1 NIX_SHOW_STATS_PATH=$tmp/stats.json \
+            nix-instantiate --eval --strict --show-trace >/dev/null \
+            --expr 'with import <nixpkgs/lib>; with fileset; '"$2"
+
+        # Only measure the time after the first run, one is warmup
+        if (( i > 0 )); then
+            jq '.cpuTime' "$tmp/stats.json" >> cpuTimes
         fi
-        (( different++ )) || true
-    fi
-done
-echo "$different stats differ between the current tree and $compareTo"
+    done
+
+    # Compute mean and standard deviation
+    read -r mean sd < <(sta --mean --sd --brief <cpuTimes)
+
+    jq --argjson mean "$mean" --argjson sd "$sd" \
+        '.cpuTimeMean = $mean | .cpuTimeSd = $sd' \
+        "$tmp/stats.json"
+}
+
+bench() {
+    echo "Benchmarking expression $1" >&2
+    #echo "Running benchmark on index" >&2
+    run "$nixpkgs" "$1" > "$tmp/new.json"
+    (
+        #echo "Checking out $compareTo" >&2
+        git -C "$nixpkgs" worktree add --quiet "$tmp/worktree" "$compareTo"
+        trap 'git -C "$nixpkgs" worktree remove "$tmp/worktree"' EXIT
+        #echo "Running benchmark on $compareTo" >&2
+        run "$tmp/worktree" "$1" > "$tmp/old.json"
+    )
+
+    read -r oldMean oldSd newMean newSd percentageMean percentageSd < \
+        <(jq -rn --slurpfile old "$tmp/old.json" --slurpfile new "$tmp/new.json" \
+        ' $old[0].cpuTimeMean as $om
+        | $old[0].cpuTimeSd as $os
+        | $new[0].cpuTimeMean as $nm
+        | $new[0].cpuTimeSd as $ns
+        | (100 / $om * $nm) as $pm
+        # Copied from https://github.com/sharkdp/hyperfine/blob/b38d550b89b1dab85139eada01c91a60798db9cc/src/benchmark/relative_speed.rs#L46-L53
+        | ($pm * pow(pow($ns / $nm; 2) + pow($os / $om; 2); 0.5)) as $ps
+        | [ $om, $os, $nm, $ns, $pm, $ps ]
+        | @sh')
+
+    echo -e "Mean CPU time $newMean (σ = $newSd) for $runs runs is \e[0;33m$percentageMean% (σ = $percentageSd%)\e[0m of the old value $oldMean (σ = $oldSd)" >&2
+
+    different=0
+    for stat in "${stats[@]}"; do
+        oldValue=$(jq "$stat" "$tmp/old.json")
+        newValue=$(jq "$stat" "$tmp/new.json")
+        if (( oldValue != newValue )); then
+            percent=$(bc <<< "scale=100; result = 100/$oldValue*$newValue; scale=4; result / 1")
+            if (( oldValue < newValue )); then
+                echo -e "Statistic $stat ($newValue) is \e[0;31m$percent% (+$(( newValue - oldValue )))\e[0m of the old value $oldValue" >&2
+            else
+                echo -e "Statistic $stat ($newValue) is \e[0;32m$percent% (-$(( oldValue - newValue )))\e[0m of the old value $oldValue" >&2
+            fi
+            (( different++ )) || true
+        fi
+    done
+    echo "$different stats differ between the current tree and $compareTo"
+    echo ""
+}
+
+# Create a fairly populated tree
+touch f{0..5}
+mkdir d{0..5}
+mkdir e{0..5}
+touch d{0..5}/f{0..5}
+mkdir -p d{0..5}/d{0..5}
+mkdir -p e{0..5}/e{0..5}
+touch d{0..5}/d{0..5}/f{0..5}
+mkdir -p d{0..5}/d{0..5}/d{0..5}
+mkdir -p e{0..5}/e{0..5}/e{0..5}
+touch d{0..5}/d{0..5}/d{0..5}/f{0..5}
+mkdir -p d{0..5}/d{0..5}/d{0..5}/d{0..5}
+mkdir -p e{0..5}/e{0..5}/e{0..5}/e{0..5}
+touch d{0..5}/d{0..5}/d{0..5}/d{0..5}/f{0..5}
+
+bench 'toSource { root = ./.; fileset = ./.; }'
+
+rm -rf -- *
+
+touch {0..1000}
+bench 'toSource { root = ./.; fileset = unions (mapAttrsToList (name: value: ./. + "/${name}") (builtins.readDir ./.)); }'
+rm -rf -- *
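
Note on the statistics in the new `run` and `bench` functions: `run` discards the first (warm-up) sample and summarises the remaining CPU times as a mean and standard deviation, and `bench` reports the new mean as a percentage of the old one, with an uncertainty propagated from both standard deviations. The following is a minimal sketch of that arithmetic using only `awk`, so it can be tried without `sta`, `jq`, or Nix. The function names `mean_sd` and `relative` are hypothetical and not part of the commit, and the sample standard deviation (divisor n - 1) is an assumption that may not match what `sta --sd` computes by default.

```shell
#!/usr/bin/env bash
# Sketch only: mean_sd and relative are illustrative names, not from the commit.

# Print "mean sd" for the numbers on stdin, one number per line.
# Assumption: sample standard deviation (divisor n - 1).
mean_sd() {
    awk '
        { sum += $1; sumsq += $1 * $1; n++ }
        END {
            if (n == 0) exit 1
            mean = sum / n
            var = (n > 1) ? (sumsq - n * mean * mean) / (n - 1) : 0
            if (var < 0) var = 0   # guard against rounding error
            printf "%.6f %.6f\n", mean, sqrt(var)
        }'
}

# Print "percentage percentage_sd" of the new mean relative to the old mean.
# Arguments: old_mean old_sd new_mean new_sd
# Same formula as the jq program in bench: the relative errors of both
# measurements are added in quadrature.
relative() {
    awk -v om="$1" -v os="$2" -v nm="$3" -v ns="$4" 'BEGIN {
        pm = 100 / om * nm
        ps = pm * sqrt((ns / nm) ^ 2 + (os / om) ^ 2)
        printf "%.4f %.4f\n", pm, ps
    }'
}

# Usage with made-up sample timings (seconds), not real measurements
old=$(printf '%s\n' 1.9 2.0 2.1 | mean_sd)
new=$(printf '%s\n' 0.9 1.0 1.1 | mean_sd)
result=$(relative "${old% *}" "${old#* }" "${new% *}" "${new#* }")
echo "New mean is ${result% *}% (σ = ${result#* }%) of the old mean"
```

Adding the relative errors in quadrature treats the two measurements as independent, which seems reasonable here because the old and new trees are benchmarked in separate runs.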
