inotify exhausted under large parallel campaigns #130
Description
Summary
libCRS register-submit-dir creates an inotify instance for each watcher it starts. Large parallel OSS-CRS runs therefore hit the default per-user Linux limit, fs.inotify.max_user_instances.
On a host with the default limit of 128, running multiple OSS-CRS fuzzer containers caused new containers to fail during startup with:
OSError(24, 'inotify instance limit reached')
What we observed
The processes consuming these inotify instances were libCRS watcher processes launched from /run_fuzzer.sh inside fuzzer containers.
Examples:
/root/.local/share/uv/tools/libcrs/bin/python /usr/local/bin/libCRS register-submit-dir pov /work/out/main/submitted_povs --log /tmp/pov_submit_main.log
/root/.local/share/uv/tools/libcrs/bin/python /usr/local/bin/libCRS register-submit-dir seed /work/out/worker_1/queue --log /tmp/seed_submit_worker_1.log
These were spawned in large numbers under active fuzzer containers, with parent process:
/bin/bash /run_fuzzer.sh
On the affected host, root was exactly at the inotify instance cap:
fs.inotify.max_user_instances = 128
root: 128 inotify instances
At that point, additional root-owned/container processes trying to create an inotify instance failed.
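To diagnose this state, one can count inotify instances per process by scanning /proc for file descriptors whose link target is the anon inode "anon_inode:inotify" (a sketch, not part of OSS-CRS; needs root to inspect other users' processes):

```python
import glob
import os
from collections import Counter

# Count inotify instances per PID: each inotify fd shows up in
# /proc/<pid>/fd/ as a symlink to "anon_inode:inotify".
counts = Counter()
for fd in glob.glob("/proc/[0-9]*/fd/*"):
    try:
        if os.readlink(fd) == "anon_inode:inotify":
            counts[fd.split("/")[2]] += 1
    except OSError:
        continue  # process exited, or no permission to inspect it

# Top consumers, one "PID count" line each.
for pid, n in counts.most_common(10):
    print(pid, n)
```

Summing the counts for a given user and comparing against fs.inotify.max_user_instances shows how close that user is to the cap.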
Why this is a problem
This makes OSS-CRS fragile on otherwise normal Linux hosts running with the default kernel setting. Users can hit a non-obvious host-level failure that looks like a container/runtime problem, but is really host-wide inotify instance exhaustion caused by many register-submit-dir watchers.
Suggested improvements
Possible fixes / mitigations:
- Replace register-submit-dir and similar watcher-based APIs with simple polling, which consumes no inotify instances.
- Document raising fs.inotify.max_user_instances on the host for large campaigns, e.g.:
  fs.inotify.max_user_instances=1024
- Potentially also recommend a larger fs.inotify.max_user_watches as a companion setting.
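As a sketch of the polling alternative (a hypothetical helper, not current libCRS code): instead of holding one inotify instance per submit directory, a watcher can diff directory listings on a timer, trading a little latency for zero inotify usage:

```python
import os
import time

def poll_new_files(directory, interval=2.0):
    """Yield paths of files that appear in `directory`, diffing
    listings every `interval` seconds instead of using inotify."""
    seen = set(os.listdir(directory))
    while True:
        time.sleep(interval)
        current = set(os.listdir(directory))
        for name in sorted(current - seen):
            yield os.path.join(directory, name)
        seen = current
```

For submit-dir style workloads (new PoVs or seeds dropped into a directory every few seconds), a 1–2 second polling interval is usually indistinguishable from inotify in practice, and it scales to any number of parallel containers.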
Expected behavior
OSS-CRS should either:
- avoid consuming so many inotify instances by default, or
- fail with a clear, actionable diagnostic, or
- document the required host tuning clearly enough that operators can avoid this up front.
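The "clear, actionable diagnostic" option could be a startup preflight check. A minimal sketch (hypothetical function and names, assuming the launcher knows how many watchers it will start): read the host limit from /proc/sys and fail early with the exact sysctl to run, instead of failing later with a bare OSError 24:

```python
import os

def preflight_inotify(planned_watchers, headroom=16):
    """Fail fast with an actionable message if the host's inotify
    instance limit is too low for the watchers we plan to start."""
    with open("/proc/sys/fs/inotify/max_user_instances") as f:
        limit = int(f.read())
    if planned_watchers + headroom > limit:
        raise RuntimeError(
            f"fs.inotify.max_user_instances={limit} is too low for "
            f"{planned_watchers} planned watchers; raise it, e.g.: "
            f"sysctl fs.inotify.max_user_instances=1024"
        )
    return limit
```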