state: Don't delete .new files in State::load() by m-blaha · Pull Request #2675 · rpm-software-management/dnf5

m-blaha · 2026-04-07T13:03:15Z

When multiple libdnf5 processes run concurrently (e.g. dnf5 transaction + PackageKit/GNOME Software loading the system repo), the .new file recovery code in State::load() can race with State::save() in another process. The recovery code would delete or rename .new files that the concurrent save() just wrote, causing save()'s subsequent rename() to fail with "cannot copy/rename" errors.
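The failure mode can be reproduced in a single process by running the conflicting steps in the order the race would produce. This is a minimal sketch, not the libdnf5 code: the helper names (`new_path`, `write_new_file`, `commit_new_file`, `old_style_recovery`) and the file name used with them are hypothetical, and the recovery step is reduced to a plain delete.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: "<path>.new" for a given state file path.
fs::path new_path(const fs::path & path) {
    fs::path result = path;
    result += ".new";
    return result;
}

// Phase 1 of an assumed two-phase save: write the new contents to "<path>.new".
void write_new_file(const fs::path & path, const std::string & contents) {
    std::ofstream out(new_path(path), std::ios::trunc);
    out << contents;
}  // the stream is closed here, before any rename

// Phase 2: move "<path>.new" over "<path>". Returns the error, if any.
std::error_code commit_new_file(const fs::path & path) {
    std::error_code ec;
    fs::rename(new_path(path), path, ec);
    return ec;
}

// Simplified stand-in for the old recovery step in a concurrent loader:
// treat "<path>.new" as a crash leftover and remove it.
void old_style_recovery(const fs::path & path) {
    std::error_code ec;
    fs::remove(new_path(path), ec);
}
```

Calling `write_new_file()`, then `old_style_recovery()` (standing in for the other process), then `commit_new_file()` returns a "no such file or directory" error from the rename, which corresponds to the failure described above.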

Fix by making load() purely read-only with respect to .new files. In case there are any .new files present, just log a warning and keep using the non-.new state files from the last successful save().
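The read-only behavior can be sketched as follows. This is an illustration under assumptions, not the patch itself: `find_new_files`, `warn_about_new_files`, and `read_committed` are hypothetical names, and the warning goes to `std::cerr` rather than the libdnf5 logger.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: list "*.new" files in the state directory without
// modifying anything. Errors while iterating simply end the scan.
std::vector<fs::path> find_new_files(const fs::path & state_dir) {
    std::vector<fs::path> found;
    std::error_code ec;
    for (fs::directory_iterator it(state_dir, ec), end; !ec && it != end; it.increment(ec)) {
        if (it->path().extension() == ".new") {
            found.push_back(it->path());
        }
    }
    return found;
}

// Warn about each leftover ".new" file and return how many were seen.
// Nothing is deleted or renamed.
std::size_t warn_about_new_files(const fs::path & state_dir) {
    const auto leftovers = find_new_files(state_dir);
    for (const auto & path : leftovers) {
        std::cerr << "warning: ignoring " << path << "; using the last saved state\n";
    }
    return leftovers.size();
}

// Read the committed (non-.new) file as it was left by the last successful save.
std::string read_committed(const fs::path & path) {
    std::ifstream in(path);
    std::ostringstream buffer;
    buffer << in.rdbuf();
    return buffer.str();
}
```

Because the loader never writes to the directory, a concurrent save can still complete its rename, and an unprivileged reader needs no write access.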

An alternative approach using Locker to synchronize State::load() and State::save() was considered. This would preserve crash recovery for the case where save() was interrupted during the rename phase (all .new files fully written). However, a crash during the write phase still leaves state inconsistent with rpmdb, requiring a full state rebuild (see #1610). Also, Locker requires write access to create the lock file in the state directory, which would break non-root read-only operations (e.g. dnf5 repoquery) unless fallback logic was added. The added complexity was not justified given these limitations.

Resolves: #2601

Signed-off-by: Marek Blaha <mblaha@redhat.com>

evan-goode

Thanks for the investigation and welcome back from the Packit exchange :)

The fix looks correct to me.

evan-goode · 2026-04-07T21:30:38Z

> An alternative approach using Locker to synchronize State::load() and State::save() was considered. This would preserve crash recovery for the case where save() was interrupted during the rename phase (all .new files fully written). However, a crash during the write phase still leaves state inconsistent with rpmdb, requiring a full state rebuild (see #1610).

IMO ideally we would have locking in addition to this change.

> Also, Locker requires write access to create the lock file in the state directory, which would break non-root read-only operations (e.g. dnf5 repoquery) unless fallback logic was added. The added complexity was not justified given these limitations.

For the "system repo" lock (#2519), we worked around that by having the lock file (/usr/lib/sysimage/libdnf5/system-repo.lock) be persistent and owned by root with 0664. Then unprivileged users can obtain read locks but not write locks.
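One way a persistent root-owned 0664 lock file can give unprivileged users read locks but not write locks is POSIX `fcntl()` record locking, where a write lock requires a descriptor opened for writing. The sketch below assumes that mechanism; whether libdnf5's Locker is implemented this way is not stated here, and `try_lock` is a hypothetical helper.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper, not libdnf5's Locker: try to take a whole-file
// advisory lock of the given type (F_RDLCK or F_WRLCK) without blocking.
// Returns 0 on success, or the errno value reported by fcntl() on failure.
int try_lock(int fd, int type) {
    assert(type == F_RDLCK || type == F_WRLCK);
    struct flock fl {};
    fl.l_type = static_cast<short>(type);
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;  // 0 means "until end of file", i.e. the whole file
    if (fcntl(fd, F_SETLK, &fl) == -1) {
        return errno;
    }
    return 0;
}
```

A user who can only open the lock file with `O_RDONLY` can pass `F_RDLCK` to `try_lock()`, while `F_WRLCK` on the same descriptor fails with `EBADF`. No lock file has to be created at lock time, so write access to the directory is not needed.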

Maybe standard practice should be to use the system repo lock for these state files too? The system repo lock is obtained in Context::load_repos; it's not automatically obtained for libdnf5 API users. The libdnf5 tutorial was updated to recommend obtaining the lock. I guess dnf5daemon consumers and the new PackageKit backend are not using it (yet).

I would hesitate to somehow enforce obtaining the system repo lock in libdnf5, since there are use cases where it's better to read a soon-to-be-invalid state than to wait for a long DNF5 process to finish and release a write lock. But maybe it could be opt-out instead of opt-in.

m-blaha requested a review from a team as a code owner April 7, 2026 13:03

m-blaha requested review from evan-goode and removed request for a team April 7, 2026 13:03

evan-goode approved these changes Apr 7, 2026

m-blaha wants to merge 1 commit into rpm-software-management:main from m-blaha:system-state-race
