feat(storage): introduce storage refactoring proposal to reduce globa… by rchincha · Pull Request #3927 · project-zot/zot

rchincha · 2026-04-03T17:01:13Z

…l lock contention

What type of PR is this?

Which issue does this PR fix:

What does this PR do / Why do we need it:

If an issue # is not available please add repro steps and logs showing the issue:

Testing done on this change:

Automation added to e2e:

Will this break upgrades or downgrades?

Does this PR introduce any user-facing change?:

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

…l lock contention Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>

Copilot

Pull request overview

Adds a storage refactoring proposal document aimed at reducing ImageStore’s global lock contention by scoping locks per repository and separating shared blob dedupe into a dedicated blobstore domain.

Changes:

Introduces a design proposal for split lock domains (blobstore + per-repo locks) with a defined lock acquisition order.
Proposes a global deduplicated blob namespace and changes to remote dedupe behavior to avoid empty marker entries.
Provides rollout/upgrade guidance and expected performance impact scenarios.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

andaaron

This is high level, but we should also decide if we should keep the same image store implementation for both local and remote. Having them together couples thing and adds extra requirements

For example if we separate them:

We can completely remove the dedupe logic from the remote implementation. We don't need to think about locking and unlocking for dedupe and we can focus on locks in case of API delete and GC.
We can theoretically remove the locking on the global store for GC at repo level. And do an independent GC on the global store level later. We don't care about global store lock during deletion API calls, because the deduped copies are independent at repo level. We would still need the global store lock on dedupe.

If we keep them the the same implementation we add limitations such as:

In case of GC at global store level we need to lock all repos in case of local storage, because remote store has a single copy, even though it should be needed for local storage,
In case of dedupe we may need to lock also for some lower level ops in the store implementation, even though remote storage shouldn't care about dedupe.

andaaron · 2026-04-04T16:58:25Z

+- mixed read/write traffic across many repositories is serialized more than necessary
+- lock wait time grows with repository cardinality and concurrent clients
+
+This is especially visible in multi-tenant deployments where repositories are independent in practice but coupled by lock scope.


What is this about, we don't have a tenant concept? This opens up a whole other discussion: Do we want to support multiple tenants in the same zot instance? Or do we want to support multiple tenants in a zot cluster on different nodes? Or do we say each tenant should have its own independent zot?

I think we should stick to the solving one issue at at time instead of increasing complexity. Handling remote vs local storage in the context of dedupe and GC should be resolved before getting into tenants. We should also reach a conclusion what is a valid zot cluster configuration before getting into tenants (in my opinion multiple zot instances sync-ing from each other using the sync extension is an exploit of the sync extension).

Multi-tenant simply means users operating on separate repositories.

andaaron · 2026-04-04T17:08:40Z

+
+### 1) Introduce a global blobstore namespace
+
+Use a dedicated hidden namespace (for example `_blobstore`) to store all deduplicated blob payloads.


Just a note, if we are to implement #2752 the links we provide to the client would be to this namespace, so it wouldn't be "hidden".

andaaron · 2026-04-04T17:12:27Z

+Replace the single global lock domain with two levels:
+
+- global blobstore lock: `RWMutex` protecting `_blobstore` metadata/data mutations
+- per-repository locks: `RWMutex` per repository protecting repo-local metadata (manifests, tags, links, indexes)


Note manifests and indexes are also blobs. So in case of DELETE APIC calls, dedupe and gc we would need to evaluate if the blob is used in other repositories before removing it from _blobstore

andaaron · 2026-04-04T17:17:30Z

+
+### 1) Introduce a global blobstore namespace
+
+Use a dedicated hidden namespace (for example `_blobstore`) to store all deduplicated blob payloads.


Also is _blobstore supposed to comply with an OCI layout it is just a folder including only the content _blobstore/<hash_algo>/<hash_value>?
This should be captured in the proposal.

If it is an oci layout, what would we write in the index? What role would those entries have? In what case would we read it? I am thinking of GC on the global store. We could have an index of all the manifests present in all the repos, for running GC on the global store as a repo. But this also means maintaining it. And we already have the DB for that, correct?

andaaron · 2026-04-04T17:18:11Z

+
+#### Locking intent
+
+- operations touching only repository metadata should use only that repository lock


This is vague. Manifests could be blobs shared across repos.
Maybe we should list these operations which would run using only the repository lock.

andaaron · 2026-04-04T17:34:35Z

+- empty marker entries can interfere with OCI distribution spec expectations around blob/object layout and semantics
+- each marker introduces additional backend operations (HEAD/LIST/PUT-like checks), increasing cloud API calls and cost


Very well said.

andaaron · 2026-04-04T17:35:37Z

+## Correctness and safety notes
+
+- lock acquisition order must be uniformly enforced in all code paths
+- blob existence checks and blob writes must remain atomic with respect to dedupe decisions


Suggested change

- blob existence checks and blob writes must remain atomic with respect to dedupe decisions

- blob existence checks and blob writes must remain atomic with respect to dedupe and GC decisions

andaaron · 2026-04-04T17:39:26Z

+
+Implications:
+
+- startup can be blocked while migration work is in progress


Can we maybe track what repositories have been migrated, maybe on a per repo level the "cache" DB? I am thinking we don't want to do this every time zot starts, and a system may be restarted in the middle of the migration, we don't want to assume everything was migrated already, or completelys tart over.

Currently a blocking call at start time.
Also, need to take into account a crash during migration (e.g., "ran out of space")

andaaron · 2026-04-04T17:40:11Z

+- smaller deployments where startup delay is acceptable
+- environments where a one-time maintenance window is available
+
+### Side-by-side sync/mirror migration


Makes sense.

andaaron · 2026-04-04T17:47:24Z

+- https://github.com/project-zot/zot/pull/2968
+- https://github.com/project-zot/zot/pull/3922
+
+These PRs can be used to stage and validate the refactor incrementally.


If you want to give this to the AI this is not enough.
I suggest you give it the exact order in which to use the PRs to solve the problem.

And there are still things to clarify before that.
For example in implement the _blobstore. And #3906 can be used as a reference, but this document may require something which does not necessarily allign wit the PR. I am thinking the full layout of global blobstore vs just the content of the blobs folder, decision which at the time I am writing this comment hasn't been made.

feat(storage): introduce storage refactoring proposal to reduce globa…

4ccc0d7

…l lock contention Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>

rchincha requested review from Copilot and vrajashkr April 3, 2026 17:01

Copilot started reviewing on behalf of rchincha April 3, 2026 17:02 View session

Copilot AI reviewed Apr 3, 2026

View reviewed changes

Comment thread docs/storage-refactoring-proposal.md

andaaron reviewed Apr 4, 2026

View reviewed changes


		### 1) Introduce a global blobstore namespace

		Use a dedicated hidden namespace (for example `_blobstore`) to store all deduplicated blob payloads.


		#### Locking intent

		- operations touching only repository metadata should use only that repository lock

		- empty marker entries can interfere with OCI distribution spec expectations around blob/object layout and semantics
		- each marker introduces additional backend operations (HEAD/LIST/PUT-like checks), increasing cloud API calls and cost

	- blob existence checks and blob writes must remain atomic with respect to dedupe decisions
	- blob existence checks and blob writes must remain atomic with respect to dedupe and GC decisions


		Implications:

		- startup can be blocked while migration work is in progress

Conversation

rchincha commented Apr 3, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

andaaron left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rchincha Apr 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

rchincha Apr 10, 2026 •

edited

Loading