This repository was archived by the owner on Feb 25, 2026. It is now read-only.

Commit c0066b1

doc(ADR#9): outline new refined model for provenance
1 parent d8b16b3 commit c0066b1

1 file changed

Lines changed: 58 additions & 31 deletions

adrs/0009-repository-identity-and-discovery.md

@@ -10,40 +10,40 @@ This ADR documents the architecture for `eka` repository identity, atom naming,
1010

1111
The core problems to be solved are:
1212

13-
1. **Consistent Identity Model**: An atom's identity is a two-part system: a machine-verifiable component (the root commit hash) and a human-readable name (`label`). The repository identity model should follow this same successful pattern. A formal mechanism is needed to add a user-definable naming component to the repository's identity. A primary consequence of this is the robust disambiguation of forks from mirrors.
13+
1. **Consistent Identity Model**: An atom's identity is a two-part system: a machine-verifiable component (the root commit hash) and a human-readable name (`label`). The repository identity model should follow a similar pattern. A formal mechanism is needed to establish repository identity that provides robust disambiguation of forks from mirrors.
1414
2. **Source of Truth**: A repository's composition is implicitly defined by the atoms present on the filesystem. A formal, declarative manifest is needed to act as the single, unambiguous source of truth.
1515
3. **Discovery Inefficiency**: A performant method is needed to discover all atoms within a local checkout without expensive filesystem traversals.
1616
4. **Terminology Ambiguity**: The historical use of `tag` for an atom's unique identifier is ambiguous when juxtaposed with the `tags` metadata list.
17-
5. **Remote Discovery**: The purpose of the existing "Manifest Ref" must be clarified as the primary mechanism for all remote metadata discovery.
1817

1918
## Decision
2019

21-
The architecture is centered on a new root `ekala.toml` manifest as the single source of truth. It also formalizes terminology and clarifies the role of the "Manifest Ref" for remote discovery.
20+
The architecture is centered on a new root `ekala.toml` manifest as the single source of truth. It establishes repository identity through initialization commits with entropy injection, providing robust fork disambiguation and temporal anchoring.
2221

2322
### 1. The Source of Truth: `ekala.toml` (New)
2423

2524
A single `ekala.toml` file **must** exist at the root of the repository. Its primary purpose is to serve as the **single source of truth** for the repository's composition.
2625

27-
- **Function**: It defines the repository's canonical `label` for fork disambiguation and provides a complete, static index of all `packages` (atoms) it contains.
26+
- **Function**: It provides a complete, static index of all `packages` (atoms) it contains. Repository identity is established through the initialization process rather than explicit naming. It also supports optional metadata for enhanced discoverability.
2827
- **Format**:
2928

3029
```toml
3130
# ekala.toml
3231

33-
[project]
34-
# The canonical, human-readable name for this repository.
35-
# This is mixed into the atom ID hash to disambiguate forks.
36-
label = "my-project"
37-
38-
# An optional list of tags for logically grouping entire repositories.
39-
tags = ["ui-kit", "experimental"]
40-
4132
# A flat list of all atoms in this repository, identified by their path.
4233
# The publisher will enforce that all atom names are unique within the repository.
34+
[set]
4335
packages = [
4436
"path/to/ui-kit/button",
4537
"path/to/core/validator",
4638
]
39+
40+
41+
# Optional key-value metadata for structured filtering and queries
42+
[metadata]
43+
domain = "my-company.com"
44+
license = "MIT"
45+
# Optional tags for simple categorization
46+
tags = ["ui-kit", "experimental"]
4747
```
4848

4949
### 2. The Atom and its Metadata: `atom.toml` (Terminology Change)
@@ -62,28 +62,44 @@ Each atom continues to be defined by an `atom.toml` file. This ADR formalizes a
6262
label = "button"
6363
version = "1.0.0"
6464

65+
66+
# Optional key-value metadata for structured filtering and queries
67+
[metadata]
68+
license = "MIT"
69+
maintainer = "ui-team@company.com"
6570
# An optional list of arbitrary strings for logical grouping.
6671
# This is the foundation for metadata-driven collections.
6772
tags = ["ui", "interactive"]
6873
```
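To make the distinction between `tags` and key-value `metadata` concrete, here is a minimal sketch of how a consumer might filter atoms by either. The `AtomInfo` type, the `select_atoms` function, and the second atom (`validator`, with its version and tags) are hypothetical and exist only for illustration; how `eka` represents atoms in memory is not specified here.

```python
from dataclasses import dataclass, field


@dataclass
class AtomInfo:
    """Hypothetical in-memory view of the fields read from an atom.toml."""
    label: str
    version: str
    tags: list[str] = field(default_factory=list)
    metadata: dict[str, str] = field(default_factory=dict)


def select_atoms(atoms, required_tags=(), required_metadata=None):
    """Keep atoms that carry every required tag and every required key/value pair."""
    required_metadata = required_metadata or {}
    return [
        atom
        for atom in atoms
        if set(required_tags) <= set(atom.tags)
        and all(atom.metadata.get(k) == v for k, v in required_metadata.items())
    ]


atoms = [
    AtomInfo("button", "1.0.0", ["ui", "interactive"],
             {"license": "MIT", "maintainer": "ui-team@company.com"}),
    # Illustrative second atom; not taken from the ADR.
    AtomInfo("validator", "0.3.0", ["core"], {"license": "MIT"}),
]

ui_atoms = select_atoms(atoms, required_tags=["ui"])
mit_atoms = select_atoms(atoms, required_metadata={"license": "MIT"})
print([a.label for a in ui_atoms], [a.label for a in mit_atoms])
```

Tags answer "which group does this belong to?", while metadata supports exact-match queries on named fields.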
6974

70-
### 3. Atom Identity (Formalized)
75+
### 3. Repository Identity (New)
76+
77+
Repository identity is established through an initialization commit with entropy injection, providing robust disambiguation and temporal anchoring. Unlike atoms (which are individual components that benefit from human-readable names), repositories are collections of components where temporal identity provides clearer provenance tracking.
78+
79+
**Note**: This initialization commit mechanism is outlined here but will be implemented post-MVP to avoid delaying the core functionality.
80+
81+
- **Initialization Process**: When `eka init` is run, a special initialization commit is created. Its header carries injected entropy (random data that makes the commit hash unique to this initialization) along with a unique "ekala" identifier. For context, a Git commit is a snapshot of repository state plus metadata, and its headers hold fields such as the author and committer.
82+
- **Identity Components**: Repository identity is defined by this initialization commit, which implicitly includes the repository's complete history (including the original root commit) through Git's ancestry system. Each commit references its parent(s) by hash, forming a directed acyclic graph that links the initialization point to the repository's entire development timeline.
83+
- **Temporal Anchoring**: The init commit establishes a clear point in history when the repository was explicitly configured for Ekala, preventing publication of atoms created before this point and enabling precise analysis of when forks occurred. This creates a temporal boundary that distinguishes "before Ekala" from "after Ekala" in the repository's history.
84+
- **Fork Tracking**: The unique "ekala" identifier in init commit headers allows tracking of repository reinitializations and fork points by marking commits that represent new identity establishments, providing a historical record of when repositories established independent identities.
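The effect of entropy injection can be sketched by hashing two commit objects that differ only in an extra header. This is an illustration under stated assumptions, not the planned implementation: the header name `ekala-entropy`, the author line, and the stand-in parent hash are all hypothetical, and the object hashing shown is Git's SHA-1 object format.

```python
import hashlib
import secrets

# Git's well-known ID for the empty tree (SHA-1 object format).
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"


def git_commit_id(body: bytes) -> str:
    """Hash a raw commit body the way Git does: sha1("commit <len>\\0" + body)."""
    header = b"commit " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def build_init_commit(parent: str, entropy_hex: str) -> bytes:
    """Assemble a commit body with one extra header carrying the entropy."""
    lines = [
        f"tree {EMPTY_TREE}",
        f"parent {parent}",
        "author Example <dev@example.com> 0 +0000",
        "committer Example <dev@example.com> 0 +0000",
        f"ekala-entropy {entropy_hex}",  # hypothetical header name
        "",
        "init: ekala",
        "",
    ]
    return "\n".join(lines).encode()


# Stand-in for a shared history tip; two forks initialize on top of it.
shared_parent = hashlib.sha1(b"example shared history").hexdigest()

init_a = git_commit_id(build_init_commit(shared_parent, secrets.token_hex(32)))
init_b = git_commit_id(build_init_commit(shared_parent, secrets.token_hex(32)))
print(init_a, init_b)
```

Both commits name the same parent, so each still encodes the full shared history, yet their hashes differ because the entropy differs.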
85+
86+
### 4. Atom Identity (Formalized)
7187

7288
An atom's identity is a cryptographic hash. This ADR formalizes its components.
7389

7490
- **Hashing Components**: The ID is derived from two components:
75-
1. The repository's **root commit hash**.
91+
1. The repository's **init commit hash** (which implicitly encodes the entire repository history including the root commit through Git's parent chain system).
7692
2. The atom's `label` (as defined in its `atom.toml`).
77-
- **Fork Disambiguation**: The `project.label` from the root `ekala.toml` is incorporated into the hashing process, ensuring that forks with identical roots produce unique atom IDs.
93+
- **Fork Disambiguation**: The init commit identity ensures that repositories with different initialization histories produce unique atom IDs, even if they share the same root of history.
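A minimal sketch of the two-component derivation follows. The ADR fixes the inputs (init commit hash and atom `label`) but not the hash function or encoding, so SHA-256, the NUL separator, and the function name `atom_id` are assumptions made only for this example.

```python
import hashlib


def atom_id(init_commit_hash: str, label: str) -> str:
    """Derive an illustrative atom ID from the init commit hash and the atom label."""
    hasher = hashlib.sha256()  # assumed hash function, not specified by the ADR
    hasher.update(bytes.fromhex(init_commit_hash))
    hasher.update(b"\0")  # assumed separator between the two components
    hasher.update(label.encode("utf-8"))
    return hasher.hexdigest()


# Placeholder init commit hashes for two repositories that share a root of history.
upstream_init = "a" * 40
fork_init = "b" * 40

upstream_button = atom_id(upstream_init, "button")
fork_button = atom_id(fork_init, "button")
print(upstream_button != fork_button)
```

The same label under a different init commit yields a different ID, which is the fork disambiguation property described above.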
7894

79-
### 4. Git Refspec Architecture
95+
### 5. Git Refspec Architecture
8096

8197
To support this architecture, a unified and consistent Git refspec is required. All `ekala`-specific refs will live under the `refs/ekala/` namespace.
8298

83-
- **Repository Identity**: The repository's canonical name is advertised in a single, top-level ref.
99+
- **Repository Identity**: Repository identity is established through a single ref that points to the latest initialization commit, leveraging Git's Merkle DAG structure (each commit's hash covers the hashes of its parent commits).
84100

85-
- **Format**: `refs/ekala/project/<project-label>`
86-
- **Content**: This ref points to the repository's root commit hash.
101+
- **Format**: `refs/ekala/init`
102+
- **Content**: Points to the entropy-injected initialization commit hash. The root commit is implicitly encoded through the commit's ancestry chain, eliminating the need for a separate root ref.
87103

88104
- **Atom Content**: The primary ref for an atom points directly to its content. This path is optimized for the most common operation.
89105

@@ -93,20 +109,28 @@ To support this architecture, a unified and consistent Git refspec is required.
93109
- **Manifest**: `refs/ekala/manifests/<atom-label>/<version>`
94110
- **Origin**: `refs/ekala/origins/<atom-label>/<version>`
95111

96-
### 5. Lifecycle Management: Project Renames
112+
### 7. Lifecycle Management: Repository Evolution
97113

98-
A project rename is a critical lifecycle event that must be handled gracefully. This is managed in the manifest, which is the single source of truth.
114+
Repository identity evolution is handled through the immutable initialization commit system, eliminating the need for the complex deprecation mechanisms in the previous draft.
99115

100-
- **Mechanism**: The `ekala.toml` manifest is extended with an optional `deprecated.labels` field.
101-
```toml
102-
[project]
103-
label = "new-project-name"
104-
deprecated.labels = ["old-project-name"]
105-
```
106-
- **Publisher Behavior**: The `eka publish` command will publish the primary ref (`refs/ekala/project/new-project-name`) and a special deprecation ref (`refs/ekala/deprecated/old-project-name`).
107-
- **Resolver Behavior**: When resolving a dependency on an old name, the resolver will discover the deprecation ref, follow it to the new name, and emit a warning to the user, ensuring a non-breaking upgrade path.
116+
- **Mechanism**: Since repository identity is tied to immutable Git commits rather than mutable labels, identity changes require explicit reinitialization. This provides clean-slate evolution without legacy baggage.
117+
- **Publisher Behavior**: The `eka init` command creates an initialization commit with the `ekala.toml` manifest and publishes the `refs/ekala/init` ref pointing to it, establishing the repository's identity.
118+
- **Resolver Behavior**: Resolvers verify atom authenticity by checking that the atom's identity components match the published repository's initialization commit hash.
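One way a resolver could perform this check is to read the remote's ref advertisement and compare the init ref against the hash an atom claims. The sketch below parses text in the tab-separated `<object-id>\t<ref>` form printed by `git ls-remote`; the function names and the sample hashes are hypothetical, and how an atom records its claimed init hash is not specified by this ADR.

```python
def parse_ls_remote(output: str) -> dict[str, str]:
    """Map ref names to object IDs from `git ls-remote`-style output."""
    refs = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        object_id, ref_name = line.split("\t", 1)
        refs[ref_name] = object_id
    return refs


def init_matches(advertised: dict[str, str], claimed_init_hash: str) -> bool:
    """True when the remote advertises the same init commit the atom claims."""
    return advertised.get("refs/ekala/init") == claimed_init_hash


# Placeholder advertisement; the hashes are made up for illustration.
sample_output = (
    "a" * 40 + "\trefs/ekala/init\n"
    + "c" * 40 + "\trefs/ekala/manifests/button/1.0.0\n"
)

advertised = parse_ls_remote(sample_output)
print(init_matches(advertised, "a" * 40), init_matches(advertised, "b" * 40))
```

An atom published from a fork with a different init commit would fail this comparison even if its label and content were identical.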
119+
120+
### 6. Alternatives Considered
121+
122+
#### User-Managed Repository Labels
123+
124+
A system of user-defined repository labels (similar to atom labels) was considered as an alternative to initialization commits. This would involve adding a `label` field to `ekala.toml` and incorporating it into repository identity calculations.
125+
126+
**Why Rejected:**
127+
128+
- **Collection vs Component**: Atoms are individual components that benefit from human-readable names for coordination. Repositories are collections of components where temporal identity provides clearer provenance tracking and avoids naming conflicts in a decentralized system.
129+
- **Maintenance Complexity**: User-managed labels require deprecation mechanisms for renames, adding complexity that temporal identity avoids through immutable Git commits.
130+
- **Coordination Overhead**: Labels create social coordination challenges (name conflicts, ownership disputes) that temporal identity sidesteps by using cryptographic time-based identity instead of human names.
131+
- **Decentralized Constraints**: Without central registries, label-based coordination becomes impractical at scale, while temporal identity works naturally in distributed environments.
108132

109-
### 5. Logical Grouping: Tags over Formal Sets
133+
#### Logical Grouping: Tags over Formal Sets
110134

111135
A rigid, filesystem-based `set` hierarchy was considered as a mechanism for grouping atoms. This approach was **rejected** because it conflates **physical layout** with **logical grouping**, is inflexible (an atom can only belong to one set), and does not work across repository boundaries.
112136

@@ -116,11 +140,14 @@ The chosen `tags` system is superior because it is a **metadata-driven** approac
116140

117141
**Pros**:
118142

119-
- **Unambiguous Identity**: The `project.label` in the root `ekala.toml`, combined with the root commit hash and atom `label`, provides a robust, fork-safe identity for all atoms.
143+
- **Robust Identity**: The initialization commit system provides cryptographically verifiable provenance with temporal anchoring, enabling precise fork analysis and preventing publication of atoms created before explicit Ekala initialization.
144+
- **Simplified Evolution**: Repository identity changes require clean reinitialization rather than complex deprecation management, providing clearer lifecycle semantics.
145+
- **Cryptographic Strength**: Entropy injection in init commits makes identity collisions between independently initialized repositories negligible, while Git's content-addressed immutability comes at no additional cost.
120146
- **Clear Terminology**: Deprecating `tag` in favor of `label` for the unique identifier resolves a major point of confusion.
121147
- **Performant Discovery**: Both local discovery (reading the root `ekala.toml`) and remote discovery (querying for manifest refs) are extremely fast and avoid filesystem traversals or repository clones.
122148
- **Clear and Formalized**: The architecture is now based on a clear set of rules, with a single source of truth and precise terminology.
123149
- **Flexible Grouping**: The metadata-driven `tags` system allows for flexible, multi-faceted grouping of atoms, which is not possible with a rigid, filesystem-based hierarchy.
150+
- **Rich Metadata**: The dual tagging system (tags + key-value metadata) enables both simple categorization and structured queries, supporting advanced decentralized discovery through systems like Eos.
124151

125152
**Cons**:
126153
