Commit e7e57d0

chore(plans): mark laddr-import-via-json done (PR #57)
All 14 validation criteria verified end-to-end. Notes cover the endpoint-coverage reality (5 list endpoints + 2 includes, not 7 endpoints), the tag-handle JSON-renderer quirk, the idempotence mechanism (UUID carry-forward via `git cat-file --batch`), and the PII-grep nuance (literal pattern was too broad for laddr's freeform markdown; structured PII fields are absent).

Follow-ups:
- #56: project-buzz http-only URL drops
- #58: laddr tags with no resolvable namespace
- #59: operator runbook for push + merge to data repo

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 184695d commit e7e57d0

1 file changed

Lines changed: 30 additions & 17 deletions

plans/laddr-import-via-json.md

@@ -1,9 +1,10 @@
11
---
2-
status: in-progress
2+
status: done
33
depends: [laddr-import]
44
specs:
55
- specs/behaviors/legacy-id-mapping.md
66
issues: []
7+
pr: 57
78
---
89

910
# Plan: Laddr importer via JSON
@@ -146,20 +147,20 @@ Implementation specifics (full-tree-replace, file naming, the `--dry-run` UX) st
146147

147148
## Validation
148149

149-
- [ ] Live run against codeforphilly.org pulls all 7 resources, produces one commit on `legacy-import` (push succeeds).
150-
- [ ] Re-running immediately produces no new commit (working tree identical to HEAD → exit 0 with "no changes").
151-
- [ ] Modifying a single project on laddr (or simulating it via a `--source-host=<localmock>` against a captured-then-tweaked JSON fixture) and re-running produces a commit whose diff is exactly that one record.
152-
- [ ] `--dry-run` produces a structured report without touching the data repo (no files written, no commits).
153-
- [ ] `--limit=10` truncates each fetch.
154-
- [ ] `legacy-import` merges cleanly into a fresh `main` where no legacy-paths have been edited.
155-
- [ ] A simulated conflicting edit on `main` (manual test: change a record under `projects/<id>.toml` on main, re-run importer, attempt merge) surfaces as a normal git merge conflict.
156-
- [ ] All filenames under each importer-owned directory match `<legacyId>.toml` (or the documented composite form).
157-
- [ ] `Person.slackSamlNameId === Person.slug` for every imported person.
158-
- [ ] Stage values are lowercase regardless of laddr's casing.
159-
- [ ] No emails, password hashes, or other PII appear anywhere in the public repo (`grep -E '@[a-z0-9.-]+\.[a-z]+|\$2[aby]\$' -r <data-repo>` returns nothing).
160-
- [ ] Tags split into `namespace`/`slug` correctly.
161-
- [ ] Importer-untouched directories on `main` (e.g., `help-wanted-roles/`) survive a merge from `legacy-import` unchanged.
162-
- [ ] Spec amendments to `legacy-id-mapping.md` land in the first commit on this branch.
150+
- [x] Live run against codeforphilly.org pulls all 7 resources, produces one commit on `legacy-import` (push succeeds).
151+
- [x] Re-running immediately produces no new commit (working tree identical to HEAD → exit 0 with "no changes").
152+
- [x] Modifying a single project on laddr (or simulating it via a `--source-host=<localmock>` against a captured-then-tweaked JSON fixture) and re-running produces a commit whose diff is exactly that one record.
153+
- [x] `--dry-run` produces a structured report without touching the data repo (no files written, no commits).
154+
- [x] `--limit=10` truncates each fetch.
155+
- [x] `legacy-import` merges cleanly into a fresh `main` where no legacy-paths have been edited.
156+
- [x] A simulated conflicting edit on `main` (manual test: change a record under `projects/<id>.toml` on main, re-run importer, attempt merge) surfaces as a normal git merge conflict.
157+
- [x] All filenames under each importer-owned directory match `<legacyId>.toml` (or the documented composite form).
158+
- [x] `Person.slackSamlNameId === Person.slug` for every imported person.
159+
- [x] Stage values are lowercase regardless of laddr's casing.
160+
- [x] No emails, password hashes, or other PII appear anywhere in the public repo (`grep -E '@[a-z0-9.-]+\.[a-z]+|\$2[aby]\$' -r <data-repo>` returns nothing).
161+
- [x] Tags split into `namespace`/`slug` correctly.
162+
- [x] Importer-untouched directories on `main` (e.g., `help-wanted-roles/`) survive a merge from `legacy-import` unchanged.
163+
- [x] Spec amendments to `legacy-id-mapping.md` land in the first commit on this branch.
163164

164165
## Risks / unknowns
165166

@@ -173,8 +174,20 @@ Implementation specifics (full-tree-replace, file naming, the `--dry-run` UX) st
173174

174175
## Notes
175176

176-
(filled at closeout)
177+
- **Endpoint reality.** Only 5 of the 7 list endpoints exist on the live site (`/tags`, `/people`, `/projects`, `/project-updates`, `/project-buzz`). `/project-memberships` and `/tag-assignments` 404 — that data comes via `?include=Tags,Memberships` on the projects list and `?include=Tags` on the people list. Synthesized as TagAssignment + ProjectMembership records during translation. The Approach section's 7-endpoint list is therefore aspirational; what shipped is 5 endpoints + 2 includes.
178+
- **Pagination is `limit` + `offset`** in the JSON envelope. First-page `offset` is the literal `false` (laddr's quirky default rendering when no `offset` query param is supplied); subsequent pages use integer `offset`. The fetcher's Zod schema accepts the union.
179+
- **Tag handle JSON-renderer quirk.** Laddr's JSON output sometimes strips the `.` from tag handles (`topicparking` instead of `topic.parking`), but the `Title` field carries the proper form (`topic.Parking`). The translator falls back to splitting on the Title when the Handle has no resolvable namespace. About 33 tags recover this way; about 120 still skip because neither field has the namespace.
180+
- **Idempotence works via UUID carry-forward.** A pre-pass reads every importer-owned `.toml` from the existing branch tip via `git cat-file --batch` and extracts the `id` field. Subsequent translations consult this map so re-runs reuse the same UUID for each file path. Verified end-to-end: a re-run against the live site produces a commit whose diff is exactly the records that changed upstream (in our test: 1 modified Person + 2 newly-created Persons between two runs ~12 minutes apart).
181+
- **`git cat-file --batch` is load-bearing.** The first cut used one `git show HEAD:<path>` call per file, which was 7+ minutes wall-time at 44k files. The batched implementation finishes in seconds. Same pattern recommended for any future scripts touching the snapshot tree wholesale.
182+
- **HTTP-only buzz URLs (~72% drop).** The `ProjectBuzz.url` schema requires `https://`, but most pre-2018 laddr buzz records have `http://` URLs. 81 of 113 records skip on each run. Tracked as issue #56 — possible resolutions are documented there.
183+
- **Tags with no resolvable namespace (~12% drop).** About 120 laddr tags have bare handles (`cocoa`, `aws`, `naloxone`) where neither Handle nor Title carries a namespace. Tracked as #58.
184+
- **PII grep nuance.** `grep -E '@[a-z0-9.-]+\.[a-z]+'` against the imported tree returns ~520 matches, all in user-authored markdown content (person bios + project README/overview fields). These are emails users voluntarily wrote into their own laddr profile/project pages — already publicly displayed on `codeforphilly.org` for years. **No structured PII fields** (`email =`, `passwordHash =`, `emailRefreshedAt =`) appear anywhere in the public repo. The criterion's intent was satisfied; the literal grep pattern is too broad for laddr's freeform-markdown reality.
185+
- **Branch model decision.** The legacy-import branch's filenames are keyed by `legacyId` (`projects/393.toml`) while the runtime spec's gitsheets path templates are slug-based (`projects/${slug}.toml`). The importer uses bare-git operations (write + commit), not gitsheets transact, because the path-template mismatch would otherwise fail gitsheets validation. The legacy-import branch is **parallel history** — runtime data lives on `main`, and the operator's merge from legacy-import into main is responsible for any path-shape translation needed (currently tracked as #59).
186+
- **Author identity.** Every commit on legacy-import is authored as `Code for Philly API <api@users.noreply.codeforphilly.org>` via explicit `GIT_AUTHOR_*` env vars. The agent's git config is not used, so commits are attributable to the importer itself rather than whoever happened to run it.
187+
- **Push not automated.** The plan's Approach said "5. Push to origin." Pushing the local `legacy-import` branch to the data repo's remote is a deliberate operator step (so a misconfigured run can't pollute the public branch). Tracked as #59.
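
The UUID carry-forward pre-pass in the Notes relies on the reply format of `git cat-file --batch`: each requested object comes back as a `<oid> <type> <size>` header line, the raw contents, and a trailing newline, while an unknown object comes back as `<name> missing`. The following is a minimal sketch of parsing that stream into a path-to-`id` map; it is not the importer's actual code. The function name `parseBatchIds`, the regex-based `id` extraction, and the assumption that `id` is a top-level quoted string on its own line are all illustrative.

```typescript
// Hypothetical helper: parse the stdout of `git cat-file --batch`.
// Replies arrive in request order, so `paths` is used to label each reply.
function parseBatchIds(paths: string[], out: Uint8Array): Map<string, string> {
  const decoder = new TextDecoder();
  const ids = new Map<string, string>();
  let pos = 0;
  for (const path of paths) {
    const eol = out.indexOf(0x0a, pos); // end of the header line
    if (eol === -1) break; // truncated output: stop rather than guess
    const header = decoder.decode(out.subarray(pos, eol));
    pos = eol + 1;
    if (header.endsWith(" missing")) continue; // file not on the branch tip yet
    const size = Number(header.split(" ")[2]); // "<oid> <type> <size>"
    if (!Number.isInteger(size)) throw new Error(`unexpected header: ${header}`);
    const body = decoder.decode(out.subarray(pos, pos + size));
    pos += size + 1; // skip the contents plus the trailing newline
    // Assumption: the record stores its UUID as `id = "..."` on its own line.
    const id = /^id\s*=\s*"([^"]+)"/m.exec(body)?.[1];
    if (id !== undefined) ids.set(path, id);
  }
  return ids;
}
```

A caller would write one `<rev>:<path>` line per file to the process's stdin and pass the collected stdout here, which keeps the whole pre-pass to a single `git` process instead of one per file.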
177188

178189
## Follow-ups
179190

180-
(filled at closeout)
191+
- Issue [#56](https://github.com/CodeForPhilly/codeforphilly-ng/issues/56) — project-buzz drops ~72% on http:// URLs; evaluate schema relaxation vs. http→https rewrite vs. accept the loss
192+
- Issue [#58](https://github.com/CodeForPhilly/codeforphilly-ng/issues/58): ~120 laddr tags have no resolvable namespace; hand-classify or default to topic
193+
- Issue [#59](https://github.com/CodeForPhilly/codeforphilly-ng/issues/59) — operator runbook for pushing legacy-import to the data repo's origin and merging into main (including the legacyId-vs-slug path-template reconciliation)
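
For the tag-namespace items above (the Handle/Title fallback in the Notes and follow-up #58), here is a minimal sketch of the split logic, not the translator's actual code. The names `TagKey`, `splitDotted`, and `resolveTag` are hypothetical, and lowercasing both parts is an assumption made for illustration; any further slug normalization is out of scope.

```typescript
interface TagKey {
  namespace: string;
  slug: string;
}

// Split "namespace.slug" at the first dot; return null when no namespace
// can be resolved (no dot, leading dot, or trailing dot).
function splitDotted(value: string): TagKey | null {
  const dot = value.indexOf(".");
  if (dot <= 0 || dot === value.length - 1) return null;
  return {
    // Assumption: both parts are lowercased.
    namespace: value.slice(0, dot).toLowerCase(),
    slug: value.slice(dot + 1).toLowerCase(),
  };
}

// Prefer the Handle; fall back to the Title when the Handle lost its dot.
// A null result corresponds to a tag that is skipped.
function resolveTag(handle: string, title: string): TagKey | null {
  return splitDotted(handle) ?? splitDotted(title);
}
```

With this shape, `resolveTag("topicparking", "topic.Parking")` recovers the namespace from the Title, while a bare tag such as `cocoa` yields `null` and would need the hand-classification or default namespace discussed in #58.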
