# Papyri TODO
Catch-all for bugs found in the wild, planning notes, and the ops /
lint queue. Rename passes live in [`TODO-renames.md`](TODO-renames.md);
milestones in [`PLAN.md`](PLAN.md).
## Open code smells
- `crosslink.py` version resolution across packages. Related prose
  `TODO:` markers sit at lines 172, 380, 393, and 415; three of them
  are summarised here:
  - the main `.process` loop has a stale `assert len(visitor.local)
    == 0` that would need a gen-time pass over local refs before it
    can be re-enabled (`:172`).
  - per-reference version information is still needed to support
    cross-package linking correctly (`:393`).
  - `doc_blob.all_forward_refs()` on Figures: Figure's
    `RefInfo.version` is populated at walk time today; the proper fix
    populates it at serialisation time (`:415`).

  Anyone touching cross-package linking should read them first.
  `PLAN.md` "Follow-ups" echoes this item as well.
- `tree.py` still has ~8 `TODO:` markers inside the directive /
resolve code paths. Triage as a batch when the resolver is
touched; not blocking day-to-day work.
## Next up
### Walk LocalRefs when building the forward-ref graph
Today both `IngestedDoc.all_forward_refs` (`papyri/crosslink.py:110-126`)
and the TS `collectForwardRefs` (`ingest/src/visitor.ts`) walk only
`RefInfo` (kind != "local") and `Figure` nodes. Intra-bundle CrossRefs
are stored with `LocalRef.reference` instead of `RefInfo` (a gen-time
choice in `papyri/tree.py:GenVisitor._ref_to_crossref`, lines 974-976),
so they produce no graph edge and the viewer's "Referenced by" list
never shows same-bundle callers. For example, `papyri.examples` does
not appear in the back-refs of `papyri.examples:example1`.
The `LocalRef` is intentional for the *gen* output: it keeps a
bundle's content digest independent of its own version stamp. But
the *ingest* graph table is post-resolution; an edge keyed on
`(ownPkg, ownVer, kind, path)` is fine there. So the visitor can
walk `LocalRef` and emit a `Key` by attaching the bundle's
`(module, version)` from the caller's context.
Concrete change:
1. `collectForwardRefs(doc, *, ownPkg, ownVer)` adds `"LocalRef"` to
`FORWARD_REF_TYPES`; for each `LocalRef` node, emit
`Key(ownPkg, ownVer, ref.kind, ref.path)`. Same for
`collectForwardRefsFromSection`.
2. `Ingester._ingestApiDocs` / `_ingestNarrativeDocs` /
`_ingestExamples` already have `(root, version)` in scope; thread
them in.
3. `IngestedDoc.all_forward_refs(self, *, package, version)` mirrors
the change. Callers in `crosslink.py:419` and `:491` pass
`key.module, key.version`.
4. Update the unit pin in `ingest/tests/visitor.test.ts`
("skips LocalRef-bearing CrossRefs") to assert the *new* shape:
one edge keyed `(ownPkg, ownVer, ref.kind, ref.path)` per
LocalRef, with the test exercising both the call-site context
threading and the kind="local" exclusion.
Out of scope: changing the gen-time storage shape. LocalRef stays as
LocalRef in the IR; only the visitor's projection to graph edges
changes. Existing bundles need to be re-ingested for the edges to
materialise (no schema migration; `gstore.put` rewrites the row's
forward set when it differs).
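
The projection described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the actual `IngestedDoc.all_forward_refs` code: the `Node` class, the `collect_forward_refs` name, and the field layout of `Key` are hypothetical stand-ins, and the version strings in the usage lines are made up.

```python
from dataclasses import dataclass, field
from typing import List, NamedTuple, Optional, Set, Union


class Key(NamedTuple):
    """Graph-edge key, ordered as the (pkg, ver, kind, path) tuple in the text."""
    module: str
    version: str
    kind: str
    path: str


@dataclass(frozen=True)
class RefInfo:
    module: str
    version: str
    kind: str
    path: str


@dataclass(frozen=True)
class LocalRef:
    kind: str
    path: str


@dataclass
class Node:
    """Hypothetical stand-in for an IR node that may carry a reference."""
    reference: Optional[Union[RefInfo, LocalRef]] = None
    children: List["Node"] = field(default_factory=list)


def collect_forward_refs(node: Node, *, own_pkg: str, own_ver: str) -> Set[Key]:
    """Walk the tree and project each reference to a graph-edge Key."""
    edges: Set[Key] = set()
    stack = [node]
    while stack:
        current = stack.pop()
        ref = current.reference
        if isinstance(ref, LocalRef):
            # Intra-bundle ref: borrow (pkg, ver) from the caller's context.
            edges.add(Key(own_pkg, own_ver, ref.kind, ref.path))
        elif isinstance(ref, RefInfo) and ref.kind != "local":
            # Inter-bundle ref: already fully qualified.
            edges.add(Key(ref.module, ref.version, ref.kind, ref.path))
        stack.extend(current.children)
    return edges


# Usage with illustrative values only.
doc = Node(children=[
    Node(reference=LocalRef("module", "papyri.examples:example1")),
    Node(reference=RefInfo("numpy", "2.4.4", "module", "numpy:dtype")),
    Node(reference=RefInfo("papyri", "1.0", "local", "skipped")),
])
edges = collect_forward_refs(doc, own_pkg="papyri", own_ver="1.0")
```

Because the bundle's `(pkg, ver)` is supplied by the caller, the stored `LocalRef` itself stays version-free, which is the property the gen output relies on.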
### Split `LocalRef` into IntraDocument / IntraBundle ref types
Sibling consideration to the LocalRef walk above. Right now `LocalRef`
covers two semantically distinct cases that the visitor wants to
treat differently:
| Case           | Source                                    | Visitor action                                                                |
| -------------- | ----------------------------------------- | ----------------------------------------------------------------------------- |
| Intra-document | `papyri/tree.py:962` (`add_target`)       | Render as `<a href="#anchor">`; *no* graph edge                               |
| Intra-bundle   | `papyri/tree.py:975` (`_ref_to_crossref`) | Emit graph edge with bundle's `(pkg, ver)` (per the LocalRef-walk task above) |
| Inter-bundle   | `RefInfo`                                 | Emit graph edge as-is                                                         |
Today both intra cases are tagged `LocalRef(kind, path)` and
distinguished only by `kind` ("docs" vs "module"), but a same-bundle
*doc-to-doc* ref also has `kind="docs"` and is structurally
indistinguishable from a same-page section anchor. Adding the
LocalRef-walk above without splitting the type means the visitor
either over-emits edges (anchors become graph edges to nowhere) or
under-emits (real intra-bundle docs refs get dropped).
Sketch of the three-type taxonomy:
- `AnchorRef(target: str)` — same-page anchor; never an edge. New CBOR
tag (probably 4023 — currently free between 4022 LocalRef and 4024
Figure).
- `LocalRef(kind, path)` — intra-bundle reference. Visitor emits a
Key with the caller's `(pkg, ver)`. (Same shape as today; cleaned
semantics.)
- `RefInfo(module, version, kind, path)` — inter-bundle reference.
Unchanged.
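
One way to picture the taxonomy is as three small dataclasses plus a single dispatch function. This is a sketch only: the class shapes follow the bullets above, but `edge_for` and the plain-tuple `Edge` type are hypothetical names introduced for illustration, and nothing here reflects the real CBOR encoding.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class AnchorRef:
    """Same-page anchor; never produces a graph edge."""
    target: str


@dataclass(frozen=True)
class LocalRef:
    """Intra-bundle reference; (pkg, ver) comes from the caller."""
    kind: str
    path: str


@dataclass(frozen=True)
class RefInfo:
    """Inter-bundle reference; fully qualified."""
    module: str
    version: str
    kind: str
    path: str


Ref = Union[AnchorRef, LocalRef, RefInfo]
Edge = Tuple[str, str, str, str]  # (pkg, ver, kind, path)


def edge_for(ref: Ref, *, own_pkg: str, own_ver: str) -> Optional[Edge]:
    """Return the graph edge for a reference, or None when it has none."""
    if isinstance(ref, AnchorRef):
        return None
    if isinstance(ref, LocalRef):
        return (own_pkg, own_ver, ref.kind, ref.path)
    if isinstance(ref, RefInfo):
        return (ref.module, ref.version, ref.kind, ref.path)
    raise TypeError(f"unsupported reference type: {type(ref).__name__}")
```

With the anchor case carried by its own type, the visitor no longer has to guess from `kind` whether a `LocalRef` is a section anchor or a real intra-bundle link.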
Practical consequences if this lands:
1. New CBOR tag → existing bundles need re-gen (not just re-ingest).
Coordinate with the LocalRef-walk task: do this *first* so the
walk lands against the cleaned types.
2. `tree.py:962` switches to `AnchorRef(node.target)`; `tree.py:975`
stays on `LocalRef`.
3. Viewer code that resolves LocalRef → `<a href="...">` splits into
two paths: AnchorRef → `<a href="#target">`; LocalRef → page URL.
4. `IngestVisitor.replace_CrossRef` (`papyri/tree.py:987`) and the TS
equivalent split the LocalRef branch in two.
Tracked here so the LocalRef-walk task doesn't ossify the current
overload.
## Known docstring bugs (recorded pre-Phase 1 — verify before filing)
- `IPython.lib.display.audio`: check parsing of bullet list in parameters.
- `numpy.core.shape_base._concatenate_shapes`: may not be complete /
improperly parsed.
- `numpy.errstate`: fields alternate between `numpy._ErrDict` and
`numpy.dict`.
- `numpy.histogram_bin_edges`: block math in
`_content.Notes.children.2.children.0.dd` and math not rendered.
- `scipy.sparse.data._create_method.<locals>.method`: likely gets a
conflicting canonical name with another function (`tan`/`float`).
- `dask.delayed.delayed`: multi-paragraph in one parameter.
- `IPython.core.display.Video.__init__`: block Verbatim in params?
- `IPython.core.interactiveshell.InteractiveShell.complete`: DefListItem.
- `IPython.core.completer.Completion`: item list.
- `matplotlib.transforms.Bbox`: parsing of example is completely incorrect.
- `matplotlib.axes._axes.Axes.text`: misparses example as well.
- `matplotlib.figure.Figure.add_subplot`: custom double-dot example.
- `matplotlib.colors`: unnumbered list with indent; reference via
`.. _palettable: value` and autolink `palettable_`.
- `numpy.npv`: has warning sections.
- `scipy.signal.ltisys.bode`: multiple figures.
- `scipy.signal.barthann`: multiple figures.
## Papyri bugs (from `papyri gen` on IPython 9.12.0 + numpy 2.4.4, Py 3.14)
- **`AssertionError` in `papyri/ts.py:579`**
(`assert len(post_text) >= len(self.as_text(tc))`). Fires on numpy
dtype classes whose docstring starts with a short signature line
followed by a `--` underline shorter than the title, which
tree-sitter-rst parses as a section header. Affects ~25 entries:
`numpy:dtype`, `numpy.fft:fft`, and all of `numpy.dtypes:*DType`
(`BoolDType`, `Int8/16/32/64DType`, `UInt8/16/32/64DType`,
`LongLongDType`, `ULongLongDType`, `BytesDType`, `StrDType`,
`StringDType`, `CLongDoubleDType`, `Complex64/128DType`,
`Float16/32/64DType`, `DateTime64DType`, `TimeDelta64DType`,
`ObjectDType`, `VoidDType`, `LongDoubleDType`). Either loosen the
assert to a soft-fail / log, or detect the "too-short adornment" case
earlier and treat the first line as a signature instead of a title.
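
The second option, detecting the too-short adornment up front, could look roughly like the check below. It is a plain-string heuristic written for illustration, independent of tree-sitter-rst and of the code in `papyri/ts.py`; `has_short_adornment` is a hypothetical helper name and the sample docstring is illustrative.

```python
import string

# Characters that can form an RST section underline (assumed: ASCII punctuation).
ADORNMENT_CHARS = set(string.punctuation)


def has_short_adornment(docstring: str) -> bool:
    """True when line 2 is a punctuation underline shorter than line 1."""
    lines = docstring.strip().splitlines()
    if len(lines) < 2:
        return False
    title = lines[0].rstrip()
    underline = lines[1].strip()
    if len(underline) < 2:
        return False
    # An adornment is one punctuation character repeated.
    is_adornment = (
        underline[0] in ADORNMENT_CHARS and set(underline) == {underline[0]}
    )
    return is_adornment and len(underline) < len(title)


sample = "dtype(dtype, align=False, copy=False)\n--\n\nCreate a data type object."
first_line_is_signature = has_short_adornment(sample)
```

A docstring flagged this way could have its first line routed to the signature handling instead of being parsed as a section title, so the assert in `ts.py` would never see it.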
## Upstream docstring issues worth reporting
- `numpy.random._mt19937:MT19937._legacy_seeding` has a malformed
backtick sequence in its docstring:
`` `SeedSequence, or ``None` ``
The backticks are unbalanced. Intended:
`` `SeedSequence`, or ``None`` ``
Lives in `_mt19937.pyx` in the numpy repo. Triggers the
"Improper backtick" warning in papyri's tree-sitter RST parser
three times.
- `numpy.polynomial._polybase:ABCPolyBase` uses a `Class Attributes`
section name that numpydoc (and therefore papyri's
`numpydoc_compat.NumpyDocString._guess_header`) does not recognise.
Either add `Class Attributes` to papyri's alias table or rename the
section in numpy to one of the canonical numpydoc section names.
- The first line of the docstrings for
`IPython.utils.timing:{clock2,timings,timings_out}` is not a real
Python signature: it uses tuple-literal return annotations like
`-> (t_user, t_system)`. Fine as human-readable text, but the names
break papyri's exec-based signature parser on 3.14. Not really an
IPython bug; the right fix is on papyri's side (see PEP 649 bullet
above), but if IPython wanted to be friendly it could drop the first
"synopsis" line now that real annotations exist on these functions.
## Linting / stability tooling (branch: `claude/add-linting-tools-WkE1K`)
Already landed (for context — not actionable):
- Ruff `select = ["E", "F", "W", "B", "I", "UP"]` in `pyproject.toml`.
- Mypy tightening: `warn_unused_ignores`, `warn_redundant_casts`,
`no_implicit_optional`, `check_untyped_defs`.
- Per-workflow `permissions:` block on all four workflows (least
privilege).
- Dependabot config at `.github/dependabot.yml`.
- `zizmor` workflow at `.github/workflows/zizmor.yml`.
### Ruff rule expansion
- [x] Extend `[tool.ruff.lint] select` with `SIM` (simplify), `RUF`
(ruff-native), `PERF`. Consider `PIE`, `PL` (pylint subset),
`RET` once those settle. Extra type checkers (`ty`, `pyrefly`)
remain intentionally excluded.
- [x] Land autofixes from the new rules as a separate commit so review
stays tractable.
### Test stability
- [ ] Add `pytest-xdist` for parallel test execution.
- [ ] Add `pytest-randomly` to shuffle test order (guards against the
kind of import-order bug `take2` ↔ `myst_ast` used to have).
- [ ] Add `pytest-timeout` with a global timeout so `gen`-adjacent
tests cannot hang CI.
- [ ] Add a golden-file test: `papyri gen examples/papyri.toml
--no-infer` then assert the expected files exist under
`~/.papyri/data/papyri_<ver>/` and decode as CBOR.
- [ ] Add a `papyri describe` smoke test against the self-generated
bundle.
### Security / supply chain
- [ ] Add `pip-audit` step to CI for CVE scanning of runtime deps.
- [ ] Add `bandit` scan scoped to `papyri/gen.py` (which execs
examples); accept that it will be noisy and tune rules.
### Pre-commit
- [ ] Add `.pre-commit-config.yaml`: `ruff check --fix`, `ruff
format`, `mypy`, `check-added-large-files`.
### CI hygiene
- [ ] Add `concurrency:` with `cancel-in-progress: true` to PR
workflows (none of the four workflows set it today).