Context
The 13-row canonical v3 corpus (operator-only, ~/Documents/TOOLS/.credentials/cathedral-corpus/rows_v3_canonical.json) was promoted on 2026-05-18 from ~/Documents/INBOX/cathedral-corpus-staging/rows_verified.json. The wider 35-candidate scan from the same day (~/cathedral-corpus-staging/all_candidates.json) contains 6 candidates that match canonical rows by source_url (already promoted under different IDs) and 29 that have not been verified or promoted. This issue tracks the unverified 29 so they can be promoted later without re-scanning.
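The 6 / 29 split can be reproduced with a small source_url comparison. This is a minimal sketch, assuming each candidate and each canonical row is a JSON object carrying a source_url key and that URLs are compared as exact strings. The function name and the example.invalid URLs are hypothetical; this is not the scanner's actual dedupe code.

```python
def split_by_source_url(candidates, canonical_rows):
    """Split candidates into (already promoted, still unverified) by source_url.

    Assumption: both inputs are lists of dicts with a "source_url" key,
    and two entries refer to the same PR only if the URLs are identical.
    """
    canonical_urls = {row["source_url"] for row in canonical_rows}
    matched = [c for c in candidates if c["source_url"] in canonical_urls]
    unmatched = [c for c in candidates if c["source_url"] not in canonical_urls]
    return matched, unmatched


# Hypothetical in-memory data, only to show the call shape.
canonical = [{"id": "row_a", "source_url": "https://example.invalid/pr/1"}]
scan = [
    {"id": "cand_x", "source_url": "https://example.invalid/pr/1"},
    {"id": "cand_y", "source_url": "https://example.invalid/pr/2"},
]
already_promoted, unverified = split_by_source_url(scan, canonical)
print(len(already_promoted), len(unverified))
```

Matching on source_url rather than on ID matters here because the promoted rows were given different IDs from their scan candidates.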
How to promote a candidate from this list
- Open the source PR, confirm the diff is still tight and the parent-of-fix SHA is correct.
- Build a ChallengeRow matching src/cathedral/v3/corpus/schema.py: paraphrase issue_text, set 3-5 required_failure_keywords drawn from terms in the actual fix diff, pick difficulty and bucket.
- Append to ~/Documents/TOOLS/.credentials/cathedral-corpus/rows_v3_canonical.json and revalidate with the loader.
- Tick the candidate off below.
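The build-and-append steps above can be sketched as a pre-check before the real loader runs. This is a rough sketch under stated assumptions: the canonical file is assumed to be a top-level JSON list of row objects, the field names are the ones mentioned in the steps (issue_text, required_failure_keywords, difficulty, bucket, source_url), and check_row / append_row are hypothetical helpers. It does not replace the ChallengeRow schema or the loader's validation.

```python
import json
from pathlib import Path

# Difficulty levels as named in this issue (easy / medium / hard).
ALLOWED_DIFFICULTIES = {"easy", "medium", "hard"}


def check_row(row, existing_rows):
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    keywords = row.get("required_failure_keywords") or []
    if not 3 <= len(keywords) <= 5:
        problems.append("required_failure_keywords must have 3-5 entries")
    if row.get("difficulty") not in ALLOWED_DIFFICULTIES:
        problems.append("difficulty must be easy, medium, or hard")
    if not str(row.get("issue_text") or "").strip():
        problems.append("issue_text is empty")
    if not row.get("bucket"):
        problems.append("bucket is missing")
    url = row.get("source_url")
    if not url:
        problems.append("source_url is missing")
    elif any(r.get("source_url") == url for r in existing_rows):
        problems.append("source_url already present in the corpus")
    return problems


def append_row(corpus_path, row):
    """Append row to the JSON list at corpus_path; return the new row count."""
    path = Path(corpus_path).expanduser()
    rows = json.loads(path.read_text(encoding="utf-8"))
    problems = check_row(row, rows)
    if problems:
        raise ValueError("; ".join(problems))
    rows.append(row)
    path.write_text(json.dumps(rows, indent=2) + "\n", encoding="utf-8")
    return len(rows)
```

Calling append_row on the canonical path with a drafted row raises ValueError instead of writing when any check fails, so a bad draft never reaches the file. Revalidating with the loader is still required afterwards.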
29 unverified candidates
Full candidate metadata (culprit file, line range, draft keywords, scanner notes) is in ~/Documents/INBOX/cathedral-corpus-staging/leftover-22.json (the file is misnamed because the original estimate was 22 before URL dedupe found 29 unmatched).
Priorities (suggested)
- Add 3 hard-difficulty rows for an honest difficulty spread (canonical is 4 easy + 9 medium + 0 hard). Best hard candidates: django concurrency / migration-edge, pandas groupby-with-NA edges, requests session-state under retry.
- Round out pydantic / flask / django coverage (canonical has only 1 verified row for each).
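To see how a promotion changes the difficulty spread, the rows can be tallied by their difficulty field. This is a minimal sketch, assuming rows are dicts with a difficulty key holding easy, medium, or hard. The helper name is hypothetical, and the sample rows are placeholders that only mirror the 4 easy + 9 medium counts stated above.

```python
from collections import Counter

LEVELS = ("easy", "medium", "hard")


def difficulty_spread(rows):
    """Count rows per difficulty level, reporting 0 for levels with no rows."""
    counts = Counter(row["difficulty"] for row in rows)
    return {level: counts.get(level, 0) for level in LEVELS}


# Placeholder rows reproducing the current canonical counts from this issue.
sample = [{"difficulty": "easy"}] * 4 + [{"difficulty": "medium"}] * 9
print(difficulty_spread(sample))  # {'easy': 4, 'medium': 9, 'hard': 0}
```

Running the same tally after each promotion shows when the hard column reaches the suggested 3 rows.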
Out of scope
- The 13 already-canonical rows. Live and validated.
- Cross-language (pydantic-core Rust) rows. Held for a future multi-language batch.
- v4 corpus. Separate model (synthetic, jailed, lives under CATHEDRAL_V4_CORPUS_PATH).
Candidate IDs
- v3_pilot_pydantic_url_constraints_walrus
- v3_pilot_pydantic_model_construct_private_attrs
- v3_pilot_pydantic_dataclass_field_kw_only_override
- v3_pilot_pydantic_namedtuple_subclass_annotations
- v3_pilot_django_orderby_issubset_groupby_crash
- v3_pilot_django_redirect_max_length_encoded
- v3_pilot_django_defer_fetch_peers_fk_typeerror
- v3_pilot_django_multipart_parser_lookup_error
- v3_pilot_django_asgi_script_prefix_boundary
- v3_pilot_flask_provide_automatic_options_enable
- v3_pilot_flask_test_client_context_push_order
- v3_pilot_flask_stream_with_context_async
- v3_pilot_fastapi_auth_header_strip
- v3_pilot_fastapi_remap_ref_missing_key
- v3_pilot_fastapi_replace_refs_attr_named_ref
- v3_pilot_fastapi_depends_func_scope_parameterless
- v3_pilot_requests_redirect_self_reference
- v3_pilot_requests_s3_leading_slashes
- v3_pilot_requests_content_type_malformed_param
- v3_pilot_requests_header_validity_regex_eol
- v3_pilot_requests_no_proxy_domain_boundary
- v3_pilot_urllib3_ambiguous_path_parse_url_crash
- v3_pilot_urllib3_read_chunked_amt_zero_infinite_loop
- v3_pilot_urllib3_read_chunked_decoder_leftover
- v3_pilot_urllib3_poolmanager_int_retries_redirect
- v3_pilot_pandas_categorical_map_defaultdict
- v3_pilot_pandas_groupby_apply_asindex_empty
- v3_pilot_pandas_duplicated_empty_loses_index
- v3_pilot_pandas_date_range_periods1_offset