v3 corpus expansion: 29 unverified candidates from May 17 wide scan #138

@wallscaler

Context

The 13-row canonical v3 corpus (operator-only, ~/Documents/TOOLS/.credentials/cathedral-corpus/rows_v3_canonical.json) was promoted on 2026-05-18 from ~/Documents/INBOX/cathedral-corpus-staging/rows_verified.json. The wider 35-candidate scan from the same day, at ~/cathedral-corpus-staging/all_candidates.json, splits into two groups: 6 candidates match canonical rows by source_url (already promoted under different IDs), and 29 candidates have not been verified or promoted. This issue tracks those 29 so they can be promoted later without re-scanning.
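The 6 / 29 split described above can be reproduced with a set lookup on source_url. The sketch below is illustrative only: it assumes both files are JSON lists of objects that carry a `source_url` key, and the function name, IDs, and URLs are made up for the example.

```python
from typing import Iterable


def split_by_source_url(
    candidates: Iterable[dict], canonical: Iterable[dict]
) -> tuple[list[dict], list[dict]]:
    """Partition candidates into (already_promoted, unverified) by source_url."""
    canonical_urls = {row["source_url"] for row in canonical}
    promoted: list[dict] = []
    unverified: list[dict] = []
    for cand in candidates:
        # IDs may differ between scan and corpus, so only the URL is compared.
        if cand["source_url"] in canonical_urls:
            promoted.append(cand)
        else:
            unverified.append(cand)
    return promoted, unverified


# Toy data with placeholder IDs and URLs, only to show the shape of the check.
canonical_rows = [{"id": "row_a", "source_url": "https://example.invalid/pr/1"}]
scan = [
    {"id": "cand_x", "source_url": "https://example.invalid/pr/1"},  # same PR, different ID
    {"id": "cand_y", "source_url": "https://example.invalid/pr/2"},
]
promoted, unverified = split_by_source_url(scan, canonical_rows)
print([c["id"] for c in promoted])    # ['cand_x']
print([c["id"] for c in unverified])  # ['cand_y']
```

Matching on the URL rather than the ID is what lets candidates promoted under different IDs be recognised as duplicates.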

How to promote a candidate from this list

  1. Open the source PR, confirm the diff is still tight and the parent-of-fix SHA is correct.
  2. Build a ChallengeRow matching src/cathedral/v3/corpus/schema.py: paraphrase issue_text, set 3-5 required_failure_keywords drawn from terms in the actual fix diff, pick difficulty and bucket.
  3. Append to ~/Documents/TOOLS/.credentials/cathedral-corpus/rows_v3_canonical.json and revalidate with the loader.
  4. Tick the candidate off below.
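Steps 2 and 3 could look roughly like the sketch below. `ChallengeRowSketch` is a hypothetical stand-in, not the real ChallengeRow from src/cathedral/v3/corpus/schema.py: only issue_text, required_failure_keywords, difficulty, and bucket are named in this issue, and the remaining fields, the validation rules, and the assumption that the corpus file is a JSON list are guesses. The real loader should remain the source of truth for validation.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class ChallengeRowSketch:
    """Hypothetical stand-in for ChallengeRow; the real fields live in schema.py."""

    id: str
    source_url: str
    issue_text: str
    required_failure_keywords: list[str]
    difficulty: str
    bucket: str

    def __post_init__(self) -> None:
        # Step 2 asks for 3-5 keywords drawn from the fix diff.
        if not 3 <= len(self.required_failure_keywords) <= 5:
            raise ValueError("expected 3-5 required_failure_keywords")
        if self.difficulty not in {"easy", "medium", "hard"}:
            raise ValueError(f"unknown difficulty: {self.difficulty!r}")


def append_row(corpus_path: Path, row: ChallengeRowSketch) -> int:
    """Append a row to a JSON list on disk and return the new row count."""
    rows = json.loads(corpus_path.read_text()) if corpus_path.exists() else []
    if any(existing["source_url"] == row.source_url for existing in rows):
        raise ValueError(f"source_url already present: {row.source_url}")
    rows.append(asdict(row))
    corpus_path.write_text(json.dumps(rows, indent=2) + "\n")
    return len(rows)


# Demo against a temporary file, never the real canonical corpus.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "rows_demo.json"
    demo_row = ChallengeRowSketch(
        id="v3_pilot_example_placeholder",
        source_url="https://example.invalid/pr/1",
        issue_text="Paraphrased description of the bug goes here.",
        required_failure_keywords=["keyword_a", "keyword_b", "keyword_c"],
        difficulty="medium",
        bucket="placeholder-bucket",
    )
    new_count = append_row(demo_path, demo_row)
    print(new_count)  # 1
```

Rejecting a duplicate source_url at append time guards against promoting the same PR twice under a different ID.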

29 unverified candidates

| ID | Repo | Source PR |
| --- | --- | --- |
| v3_pilot_pydantic_url_constraints_walrus | pydantic/pydantic | pydantic/pydantic#12826 |
| v3_pilot_pydantic_model_construct_private_attrs | pydantic/pydantic | pydantic/pydantic#12816 |
| v3_pilot_pydantic_dataclass_field_kw_only_override | pydantic/pydantic | pydantic/pydantic#12741 |
| v3_pilot_pydantic_namedtuple_subclass_annotations | pydantic/pydantic | pydantic/pydantic#12951 |
| v3_pilot_django_orderby_issubset_groupby_crash | django/django | django/django#21121 |
| v3_pilot_django_redirect_max_length_encoded | django/django | django/django#21276 |
| v3_pilot_django_defer_fetch_peers_fk_typeerror | django/django | django/django#21110 |
| v3_pilot_django_multipart_parser_lookup_error | django/django | django/django#20714 |
| v3_pilot_django_asgi_script_prefix_boundary | django/django | django/django#20749 |
| v3_pilot_flask_provide_automatic_options_enable | pallets/flask | pallets/flask#5917 |
| v3_pilot_flask_test_client_context_push_order | pallets/flask | pallets/flask#5797 |
| v3_pilot_flask_stream_with_context_async | pallets/flask | pallets/flask#5799 |
| v3_pilot_fastapi_auth_header_strip | fastapi/fastapi | fastapi/fastapi#14786 |
| v3_pilot_fastapi_remap_ref_missing_key | fastapi/fastapi | fastapi/fastapi#14361 |
| v3_pilot_fastapi_replace_refs_attr_named_ref | fastapi/fastapi | fastapi/fastapi#14349 |
| v3_pilot_fastapi_depends_func_scope_parameterless | fastapi/fastapi | fastapi/fastapi#14301 |
| v3_pilot_requests_redirect_self_reference | psf/requests | psf/requests#7328 |
| v3_pilot_requests_s3_leading_slashes | psf/requests | psf/requests#7315 |
| v3_pilot_requests_content_type_malformed_param | psf/requests | psf/requests#7309 |
| v3_pilot_requests_header_validity_regex_eol | psf/requests | psf/requests#7308 |
| v3_pilot_requests_no_proxy_domain_boundary | psf/requests | psf/requests#7427 |
| v3_pilot_urllib3_ambiguous_path_parse_url_crash | urllib3/urllib3 | urllib3/urllib3#4952 |
| v3_pilot_urllib3_read_chunked_amt_zero_infinite_loop | urllib3/urllib3 | urllib3/urllib3#4974 |
| v3_pilot_urllib3_read_chunked_decoder_leftover | urllib3/urllib3 | urllib3/urllib3#3736 |
| v3_pilot_urllib3_poolmanager_int_retries_redirect | urllib3/urllib3 | urllib3/urllib3#3655 |
| v3_pilot_pandas_categorical_map_defaultdict | pandas-dev/pandas | pandas-dev/pandas#65413 |
| v3_pilot_pandas_groupby_apply_asindex_empty | pandas-dev/pandas | pandas-dev/pandas#65332 |
| v3_pilot_pandas_duplicated_empty_loses_index | pandas-dev/pandas | pandas-dev/pandas#65217 |
| v3_pilot_pandas_date_range_periods1_offset | pandas-dev/pandas | pandas-dev/pandas#65011 |

Full candidate metadata (culprit file, line range, draft keywords, scanner notes) is in ~/Documents/INBOX/cathedral-corpus-staging/leftover-22.json (the file is misnamed because the original estimate was 22 before URL dedupe found 29 unmatched).
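The Source PR references in the candidate list use GitHub's owner/repo#number shorthand. When opening the PR in step 1, a small helper can expand that shorthand into a pull-request URL and tally candidates per repo. This is a sketch: the helper name is made up, and whether the corpus stores source_url in exactly this form is an assumption.

```python
import re
from collections import Counter

# Matches shorthand such as "pydantic/pydantic#12826".
PR_REF = re.compile(r"^(?P<repo>[\w.-]+/[\w.-]+)#(?P<number>\d+)$")


def pr_ref_to_url(ref: str) -> str:
    """Expand an owner/repo#number reference into a GitHub pull-request URL."""
    match = PR_REF.match(ref)
    if match is None:
        raise ValueError(f"not an owner/repo#number reference: {ref!r}")
    return f"https://github.com/{match['repo']}/pull/{match['number']}"


# Three references copied from the candidate list.
refs = ["pydantic/pydantic#12826", "django/django#21121", "django/django#21276"]
urls = [pr_ref_to_url(ref) for ref in refs]
per_repo = Counter(ref.split("#")[0] for ref in refs)

print(urls[0])         # https://github.com/pydantic/pydantic/pull/12826
print(dict(per_repo))  # {'pydantic/pydantic': 1, 'django/django': 2}
```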

Priorities (suggested)

  • Add 3 hard-difficulty rows for an honest difficulty spread (canonical is currently 4 easy + 9 medium + 0 hard). Best hard candidates: django concurrency / migration-edge, pandas groupby-with-NA edges, requests session-state under retry.
  • Round out pydantic / flask / django coverage (canonical has only 1 verified row each).
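The difficulty spread mentioned above is easy to recheck after each promotion. The sketch below assumes each row exposes a `difficulty` key with the values easy, medium, or hard. The row dicts are synthetic and only reproduce the 4 + 9 + 0 counts quoted in this issue.

```python
from collections import Counter

LEVELS = ("easy", "medium", "hard")


def difficulty_spread(rows: list[dict]) -> dict[str, int]:
    """Count rows per difficulty level, reporting absent levels as 0."""
    counts = Counter(row["difficulty"] for row in rows)
    return {level: counts.get(level, 0) for level in LEVELS}


# Synthetic rows mirroring the canonical spread stated above.
canonical = [{"difficulty": "easy"}] * 4 + [{"difficulty": "medium"}] * 9
spread_now = difficulty_spread(canonical)
print(spread_now)  # {'easy': 4, 'medium': 9, 'hard': 0}

# Spread after the 3 suggested hard promotions.
spread_after = difficulty_spread(canonical + [{"difficulty": "hard"}] * 3)
print(spread_after)  # {'easy': 4, 'medium': 9, 'hard': 3}
```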

Out of scope

  • The 13 already-canonical rows. Live and validated.
  • Cross-language (pydantic-core Rust) rows. Held for a future multi-language batch.
  • v4 corpus. Separate model (synthetic, jailed, lives under CATHEDRAL_V4_CORPUS_PATH).
