
Add Claude 4.x support and harden writeup pipeline #242

Open
SeoknamKo wants to merge 1 commit into SakanaAI:main from SeoknamKo:fix/claude-4x-and-writeup-robustness

Conversation

@SeoknamKo

Summary

  • Register the Claude 4.x model family (`claude-opus-4-7`, `claude-sonnet-4-6`, `claude-haiku-4-5-20251001`) in `AVAILABLE_LLMS` so they are accepted by the CLI. Routing already works through the existing `startswith("claude-")` dispatcher in `create_client` / `get_response_from_llm`.
  • Force English-only output during LaTeX writeup. When Claude produced Korean comments inside `template.tex`, Aider's SEARCH/REPLACE blocks failed to match on subsequent edits, triggering `Only 3 reflections allowed, stopping` and aborting the run. A new `LANGUAGE_RULE` is injected into the abstract / per-section / Related Work prompts, and a corresponding check is added to `error_list`.
  • Fix a latent bug in the writeup `Coder.create()` dispatcher: it had no `claude-*` branch, so Claude models were passed to Aider without the required `anthropic/` prefix. The experiments dispatcher had the branch; the writeup one did not. Both now match.
  • Bump `coder.max_reflections` from Aider's default of 3 to 6 after each `Coder.create()`, so transient SEARCH/REPLACE mismatches don't abort an otherwise healthy writeup.
  • Guard `import torch` with a try/except. `launch_scientist.py` only uses torch for GPU enumeration (`torch.cuda.device_count()`), but a broken torch install (common on Apple Silicon conda setups with mixed pip/conda torch) caused an immediate `ImportError` at module load, even for sklearn-only templates. `get_available_gpus()` now returns `[]` when torch is missing or CUDA is unavailable.
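The torch guard in the last bullet can be sketched as follows. This is a minimal sketch, not the exact diff: the function name `get_available_gpus` comes from the PR, while the `gpu_ids` parameter and overall structure are assumptions.

```python
# Guarded torch import: a missing or broken torch install no longer
# aborts module load for templates that never touch the GPU.
try:
    import torch
except ImportError:
    torch = None


def get_available_gpus(gpu_ids=None):
    """Return usable GPU indices, or [] when torch/CUDA is unavailable."""
    if gpu_ids is not None:
        # Explicit user override, e.g. "0,2".
        return [int(gpu_id) for gpu_id in gpu_ids.split(",")]
    if torch is None or not torch.cuda.is_available():
        return []
    return list(range(torch.cuda.device_count()))
```

With this shape, sklearn-only templates run even when `import torch` fails, and CUDA-equipped setups behave exactly as before.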

Why

This surfaced while running AI Scientist against a custom sklearn-only template with claude-opus-4-7 on an M4 Mac:

  1. CLI rejected `claude-opus-4-7` because it was not in `AVAILABLE_LLMS`.
  2. After registering the model, the writeup Coder mis-routed the Claude call because the `anthropic/` prefix was only applied in the experiments dispatcher.
  3. Even after fixing the routing, the writeup failed repeatedly: Claude emitted Korean LaTeX comments, Aider's SEARCH blocks included those comments verbatim, the target file did not contain them, the diff edits failed, and Aider hit its 3-reflection cap.
  4. Separately, a broken torch install in the conda base env hard-blocked launch for templates that don't need torch at all.

Each change is small and isolated, and keeps the existing behavior for non-Claude / torch-available setups intact.

Test plan

  • `python launch_scientist.py --model claude-opus-4-7 --experiment <sklearn_template> --writeup latex --num-ideas 1` completes through PDF generation on macOS M-series.
  • `python launch_scientist.py --model claude-sonnet-4-6 ...` likewise completes.
  • Existing `gpt-4o` / `claude-3-5-sonnet-20241022` runs are unaffected (model dispatch is unchanged for them).
  • `import launch_scientist` succeeds in an env where torch is broken; `get_available_gpus()` returns `[]`.
  • Grep the resulting template.tex for any non-ASCII CJK ranges — should be empty.
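The CJK sweep in the last bullet can be automated with a small helper. This is a sketch: the character ranges (Hangul syllables, CJK unified ideographs, kana) and the helper names are illustrative, not part of the diff.

```python
import re

# Hangul syllables, CJK unified ideographs, hiragana/katakana.
CJK = re.compile(r"[\uAC00-\uD7A3\u4E00-\u9FFF\u3040-\u30FF]")


def find_cjk_lines(lines):
    """Return (line_number, line) pairs that contain CJK characters."""
    return [(n, line) for n, line in enumerate(lines, 1) if CJK.search(line)]


def check_tex_file(path):
    """Scan a LaTeX file; an empty result means the writeup is clean."""
    with open(path, encoding="utf-8") as f:
        return find_cjk_lines(f)
```

For example, `check_tex_file("template.tex")` returning `[]` is the "should be empty" condition from the test plan.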

🤖 Generated with Claude Code

- ai_scientist/llm.py: register claude-opus-4-7, claude-sonnet-4-6,
  claude-haiku-4-5-20251001 in AVAILABLE_LLMS so they pass argparse
  choices. Routing already works via the existing startswith("claude-")
  dispatcher in create_client / get_response_from_llm.
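The registration reduces to extending one list, since the prefix dispatch already matches the new names. A sketch, where the surrounding list entries are illustrative and only the three new names come from this PR:

```python
AVAILABLE_LLMS = [
    # ... existing entries, e.g.:
    "gpt-4o-2024-05-13",
    "claude-3-5-sonnet-20241022",
    # New Claude 4.x family (this PR):
    "claude-opus-4-7",
    "claude-sonnet-4-6",
    "claude-haiku-4-5-20251001",
]


def routes_to_anthropic(model):
    """Mirrors the existing startswith("claude-") dispatch in create_client."""
    return model.startswith("claude-")
```

Because all three new names start with `claude-`, no dispatcher change is needed in `llm.py` itself.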

- ai_scientist/perform_writeup.py: inject a LANGUAGE_RULE into the
  abstract / section / Related Work prompts forcing English-only output,
  and extend error_list to flag non-English text. This prevents
  Aider's SEARCH/REPLACE blocks from failing to match when the LLM
  writes non-English LaTeX comments (observed with Claude producing
  Korean comments, breaking diff edits).
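A minimal sketch of the injection described above. The names `LANGUAGE_RULE` and `error_list` come from the PR description; the rule text, the helper, and the error pattern wording are assumptions.

```python
LANGUAGE_RULE = (
    "Respond in English only. All LaTeX text, captions, and comments "
    "must be in English; non-English output will break later diff edits."
)


def build_section_prompt(section, body):
    """Prepend the language rule to every writeup prompt."""
    return f"{LANGUAGE_RULE}\n\nPlease fill in the {section} section.\n{body}"


# Failure patterns checked against the generated LaTeX, extended
# with a non-English check (illustrative wording).
error_list = [
    # ... existing LaTeX/Aider error patterns ...
    "non-English (e.g. CJK) text in the generated LaTeX",
]
```

Keeping the rule in every per-section prompt (rather than only the system prompt) guards against the model drifting back to another language mid-run.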

- launch_scientist.py:
  * Guard `import torch` so sklearn-only templates (e.g. custom
    materials / tabular experiments) keep working on environments
    with a broken or missing torch install (common on Apple Silicon
    conda base). get_available_gpus returns [] when torch is
    unavailable.
  * Add a `claude-*` branch to BOTH Coder.create() dispatchers
    (experiments AND writeup). The writeup dispatcher was missing
    the `anthropic/` model prefix, which caused Aider to mis-route
    Claude models during the writeup phase.
  * Bump `coder.max_reflections` to 6 (from Aider's default 3) after
    each Coder.create so transient SEARCH/REPLACE mismatches don't
    abort the run.
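The routing fix plus the reflection bump can be sketched as a model-name mapping applied in both dispatchers. The helper name is hypothetical; the `anthropic/` prefix requirement and the 3-to-6 bump are as described above.

```python
MAX_REFLECTIONS = 6  # up from Aider's default of 3


def aider_model_name(model):
    """Map a CLI model name to the string Aider expects."""
    if model.startswith("claude-"):
        return f"anthropic/{model}"
    return model


# Applied after every Coder.create(...) in BOTH dispatchers
# (experiments and writeup), e.g.:
#     coder = Coder.create(main_model=Model(aider_model_name(model)), ...)
#     coder.max_reflections = MAX_REFLECTIONS
```

Centralizing the mapping in one helper is what keeps the experiments and writeup branches from drifting apart again.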

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
