
Add Claude 4.x support and harden writeup pipeline #242

Open
SeoknamKo wants to merge 1 commit into SakanaAI:main from SeoknamKo:fix/claude-4x-and-writeup-robustness

Conversation

@SeoknamKo

Summary

  • Register the Claude 4.x model family (`claude-opus-4-7`, `claude-sonnet-4-6`, `claude-haiku-4-5-20251001`) in `AVAILABLE_LLMS` so they are accepted by the CLI. Routing already works through the existing `startswith("claude-")` dispatcher in `create_client` / `get_response_from_llm`.
  • Force English-only output during LaTeX writeup. When Claude produced Korean comments inside `template.tex`, Aider's SEARCH/REPLACE blocks failed to match on subsequent edits, triggering `Only 3 reflections allowed, stopping` and aborting the run. A new `LANGUAGE_RULE` is injected into the abstract / per-section / Related Work prompts, and a corresponding check is added to `error_list`.
  • Fix a latent bug in the writeup `Coder.create()` dispatcher: it had no `claude-*` branch, so Claude models were passed to Aider without the required `anthropic/` prefix. The experiments dispatcher had the branch; the writeup one did not. Both now match.
  • Bump `coder.max_reflections` from Aider's default of 3 to 6 after each `Coder.create()`, so transient SEARCH/REPLACE mismatches don't abort an otherwise healthy writeup.
  • Guard `import torch` with a try/except. `launch_scientist.py` only uses torch for GPU enumeration (`torch.cuda.device_count()`), but a broken torch install (common on Apple Silicon conda setups with mixed pip/conda torch) caused an immediate `ImportError` at module load, even for sklearn-only templates. `get_available_gpus()` now returns `[]` when torch is missing or CUDA is unavailable.
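The torch guard in the last bullet can be sketched as follows. This is a minimal sketch, not the exact diff: the function name `get_available_gpus` comes from the PR, while the `gpu_ids` parameter and overall structure are assumptions.

```python
# Guarded torch import: a missing or broken torch install no longer
# aborts module load for templates that never touch the GPU.
try:
    import torch
except ImportError:
    torch = None


def get_available_gpus(gpu_ids=None):
    """Return usable GPU indices, or [] when torch/CUDA is unavailable."""
    if gpu_ids is not None:
        # Explicit user override, e.g. "0,2".
        return [int(gpu_id) for gpu_id in gpu_ids.split(",")]
    if torch is None or not torch.cuda.is_available():
        return []
    return list(range(torch.cuda.device_count()))
```

With this shape, sklearn-only templates run even when `import torch` fails, and CUDA-equipped setups behave exactly as before.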

Why

This surfaced while running AI Scientist against a custom sklearn-only template with claude-opus-4-7 on an M4 Mac:

  1. CLI rejected `claude-opus-4-7` because it was not in `AVAILABLE_LLMS`.
  2. After registering the model, the writeup Coder mis-routed the Claude call because the `anthropic/` prefix was only applied in the experiments dispatcher.
  3. Even after fixing the routing, the writeup failed repeatedly: Claude emitted Korean LaTeX comments, Aider's SEARCH blocks included those comments verbatim, the target file did not contain them, the diff edits failed, and Aider hit its 3-reflection cap.
  4. Separately, a broken torch install in the conda base env hard-blocked launch for templates that don't need torch at all.

Each change is small and isolated, and keeps the existing behavior for non-Claude / torch-available setups intact.

Test plan

  • `python launch_scientist.py --model claude-opus-4-7 --experiment <sklearn_template> --writeup latex --num-ideas 1` completes through PDF generation on macOS M-series.
  • `python launch_scientist.py --model claude-sonnet-4-6 ...` likewise completes.
  • Existing `gpt-4o` / `claude-3-5-sonnet-20241022` runs are unaffected (model dispatch is unchanged for them).
  • `import launch_scientist` succeeds in an env where torch is broken; `get_available_gpus()` returns `[]`.
  • Grep the resulting template.tex for any non-ASCII CJK ranges — should be empty.
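The CJK sweep in the last bullet can be automated with a small helper. This is a sketch: the character ranges (Hangul syllables, CJK unified ideographs, kana) and the helper names are illustrative, not part of the diff.

```python
import re

# Hangul syllables, CJK unified ideographs, hiragana/katakana.
CJK = re.compile(r"[\uAC00-\uD7A3\u4E00-\u9FFF\u3040-\u30FF]")


def find_cjk_lines(lines):
    """Return (line_number, line) pairs that contain CJK characters."""
    return [(n, line) for n, line in enumerate(lines, 1) if CJK.search(line)]


def check_tex_file(path):
    """Scan a LaTeX file; an empty result means the writeup is clean."""
    with open(path, encoding="utf-8") as f:
        return find_cjk_lines(f)
```

For example, `check_tex_file("template.tex")` returning `[]` is the "should be empty" condition from the test plan.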

🤖 Generated with Claude Code

- ai_scientist/llm.py: register claude-opus-4-7, claude-sonnet-4-6,
  claude-haiku-4-5-20251001 in AVAILABLE_LLMS so they pass argparse
  choices. Routing already works via the existing startswith("claude-")
  dispatcher in create_client / get_response_from_llm.
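The registration reduces to extending one list, since the prefix dispatch already matches the new names. A sketch, where the surrounding list entries are illustrative and only the three new names come from this PR:

```python
AVAILABLE_LLMS = [
    # ... existing entries, e.g.:
    "gpt-4o-2024-05-13",
    "claude-3-5-sonnet-20241022",
    # New Claude 4.x family (this PR):
    "claude-opus-4-7",
    "claude-sonnet-4-6",
    "claude-haiku-4-5-20251001",
]


def routes_to_anthropic(model):
    """Mirrors the existing startswith("claude-") dispatch in create_client."""
    return model.startswith("claude-")
```

Because all three new names start with `claude-`, no dispatcher change is needed in `llm.py` itself.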

- ai_scientist/perform_writeup.py: inject a LANGUAGE_RULE into the
  abstract / section / Related Work prompts forcing English-only output,
  and extend error_list to flag non-English text. This prevents
  Aider's SEARCH/REPLACE blocks from failing to match when the LLM
  writes non-English LaTeX comments (observed with Claude producing
  Korean comments, breaking diff edits).
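A minimal sketch of the injection described above. The names `LANGUAGE_RULE` and `error_list` come from the PR description; the rule text, the helper, and the error pattern wording are assumptions.

```python
LANGUAGE_RULE = (
    "Respond in English only. All LaTeX text, captions, and comments "
    "must be in English; non-English output will break later diff edits."
)


def build_section_prompt(section, body):
    """Prepend the language rule to every writeup prompt."""
    return f"{LANGUAGE_RULE}\n\nPlease fill in the {section} section.\n{body}"


# Failure patterns checked against the generated LaTeX, extended
# with a non-English check (illustrative wording).
error_list = [
    # ... existing LaTeX/Aider error patterns ...
    "non-English (e.g. CJK) text in the generated LaTeX",
]
```

Keeping the rule in every per-section prompt (rather than only the system prompt) guards against the model drifting back to another language mid-run.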

- launch_scientist.py:
  * Guard `import torch` so sklearn-only templates (e.g. custom
    materials / tabular experiments) keep working on environments
    with a broken or missing torch install (common on Apple Silicon
    conda base). get_available_gpus returns [] when torch is
    unavailable.
  * Add a `claude-*` branch to BOTH Coder.create() dispatchers
    (experiments AND writeup). The writeup dispatcher was missing
    the `anthropic/` model prefix, which caused Aider to mis-route
    Claude models during the writeup phase.
  * Bump `coder.max_reflections` to 6 (from Aider's default 3) after
    each Coder.create so transient SEARCH/REPLACE mismatches don't
    abort the run.
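The routing fix plus the reflection bump can be sketched as a model-name mapping applied in both dispatchers. The helper name is hypothetical; the `anthropic/` prefix requirement and the 3-to-6 bump are as described above.

```python
MAX_REFLECTIONS = 6  # up from Aider's default of 3


def aider_model_name(model):
    """Map a CLI model name to the string Aider expects."""
    if model.startswith("claude-"):
        return f"anthropic/{model}"
    return model


# Applied after every Coder.create(...) in BOTH dispatchers
# (experiments and writeup), e.g.:
#     coder = Coder.create(main_model=Model(aider_model_name(model)), ...)
#     coder.max_reflections = MAX_REFLECTIONS
```

Centralizing the mapping in one helper is what keeps the experiments and writeup branches from drifting apart again.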

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
