Fix Anthropic temperature for newer Claude models #2490
johnjosephhorton wants to merge 5 commits into
Greptile Summary
This PR primarily adds temperature normalization for newer Anthropic Claude models (post-Opus 4.6), intercepting the outbound API call to force
Confidence Score: 3/5
The temperature fix itself is correct and well-tested, but the bundled jobs.py change silently removes the local-execution fallback that users without EP credentials currently rely on.
The jobs.py change removes the silent local fallback when remote inference is unavailable. Because offload_execution defaults to True and disable_remote_inference defaults to False, any user calling job.run() without EP remote inference configured will now get a JobsRunError where they previously ran locally. The docstring for run() still promises local fallback, making the behavior change easy to miss.
The temperature normalization logic in anthropic_service.py is correct and the accompanying tests are thorough, but the test file is missing coverage of undated model strings that exercise the version-based branch directly.
edsl/jobs/jobs.py: the removal of the silent local fallback in _remote_results needs explicit verification that the default user journey (no EP credentials, no flags) is handled gracefully.
Important Files Changed
Sequence Diagram
sequenceDiagram
participant User
participant Jobs
participant RemoteHandler
participant AnthropicService
User->>Jobs: job.run()
Jobs->>RemoteHandler: use_remote_inference(disable_remote_inference)
alt remote available
RemoteHandler-->>Jobs: True
Jobs->>Jobs: run remotely
Jobs-->>User: Results
else remote unavailable
RemoteHandler-->>Jobs: False
alt "disable_remote_inference=True"
Jobs-->>User: (None, None) local execution
else "disable_remote_inference=False default"
Jobs-->>User: raises JobsRunError
end
end
User->>AnthropicService: async_execute_model_call(prompt)
AnthropicService->>AnthropicService: _requires_temperature_one(model_name)
alt model after Opus 4.6 cutoff
AnthropicService->>AnthropicService: _api_temperature returns 1.0
else legacy model
AnthropicService->>AnthropicService: _api_temperature returns user temperature
end
AnthropicService->>AnthropicService: "client.messages.create(temperature=api_temp)"
AnthropicService-->>User: response model.temperature preserved
Reviews (1): Last reviewed commit: "Fix Anthropic temperature for newer mode..."
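The temperature branch of the diagram can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's code: `ModelSettings`, `api_temperature`, and the stand-in predicate `is_post_cutoff` are hypothetical names introduced here, and the real `_api_temperature` and `_requires_temperature_one` methods may be structured differently.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ModelSettings:
    """Hypothetical container for the values the user configured."""

    model_name: str
    temperature: float


def api_temperature(
    settings: ModelSettings,
    requires_temperature_one: Callable[[str], bool],
) -> float:
    """Return the temperature to send to the API, leaving settings untouched."""
    if requires_temperature_one(settings.model_name):
        return 1.0
    return settings.temperature


def is_post_cutoff(model_name: str) -> bool:
    # Stand-in predicate for illustration only; the PR parses dates and versions.
    return model_name == "claude-opus-4-7"


settings = ModelSettings(model_name="claude-opus-4-7", temperature=0.2)
sent = api_temperature(settings, is_post_cutoff)
print("sent to API:", sent, "| kept on model:", settings.temperature)
```

Keeping the override in a pure function that returns a new value, rather than mutating the stored setting, is what lets the user's configured temperature stay visible after the call, as the last step of the diagram describes.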
else:
    return None, None
if self.run_config.parameters.disable_remote_inference:
    return None, None

from .exceptions import JobsRunError

raise JobsRunError(
    "Remote execution was requested, but remote inference is not "
    "available. Check EXPECTED_PARROT_URL, EXPECTED_PARROT_API_KEY, "
    "and the remote inference setting. To run locally, pass "
    "disable_remote_inference=True or offload_execution=False."
)
Silent local-execution fallback is now removed by default
Previously, when use_remote_inference() returned False (e.g. no EP API key, connection error, or user's remote-inference setting disabled), the code returned (None, None) and the caller fell through to local execution. Now a JobsRunError is raised for the same condition unless disable_remote_inference=True is explicitly set. Because offload_execution defaults to True (line 111 of data_structures.py) and disable_remote_inference defaults to False, any user who calls job.run() without EP credentials will now get an error instead of running locally. The existing docstring in run() still says "If remote inference is not available, it will run locally", which is now incorrect.
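The behavior change described above can be illustrated with a small standalone sketch. The helper `choose_execution` and the local `JobsRunError` class are hypothetical stand-ins written for this example; they only mirror the branch order quoted from `_remote_results` and do not reproduce the actual edsl code.

```python
class JobsRunError(Exception):
    """Local stand-in for the exception the PR imports from .exceptions."""


def choose_execution(
    remote_available: bool, disable_remote_inference: bool = False
) -> str:
    """Hypothetical helper mirroring the new branch order."""
    if remote_available:
        return "remote"
    if disable_remote_inference:
        # Explicit opt-out: the caller falls through to local execution.
        return "local"
    # Before the PR, this case also fell through to local execution.
    raise JobsRunError(
        "Remote execution was requested, but remote inference is not available."
    )


for available, disabled in [(True, False), (False, True), (False, False)]:
    try:
        print(available, disabled, choose_execution(available, disabled))
    except JobsRunError as exc:
        print(available, disabled, "JobsRunError:", exc)
```

The third case is the default user journey the review is concerned about: no remote inference and no flags set.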
def test_requires_temperature_one_for_models_after_opus_46():
    assert AnthropicService._requires_temperature_one(
        "claude-sonnet-4-6-20260217"
    )
    assert AnthropicService._requires_temperature_one("claude-opus-4-7-20260416")
    assert AnthropicService._requires_temperature_one("claude-opus-4-7")
    assert not AnthropicService._requires_temperature_one("claude-opus-4-6-20260205")
    assert not AnthropicService._requires_temperature_one("claude-opus-4-5-20251124")
    assert not AnthropicService._requires_temperature_one("claude-3-5-sonnet-20241022")
Missing test cases for undated model names
The test suite validates dated model strings (e.g. claude-opus-4-6-20260205) but omits the undated variants that users commonly pass. For example, claude-opus-4-6 (no date) should return False and claude-sonnet-4-6 (no date) should return True per the version-based branch — but neither is exercised. Without these tests, a regression in the else branch of _requires_temperature_one (the path reached when no 8-digit date is present) would go undetected.
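A sketch of what the missing undated cases might look like is below. To keep it runnable on its own, it uses a standalone copy of the quoted classification logic as a free function; the two cutoff constants are assumptions inferred from the review text (the PR's actual `_temperature_deprecation_date` and `_temperature_deprecation_version` values are not shown), and the test name is hypothetical.

```python
import re

# Assumed cutoffs, inferred from the review text; not copied from the PR.
TEMPERATURE_DEPRECATION_DATE = 20260205
TEMPERATURE_DEPRECATION_VERSION = (4, 6)


def requires_temperature_one(model_name: str) -> bool:
    """Standalone copy of the quoted logic, for experimentation only."""
    model_name = model_name.lower()
    date_match = re.search(r"(?<!\d)(\d{8})(?!\d)", model_name)
    if date_match:
        return int(date_match.group(1)) > TEMPERATURE_DEPRECATION_DATE

    version_match = re.search(
        r"claude-(?P<family>opus|sonnet|haiku)-(?P<major>\d+)-(?P<minor>\d+)",
        model_name,
    )
    if not version_match:
        return False

    version = (
        int(version_match.group("major")),
        int(version_match.group("minor")),
    )
    if version > TEMPERATURE_DEPRECATION_VERSION:
        return True
    family = version_match.group("family")
    return version == TEMPERATURE_DEPRECATION_VERSION and family != "opus"


def test_undated_model_names():
    # No 8-digit date is present, so each name reaches the version branch.
    assert not requires_temperature_one("claude-opus-4-6")
    assert requires_temperature_one("claude-sonnet-4-6")
    assert not requires_temperature_one("claude-opus-4-5")


test_undated_model_names()
```

In the PR itself the same assertions would presumably call `AnthropicService._requires_temperature_one` directly rather than a copy.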
@classmethod
def _requires_temperature_one(cls, model_name: str) -> bool:
    """Return whether Anthropic only accepts temperature=1.0 for this model."""
    model_name = model_name.lower()
    date_match = re.search(r"(?<!\d)(\d{8})(?!\d)", model_name)
    if date_match:
        return int(date_match.group(1)) > cls._temperature_deprecation_date

    version_match = re.search(
        r"claude-(?P<family>opus|sonnet|haiku)-(?P<major>\d+)-(?P<minor>\d+)",
        model_name,
    )
    if not version_match:
        return False

    version = (
        int(version_match.group("major")),
        int(version_match.group("minor")),
    )
    if version > cls._temperature_deprecation_version:
        return True

    family = version_match.group("family")
    return version == cls._temperature_deprecation_version and family != "opus"
Date-first check can override version-family exemption for opus-4-6
When a model name includes an 8-digit date, the date check short-circuits before the family/version logic. This means a hypothetical claude-opus-4-6-20260206 (any opus-4-6 snapshot released the day after the cutoff) would return True via the date path, contradicting the version-based branch that explicitly exempts opus-4-6 regardless of date. In practice this edge case is unlikely, but the two branches can yield opposite answers for the same logical model depending solely on whether a date suffix is present.
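One possible way to make the two branches agree is to consult the version first and fall back to the date only when no version can be parsed. The sketch below is an assumption-laden alternative, not the author's method: the function name and both cutoff constants are hypothetical (the constants are inferred from the review text), and the minor-version pattern is narrowed to one or two digits so that an 8-digit date in names such as `claude-opus-4-20250514` is not mistaken for a minor version.

```python
import re

# Assumed cutoffs inferred from the review text, not copied from the PR.
CUTOFF_DATE = 20260205
CUTOFF_VERSION = (4, 6)

VERSION_RE = re.compile(
    r"claude-(?P<family>opus|sonnet|haiku)-(?P<major>\d+)-(?P<minor>\d{1,2})(?!\d)"
)
DATE_RE = re.compile(r"(?<!\d)(\d{8})(?!\d)")


def requires_temperature_one_version_first(model_name: str) -> bool:
    """Hypothetical variant: the version decides, the date is only a fallback."""
    name = model_name.lower()
    version_match = VERSION_RE.search(name)
    if version_match:
        version = (int(version_match["major"]), int(version_match["minor"]))
        if version != CUTOFF_VERSION:
            return version > CUTOFF_VERSION
        # At the cutoff version, only the opus family is exempt.
        return version_match["family"] != "opus"

    # No parseable family/version (e.g. older naming schemes): use the date.
    date_match = DATE_RE.search(name)
    return bool(date_match) and int(date_match.group(1)) > CUTOFF_DATE


for name in ("claude-opus-4-6", "claude-opus-4-6-20260206"):
    print(name, requires_temperature_one_version_first(name))
```

With this ordering the dated and undated spellings of the same model get the same answer. The trade-off is that the date suffix is ignored whenever a version is present, so whether this is preferable depends on how Anthropic actually ties the temperature restriction to versions versus snapshot dates.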
Summary
Closes #2458
Testing
Before production