
Fix Anthropic temperature for newer Claude models #2490

Open
johnjosephhorton wants to merge 5 commits into main from fix/anthropic-temperature-deprecation-2458

Conversation

@johnjosephhorton

Collaborator

Summary

  • Normalize outbound Anthropic API temperature to 1.0 for Claude models released after Opus 4.6
  • Preserve the user-configured Model.temperature and serialized parameters for EDSL/cache behavior
  • Add Anthropic service tests covering affected and legacy model names
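
The first two bullets can be illustrated with a minimal sketch. The helper name `build_api_payload`, the `is_affected` predicate, and the payload shape are assumptions made for illustration and are not taken from the PR; the sketch only shows the idea of overriding the temperature in a copy that goes to the API while the user's configured parameters stay untouched.

```python
from typing import Any, Callable, Dict


def build_api_payload(
    model_name: str,
    parameters: Dict[str, Any],
    requires_temperature_one: Callable[[str], bool],
) -> Dict[str, Any]:
    """Return the outbound payload without mutating the user's parameters."""
    payload = dict(parameters)  # shallow copy: the caller's dict stays as configured
    payload["model"] = model_name
    if requires_temperature_one(model_name):
        payload["temperature"] = 1.0  # override only what is sent over the wire
    return payload


# Hypothetical predicate standing in for the PR's model check.
def is_affected(name: str) -> bool:
    return name == "claude-opus-4-7"


user_params = {"temperature": 0.2, "max_tokens": 256}
outbound = build_api_payload("claude-opus-4-7", user_params, is_affected)
legacy = build_api_payload("claude-3-5-sonnet-20241022", user_params, is_affected)
```

Because `user_params` is never modified, anything keyed on the configured temperature (serialization, cache lookups) would keep seeing the user's value, which is the behavior the second bullet describes.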

Closes #2458

Testing

  • pytest -q tests/inference_services/test_anthropic_service.py
  • pytest -q tests/inference_services

Before production

  • Test this with local Anthropic inference using a newer affected Claude model and a legacy Claude model before moving to production.

@greptile-apps

greptile-apps Bot commented Jun 9, 2026

Contributor

Greptile Summary

This PR primarily adds temperature normalization for newer Anthropic Claude models (post-Opus 4.6), intercepting the outbound API call to force temperature=1.0 while preserving the user-configured value for EDSL serialization and caching. It also bundles several unrelated changes: the default object store location moves from a stable platformdirs path to CWD-relative .edsl_objects/, the typer version constraint is widened, and jobs.py now raises JobsRunError instead of silently falling back to local execution when remote inference is unavailable.

  • Temperature fix (anthropic_service.py): _requires_temperature_one uses an 8-digit date check (priority) plus a family/version check to decide whether to override temperature with 1.0 in the API payload; model.temperature and model.parameters["temperature"] are left unchanged.
  • Jobs remote-inference guard (jobs.py): _remote_results now raises JobsRunError when use_remote_inference() returns False and disable_remote_inference is not set — changing the default behavior for users without EP credentials from a silent local fallback to an error.
  • Object store root (store.py): Default storage moves from platformdirs.user_data_dir("edsl")/objects to Path.cwd()/".edsl_objects", making the store project-scoped but no longer persistent across different working directories.
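
The consequence of a CWD-relative root can be shown with a small standard-library sketch. The function below is a hypothetical stand-in that borrows the `default_root` name mentioned in the review; it is not the code from store.py. It resolves the root at call time, so two different working directories yield two different stores.

```python
import os
import tempfile
from pathlib import Path


def default_root() -> Path:
    """Hypothetical stand-in: resolve the store root relative to the current directory."""
    return Path.cwd() / ".edsl_objects"


original_cwd = Path.cwd()
roots = []
with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
    try:
        for workdir in (first, second):
            os.chdir(workdir)
            roots.append(default_root())
    finally:
        os.chdir(original_cwd)  # restore before the temporary directories are removed

# The same call produced a different root in each working directory.
same_store = roots[0] == roots[1]
```

A fixed per-user data directory would return the same path from both working directories, which is the persistence property the summary says is lost.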

Confidence Score: 3/5

The temperature fix itself is correct and well-tested, but the bundled jobs.py change removes the silent local-execution fallback that users without EP credentials currently rely on.

The jobs.py change removes the silent local fallback when remote inference is unavailable. Because offload_execution defaults to True and disable_remote_inference defaults to False, any user calling job.run() without EP remote inference configured will now get a JobsRunError where they previously ran locally. The docstring for run() still promises local fallback, making the behavior change easy to miss. The temperature normalization logic in anthropic_service.py is correct and the accompanying tests are thorough, but the test file is missing coverage of undated model strings that exercise the version-based branch directly.

edsl/jobs/jobs.py — the removal of the silent local fallback in _remote_results needs explicit verification that the default user journey (no EP credentials, no flags) is handled gracefully.

Important Files Changed

| Filename | Overview |
| --- | --- |
| edsl/inference_services/services/anthropic_service.py | Adds _requires_temperature_one and _api_temperature class methods to normalize temperature to 1.0 for newer Claude models; logic is sound but the date-first branch can override the opus-4-6 version exemption for dated model names released after the cutoff. |
| edsl/jobs/jobs.py | Changes _remote_results to raise JobsRunError when remote inference is unavailable and disable_remote_inference is not set, a breaking behavioral change for users without EP credentials who previously fell back silently to local execution. |
| tests/inference_services/test_anthropic_service.py | New test file covering temperature normalization for affected and legacy models; misses undated model-name cases (e.g. claude-opus-4-6, claude-sonnet-4-6) that exercise the version-based branch independently of date matching. |
| tests/jobs/test_Jobs.py | Adds two tests explicitly validating the new _remote_results behavior: error on unavailable remote with offload_execution=True, and successful local execution with disable_remote_inference=True. |
| edsl/object_store/store.py | Replaces stable platformdirs-based DEFAULT_ROOT class attribute with a CWD-relative default_root() static method; changes default object store location from a persistent OS-standard path to .edsl_objects/ in the current working directory. |

Sequence Diagram

sequenceDiagram
    participant User
    participant Jobs
    participant RemoteHandler
    participant AnthropicService

    User->>Jobs: job.run()
    Jobs->>RemoteHandler: use_remote_inference(disable_remote_inference)
    alt remote available
        RemoteHandler-->>Jobs: True
        Jobs->>Jobs: run remotely
        Jobs-->>User: Results
    else remote unavailable
        RemoteHandler-->>Jobs: False
        alt "disable_remote_inference=True"
            Jobs-->>User: (None, None) local execution
        else "disable_remote_inference=False default"
            Jobs-->>User: raises JobsRunError
        end
    end

    User->>AnthropicService: async_execute_model_call(prompt)
    AnthropicService->>AnthropicService: _requires_temperature_one(model_name)
    alt model after Opus 4.6 cutoff
        AnthropicService->>AnthropicService: _api_temperature returns 1.0
    else legacy model
        AnthropicService->>AnthropicService: _api_temperature returns user temperature
    end
    AnthropicService->>AnthropicService: "client.messages.create(temperature=api_temp)"
    AnthropicService-->>User: response model.temperature preserved

Reviews (1): Last reviewed commit: "Fix Anthropic temperature for newer mode..."

Comment thread edsl/jobs/jobs.py
Comment on lines 1081 to +1092
else:
    return None, None
if self.run_config.parameters.disable_remote_inference:
    return None, None

from .exceptions import JobsRunError

raise JobsRunError(
    "Remote execution was requested, but remote inference is not "
    "available. Check EXPECTED_PARROT_URL, EXPECTED_PARROT_API_KEY, "
    "and the remote inference setting. To run locally, pass "
    "disable_remote_inference=True or offload_execution=False."
)

P1 Silent local-execution fallback is now removed by default

Previously, when use_remote_inference() returned False (e.g. no EP API key, connection error, or user's remote-inference setting disabled), the code returned (None, None) and the caller fell through to local execution. Now a JobsRunError is raised for the same condition unless disable_remote_inference=True is explicitly set. Because offload_execution defaults to True (line 111 of data_structures.py) and disable_remote_inference defaults to False, any user who calls job.run() without EP credentials will now get an error instead of running locally. The existing docstring in run() still says "If remote inference is not available, it will run locally", which is now incorrect.
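
The behavior change described above can be modeled in a few lines. This is a simplified sketch, not the code in jobs.py: `JobsRunError` is redefined locally, the two functions are hypothetical, and remote availability is reduced to a single boolean.

```python
class JobsRunError(Exception):
    """Local stand-in for the exception named in the diff."""


def old_behavior(remote_available: bool, disable_remote_inference: bool = False) -> str:
    """Before the PR: an unavailable remote silently means local execution."""
    # disable_remote_inference is accepted only for signature symmetry here.
    return "remote" if remote_available else "local"


def new_behavior(remote_available: bool, disable_remote_inference: bool = False) -> str:
    """After the PR: local execution requires an explicit opt-out of remote."""
    if remote_available:
        return "remote"
    if disable_remote_inference:
        return "local"
    raise JobsRunError(
        "Remote execution was requested, but remote inference is not available."
    )


# Default journey: no credentials, no flags.
try:
    outcome = new_behavior(remote_available=False)
except JobsRunError:
    outcome = "error"
```

With default flags, `old_behavior(False)` yields local execution while the new path ends in an error, which is the difference the comment asks the author to verify.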

Comment on lines +6 to +14
def test_requires_temperature_one_for_models_after_opus_46():
    assert AnthropicService._requires_temperature_one(
        "claude-sonnet-4-6-20260217"
    )
    assert AnthropicService._requires_temperature_one("claude-opus-4-7-20260416")
    assert AnthropicService._requires_temperature_one("claude-opus-4-7")
    assert not AnthropicService._requires_temperature_one("claude-opus-4-6-20260205")
    assert not AnthropicService._requires_temperature_one("claude-opus-4-5-20251124")
    assert not AnthropicService._requires_temperature_one("claude-3-5-sonnet-20241022")

P2 Missing test cases for undated model names

The test suite validates dated model strings (e.g. claude-opus-4-6-20260205) but omits the undated variants that users commonly pass. For example, claude-opus-4-6 (no date) should return False and claude-sonnet-4-6 (no date) should return True per the version-based branch, but neither is exercised. Without these tests, a regression in the version-based fallback of _requires_temperature_one (the path reached when no 8-digit date is present) would go undetected.
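
A sketch of the kind of tests this comment asks for follows. To keep it self-contained, the method under review is re-created as a module-level function; the two constants are assumptions inferred from the existing tests and review comments (the PR's actual class attributes are not shown in this thread).

```python
import re

# Assumed values; the PR's class constants are not visible in this thread.
TEMPERATURE_DEPRECATION_DATE = 20260205
TEMPERATURE_DEPRECATION_VERSION = (4, 6)


def requires_temperature_one(model_name: str) -> bool:
    """Standalone copy of the reviewed logic, for test illustration only."""
    model_name = model_name.lower()
    date_match = re.search(r"(?<!\d)(\d{8})(?!\d)", model_name)
    if date_match:
        return int(date_match.group(1)) > TEMPERATURE_DEPRECATION_DATE

    version_match = re.search(
        r"claude-(?P<family>opus|sonnet|haiku)-(?P<major>\d+)-(?P<minor>\d+)",
        model_name,
    )
    if not version_match:
        return False

    version = (int(version_match.group("major")), int(version_match.group("minor")))
    if version > TEMPERATURE_DEPRECATION_VERSION:
        return True
    return (
        version == TEMPERATURE_DEPRECATION_VERSION
        and version_match.group("family") != "opus"
    )


def test_requires_temperature_one_for_undated_model_names():
    # None of these names contains an 8-digit date, so only the version path runs.
    assert not requires_temperature_one("claude-opus-4-6")
    assert requires_temperature_one("claude-sonnet-4-6")
    assert not requires_temperature_one("claude-opus-4-5")
    assert not requires_temperature_one("claude-3-5-sonnet")


test_requires_temperature_one_for_undated_model_names()
```

In the real test file the assertions would presumably call AnthropicService._requires_temperature_one directly rather than a copy.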

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment on lines +42 to +65
    @classmethod
    def _requires_temperature_one(cls, model_name: str) -> bool:
        """Return whether Anthropic only accepts temperature=1.0 for this model."""
        model_name = model_name.lower()
        date_match = re.search(r"(?<!\d)(\d{8})(?!\d)", model_name)
        if date_match:
            return int(date_match.group(1)) > cls._temperature_deprecation_date

        version_match = re.search(
            r"claude-(?P<family>opus|sonnet|haiku)-(?P<major>\d+)-(?P<minor>\d+)",
            model_name,
        )
        if not version_match:
            return False

        version = (
            int(version_match.group("major")),
            int(version_match.group("minor")),
        )
        if version > cls._temperature_deprecation_version:
            return True

        family = version_match.group("family")
        return version == cls._temperature_deprecation_version and family != "opus"

P2 Date-first check can override version-family exemption for opus-4-6

When a model name includes an 8-digit date, the date check short-circuits before the family/version logic. This means a hypothetical claude-opus-4-6-20260206 (that is, any opus-4-6 snapshot dated after the cutoff) would return True via the date path, contradicting the version-based branch, which exempts opus-4-6 whenever no date is present. In practice this edge case is unlikely, but the two branches can yield opposite answers for the same logical model depending solely on whether a date suffix is present.
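
One possible way to make the two paths agree is to consult the family/version first and use the date only when no version can be parsed. The sketch below is an illustration under assumptions, not the PR's fix: the function name and both constants are hypothetical, and the minor version is limited to one or two digits so that a date suffix directly after the major version is not misread as a minor version.

```python
import re

CUTOFF_DATE = 20260205   # assumed, inferred from the review discussion
CUTOFF_VERSION = (4, 6)  # assumed

# Minor is capped at two digits and must not be followed by another digit,
# so "claude-sonnet-4-20250514" does not parse as version (4, 20250514).
_VERSION_RE = re.compile(
    r"claude-(?P<family>opus|sonnet|haiku)-(?P<major>\d+)-(?P<minor>\d{1,2})(?!\d)"
)
_DATE_RE = re.compile(r"(?<!\d)(\d{8})(?!\d)")


def requires_temperature_one_version_first(model_name: str) -> bool:
    """Hypothetical variant: version decides when present, date is the fallback."""
    name = model_name.lower()
    version_match = _VERSION_RE.search(name)
    if version_match:
        version = (int(version_match.group("major")), int(version_match.group("minor")))
        if version != CUTOFF_VERSION:
            return version > CUTOFF_VERSION
        # Exactly at the cutoff version: only the opus family is exempt.
        return version_match.group("family") != "opus"

    date_match = _DATE_RE.search(name)
    return bool(date_match) and int(date_match.group(1)) > CUTOFF_DATE


# Dated and undated spellings of the same logical model now agree.
consistent = requires_temperature_one_version_first(
    "claude-opus-4-6-20260206"
) == requires_temperature_one_version_first("claude-opus-4-6")
```

Whether version-first is the right priority depends on how Anthropic names future snapshots, so this ordering is a design choice to weigh rather than a drop-in replacement.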

Development

Successfully merging this pull request may close these issues.

Fix Anthropic model temperature values