docs(power): clarify replay safety and step semantics by embano1 · Pull Request #5 · aws/aws-durable-execution-docs

embano1 · 2026-03-08T09:18:34Z

Update the durable-functions power guidance to distinguish deterministic orchestration code from non-atomic durable operation bodies.

- generalize replay-safety guidance across steps, waits, and concurrent branches
- document logger replay caveats and note that context.logger can wrap an existing logger
- correct StepSemantics defaults and examples for TypeScript and Python
- document the at-most-once-per-retry fallback for non-idempotent steps with retries disabled

Issue #, if available:

Description of changes:

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.


embano1 · 2026-03-08T09:21:32Z

FYI, I also have another branch ready (based on these changes), which reduces the power size by ~50%.

embano1 · 2026-03-08T09:23:51Z

@singledigit can you use this updated power to verify if it would have caught the bugs in your scanner function: https://github.com/singledigit/durable-function-video-scanner/tree/main

Scanner fix: generate stable Transcribe/Rekognition submission identifiers once from durable state, not from wall-clock time inside the callback submitter. Default: derive them from scanId plus a deterministic suffix, or generate them in an earlier durable step and reuse them in the submitter.
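The identifier fix above can be sketched as a pure function of durable input. This is a minimal sketch, not code from the scanner repo or the SDK: `stableJobName`, `wallClockJobName`, the `"transcribe"`/`"rekognition"` suffixes, and the allowed character set are hypothetical illustration choices. Because the name depends only on `scanId` and a fixed suffix, a re-attempted submitter computes the same name the first attempt used, which gives the external service a stable key for detecting the duplicate.

```typescript
// Hypothetical helper (not from the scanner repo or the SDK): derive an
// external job name purely from durable input plus a fixed suffix.
type JobKind = "transcribe" | "rekognition";

// Assumed naming constraint for illustration: letters, digits, '.', '_', '-'.
function sanitizeJobName(value: string): string {
  return value.replace(/[^0-9A-Za-z._-]/g, "-");
}

// Stable: the same scanId and kind always yield the same name, on every
// attempt and every replay, so a re-attempted submitter addresses the same job.
function stableJobName(scanId: string, kind: JobKind): string {
  return sanitizeJobName(`${scanId}-${kind}`);
}

// Anti-pattern for comparison: wall-clock time makes each attempt produce a
// new name, so a re-attempted submitter can start a second, unrelated job.
function wallClockJobName(scanId: string, kind: JobKind): string {
  return sanitizeJobName(`${scanId}-${kind}-${Date.now()}`);
}
```

The alternative in the fix (generate the name in an earlier durable step) works the same way: the submitter then reuses the checkpointed value instead of recomputing it.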
Scanner fix: replace handler-level logger.* calls with context.logger (and childContext.logger / submitter ctx.logger where available) for all logging that can occur during replay.
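The logger fix above relies on the handler re-running from the top on each invocation, so handler-level log calls fire again during replay. The toy model below is not the SDK's `context.logger`; `MemoryLogger`, `ReplayAwareLogger`, `isReplaying`, and `simulateTwoInvocations` are hypothetical names. It shows the general idea of a wrapper around an existing logger that drops calls while completed work is being replayed.

```typescript
// Toy model only: this is not the SDK's context.logger, and every name here
// is hypothetical. It shows why a replay-aware wrapper avoids duplicate logs.
interface Logger {
  info(message: string): void;
}

// Collects lines in memory so the effect of replay is easy to inspect.
class MemoryLogger implements Logger {
  lines: string[] = [];
  info(message: string): void {
    this.lines.push(message);
  }
}

// Wraps an existing logger and drops calls while the handler is replaying
// code that already ran (and already logged) in an earlier invocation.
class ReplayAwareLogger implements Logger {
  private inner: Logger;
  private isReplaying: () => boolean;

  constructor(inner: Logger, isReplaying: () => boolean) {
    this.inner = inner;
    this.isReplaying = isReplaying;
  }

  info(message: string): void {
    if (!this.isReplaying()) {
      this.inner.info(message);
    }
  }
}

// Simulates two invocations of one handler. The second re-runs the handler
// from the top in replay mode, then reaches new work.
function simulateTwoInvocations(replayAware: boolean): string[] {
  const sink = new MemoryLogger();
  let replaying = false;
  const log: Logger = replayAware
    ? new ReplayAwareLogger(sink, () => replaying)
    : sink;

  log.info("scan started"); // invocation 1

  replaying = true; // invocation 2: replaying completed work
  log.info("scan started");
  replaying = false; // past the last checkpoint: new work
  log.info("scan finished");

  return sink.lines;
}
```

With a plain logger the second invocation logs "scan started" again; with the wrapper it does not.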

aws-lambda-durable-functions-power/steering/getting-started.md

embano1 · 2026-03-28T08:54:46Z

@yaythomas can we prioritize this one to address some issues with the power?

yaythomas · 2026-03-08T10:03:57Z

aws-lambda-durable-functions-power/steering/replay-model-rules.md

+
+## Rule 2: Durable Operation Bodies Are Not Guaranteed To Be Atomic
+
+**Functions passed to durable context APIs must assume the operation is not guaranteed to be atomic with respect to external side effects, and may be re-attempted before the durable runtime has fully recorded the result.**


What about the at-most-once guarantee?

What does "durable context APIs" mean? Methods on the DurableContext? Or the durable handler?

"Functions" means something specific in coding; strictly speaking, Java doesn't have functions.

Style: avoid passive

Suggestion: Code in a durable operation must assume that it could re-run on replay, unless it is in a Step with an AT MOST ONCE execution guarantee. This means that external side effects caused by such code could execute more than once.

yaythomas · 2026-03-08T10:10:22Z

aws-lambda-durable-functions-power/steering/replay-model-rules.md

+
+### What This Means
+
+- Non-deterministic computation inside a durable operation body is acceptable because the result can be checkpointed


this is not quite why it's "acceptable".

suggestion:
Once code inside a durable step completes, its result is saved to a checkpoint, and on subsequent replays the operation returns the saved result. In this way the result of non-deterministic code becomes deterministic on replay, because the non-deterministic code does not re-run and the durable execution framework uses the checkpointed result instead.
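The checkpoint-then-replay behavior described in that suggestion can be modeled in a few lines. This is a toy sketch, not the SDK API: `step`, `Checkpoints`, `handler`, and the `Map`-based checkpoint store are hypothetical stand-ins. The step body calls `Math.random()`, yet a replay with the same checkpoints returns the same number because the body is skipped.

```typescript
// Toy model of checkpoint-and-replay; not the SDK API. `step`, `Checkpoints`,
// and `handler` are hypothetical names used only for illustration.
type Checkpoints = Map<string, unknown>;

// Runs `body` only when no checkpoint exists for `name`; on replay it returns
// the saved result instead, so the body does not run again.
function step<T>(checkpoints: Checkpoints, name: string, body: () => T): T {
  if (checkpoints.has(name)) {
    return checkpoints.get(name) as T;
  }
  const result = body();
  checkpoints.set(name, result); // checkpoint the result
  return result;
}

let bodyRuns = 0;

// The step body is non-deterministic, but its checkpointed result is not.
function handler(checkpoints: Checkpoints): number {
  return step(checkpoints, "pick-number", () => {
    bodyRuns += 1;
    return Math.random();
  });
}
```

Calling `handler` twice with the same `Checkpoints` map returns the same number and runs the body once; a fresh map, standing in for a new execution, draws a new number.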

yaythomas · 2026-03-08T10:12:48Z

aws-lambda-durable-functions-power/steering/replay-model-rules.md

+### What This Means
+
+- Non-deterministic computation inside a durable operation body is acceptable because the result can be checkpointed
+- External side effects started from that body should still be safe under re-attempt whenever possible


Strictly speaking, re-entrancy is the term for what this sentence describes.

but the general thrust of some of the copy added is for idempotency: same result, no duplicate effects, running it N times has the same effect as running it once

in general, yes, idempotency is a good design pattern to follow here.

However, part of the point of checkpointing is to provide the idempotency. So this sentence's "should" effectively recommends against taking advantage of a key feature that durable functions provide with AT MOST ONCE: deterministic checkpointing when wrapping non-idempotent code.
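The idempotency definition in this comment (running N times has the same effect as running once) can be illustrated with a toy external service. `JobService`, `startJobNamed`, `startJob`, and `runBody` are hypothetical and stand in for no real API. Checkpointing keeps a body from re-running once its result is saved; a keyed, idempotent external call additionally covers the case where the body ran but its result was never saved.

```typescript
// Hypothetical in-memory stand-in for an external job service; not a real API.
class JobService {
  private jobsByName = new Map<string, string>();
  started: string[] = [];

  // Idempotent: repeating a call with the same name has no further effect.
  startJobNamed(name: string): string {
    const existing = this.jobsByName.get(name);
    if (existing !== undefined) return existing;
    const id = `job-${this.started.length + 1}`;
    this.jobsByName.set(name, id);
    this.started.push(id);
    return id;
  }

  // Not idempotent: every call starts another job.
  startJob(): string {
    const id = `job-${this.started.length + 1}`;
    this.started.push(id);
    return id;
  }
}

// Simulates a step body that runs `attempts` times, e.g. because earlier
// attempts succeeded but their checkpoints were never saved.
function runBody(attempts: number, useStableName: boolean): JobService {
  const service = new JobService();
  for (let i = 0; i < attempts; i++) {
    if (useStableName) {
      service.startJobNamed("scan-42-transcribe");
    } else {
      service.startJob();
    }
  }
  return service;
}
```

Three attempts with the stable name leave one job; three unkeyed attempts leave three.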

yaythomas · 2026-03-08T10:14:07Z

aws-lambda-durable-functions-power/steering/replay-model-rules.md

+- Non-deterministic computation inside a durable operation body is acceptable because the result can be checkpointed
+- External side effects started from that body should still be safe under re-attempt whenever possible
+- If the side effect needs an identifier for idempotency, derive it from durable inputs/state or generate it once from durable state and reuse it
+- If a **step** cannot be made idempotent and duplicate execution is unacceptable, use `StepSemantics.AtMostOncePerRetry` (TypeScript) or `StepSemantics.AT_MOST_ONCE_PER_RETRY` (Python) with retries disabled so the behavior is effectively zero-or-once rather than more than once


arguably the step is idempotent once it checkpoints.

the inside of the step isn't.

Also, now that we have Java, we should probably avoid listing each language (TypeScript vs Python) every time, and instead name the general concept and point to a single source of truth.

yaythomas · 2026-04-06T21:22:01Z

aws-lambda-durable-functions-power/steering/getting-started.md

 1. **Write handler** with durable operations
 2. **Test locally** with `LocalDurableTestRunner`
-3. **Validate replay rules** (no non-deterministic code outside steps)
+3. **Validate replay rules** (determinism outside durable operations; stable identity and idempotent side effects inside durable operation bodies)


What about determinism outside of durable operations? The first sentence is negative (i.e. don't do this) and the second is positive (do this), but this is not clear from the text.

Is there a replay section this can link to instead?

yaythomas · 2026-04-06T22:09:36Z

aws-lambda-durable-functions-power/steering/step-operations.md

-7. **Choose correct semantics** (AT_LEAST_ONCE vs AT_MOST_ONCE)
+7. **Choose correct semantics** (`AtLeastOncePerRetry` vs `AtMostOncePerRetry`)
+8. **Use stable identity for external work** - derive identifiers from durable inputs/state, not `Date.now()`, randomness, or fresh UUIDs created inside the step body
+9. **Use `AtMostOncePerRetry` with zero retries for non-idempotent steps** when duplicate execution is unacceptable and you can accept zero-or-once behavior


This also repeats text. Presumably, for a power/agent's use, we can keep it DRY?

Use AtMostOncePerRetry with retries disabled for steps with non-idempotent external side effects: the step will execute at most once, and if it fails it will not be retried.
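The zero-or-once outcome of at-most-once with retries disabled can be sketched as a toy model, assuming the mode works by checkpointing a "started" marker before the body runs. `atMostOnceStep`, `StepRecord`, and `Outcome` are hypothetical and do not mirror the SDK's API. If the invocation dies after the side effect but before the result is saved, the replay finds the marker and fails the step instead of running the body again; the trade-off is that a step that may have succeeded is reported as failed.

```typescript
// Toy model, assuming at-most-once-per-retry works by checkpointing a
// "started" marker before the body runs. Not the SDK; names are hypothetical.
type StepRecord =
  | { status: "started" }
  | { status: "succeeded"; result: string };

type Outcome =
  | { ok: true; result: string }
  | { ok: false; reason: string };

// One attempt of an at-most-once step with retries disabled.
// `crashAfterBody` simulates the invocation dying after the side effect
// but before the result is checkpointed.
function atMostOnceStep(
  records: Map<string, StepRecord>,
  name: string,
  body: () => string,
  crashAfterBody: boolean,
): Outcome {
  const record = records.get(name);
  if (record !== undefined && record.status === "succeeded") {
    return { ok: true, result: record.result }; // replay: reuse saved result
  }
  if (record !== undefined && record.status === "started") {
    // The body may already have run. With no retries left, fail the step
    // rather than risk a second execution.
    return { ok: false, reason: `step ${name} interrupted; not re-running` };
  }
  records.set(name, { status: "started" }); // checkpoint before the side effect
  const result = body();
  if (crashAfterBody) {
    throw new Error("simulated crash before the result checkpoint");
  }
  records.set(name, { status: "succeeded", result });
  return { ok: true, result };
}
```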

yaythomas · 2026-04-06T22:11:18Z

aws-lambda-durable-functions-power/steering/wait-operations.md


 Wait for external systems to respond (human approval, webhook, async job):

+The submitter function passed to `waitForCallback(...)` and the check function passed to `waitForCondition(...)` are durable operation bodies. They are not guaranteed to be atomic with respect to external side effects, so if they start or address external work, use stable identity and idempotent behavior. See [replay-model-rules.md](replay-model-rules.md).


Well, they are specifically a step. "Durable operation body" is a new concept introduced in this PR to capture any durable operation that has a step inside it.

As this stands, it's unclear what a "durable operation body" is.

Similar concerns regarding "stable identity & idempotent behaviour" as I outlined above.

yaythomas · 2026-04-06T22:12:23Z

aws-lambda-durable-functions-power/POWER.md

-2. **Cannot nest durable operations** - use `runInChildContext` to group operations
-3. **Closure mutations are lost on replay** - return values from steps
-4. **Side effects outside steps repeat** - use `context.logger` (replay-aware)
+1. **All non-deterministic code outside durable operations MUST be moved into durable operations** (`context.step`, `waitForCallback`, `waitForCondition`, `parallel`/`map` branches)


passive voice, prefer active

yaythomas · 2026-04-06T22:15:16Z

aws-lambda-durable-functions-power/POWER.md

-3. **Closure mutations are lost on replay** - return values from steps
-4. **Side effects outside steps repeat** - use `context.logger` (replay-aware)
+1. **All non-deterministic code outside durable operations MUST be moved into durable operations** (`context.step`, `waitForCallback`, `waitForCondition`, `parallel`/`map` branches)
+2. **Durable operation bodies are not guaranteed to be atomic** - prefer stable identity and idempotent behavior for external side effects; for non-idempotent steps, consider at-most-once-per-retry semantics with zero retries


This repeats content from earlier yet again.

it also refers to "durable operation bodies" but then describes semantics that are only available on step.

A step body may run successfully but fail to checkpoint, causing it to re-execute on replay. For steps where duplicate execution is unacceptable, use AtMostOncePerRetry with retries disabled.
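The failure window described in that comment (the body succeeds, the checkpoint does not) can be shown with a toy at-least-once step. `atLeastOnceStep` and its parameters are hypothetical and not the SDK's API. After a lost checkpoint, the replay finds nothing saved and runs the body again, so the external side effect happens twice; once a result is saved, later replays skip the body.

```typescript
// Toy model of at-least-once step execution; not the SDK.
// `atLeastOnceStep` and its parameters are hypothetical.
function atLeastOnceStep(
  checkpoints: Map<string, string>,
  name: string,
  body: () => string,
  checkpointFails: boolean,
): string {
  const saved = checkpoints.get(name);
  if (saved !== undefined) {
    return saved; // replay: the checkpointed result wins, body is skipped
  }
  const result = body(); // external side effect happens here
  if (checkpointFails) {
    // The body already succeeded, but nothing was recorded.
    throw new Error(`checkpoint for ${name} not saved`);
  }
  checkpoints.set(name, result);
  return result;
}
```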

yaythomas · 2026-04-06T22:17:36Z

aws-lambda-durable-functions-power/POWER.md

-3. **Closure mutations that won't persist**: Variables mutated inside steps are NOT preserved across replays — return values from steps instead
-4. **Side effects outside steps that repeat on replay**: Use `context.logger` for logging (it is replay-aware and deduplicates automatically)
+1. **Non-deterministic code outside durable operations**: `Date.now()`, `Math.random()`, UUID generation, API calls, database queries must all be inside durable operations
+2. **Non-atomic durable operation bodies**: Functions passed to `context.step()`, `waitForCallback()`, `waitForCondition()`, and `parallel()`/`map()` branches may be re-attempted before persistence is fully committed — prefer stable identity and idempotent external effects; for non-idempotent steps, use at-most-once-per-retry semantics with zero retries when duplicate execution is unacceptable


This idea is getting repeated a lot. Can we consolidate?

If keeping this, maybe something like:

Durable operation bodies are not atomic: Code passed to context.step(),
waitForCallback(), waitForCondition(), and parallel()/map() branches can
succeed but fail to checkpoint, so the runtime may re-execute them on replay.
Prefer idempotent external side effects with stable identity. For steps where
duplicate execution is unacceptable, use AtMostOncePerRetry with retries
disabled.

embano1 requested review from bfreiberg and yaythomas March 8, 2026 09:20

bfreiberg reviewed Mar 8, 2026


aws-lambda-durable-functions-power/steering/getting-started.md (outdated, resolved)

docs: apply review suggestion - use direct ESLint guidance

bf43dc0

yaythomas requested changes Apr 6, 2026




3 participants