docs(ai-agents): foreground the debugger over the REPL

d-biehl · d-biehl · commit 0c40394d97b5 · 2026-06-09T16:27:23.000+02:00
Make robot-debug the primary tool for working with a real test and
the REPL the narrower fallback for exploring when no test exists yet.
Add debugger examples, a debugger-first core-habit entry, a
troubleshooting item for chasing a failing test instead of debugging
it, and note that agent output capture and the plain backend apply to
robot-debug too.
diff --git a/docs/03_reference/ai-agents.md b/docs/03_reference/ai-agents.md
@@ -2,7 +2,7 @@
 
 AI coding agents are good at writing code and bad at guessing how *your* Robot Framework project is actually wired. Which files become suites, which tags a test really ends up with, where a keyword comes from, what a finished run contained — none of that is reliably visible by reading `.robot` files. It is decided at runtime by `robot.toml`, profiles, variables, the installed library versions, and pre-run modifiers.
 
-**RobotCode** closes that gap by teaching the agent to work through the project's own [`robotcode`](cli.md) CLI instead of guessing. The agent discovers tests, runs suites, inspects results, looks up keywords, and explores live in the REPL — all through the same resolved view of the project that the rest of **RobotCode** uses. The result is an agent that behaves less like a generic code model and more like a Robot Framework engineer who knows your setup.
+**RobotCode** closes that gap by teaching the agent to work through the project's own [`robotcode`](cli.md) CLI instead of guessing. The agent discovers tests, runs suites, inspects results, looks up keywords, debugs failing tests at a real breakpoint, and explores live in the REPL — all through the same resolved view of the project that the rest of **RobotCode** uses. The result is an agent that behaves less like a generic code model and more like a Robot Framework engineer who knows your setup.
 
 There are two pieces, and they work independently:
 
@@ -22,10 +22,12 @@ Once the plugin is active, everyday Robot Framework requests are handled through
 - *"run the smoke suite with the `ci` profile"* — runs it through the selected profile and reports pass/fail counts
 - *"rerun just the tests that failed last time"*
 - *"why did `Login Works` fail in last night's run?"* — inspects the existing results, no re-run
+- *"this test keeps failing — step through it and tell me what `${response}` is when it breaks"* — pauses the real run in the [debugger](robot-debug.md) and reads the live stack and variables, instead of re-running blindly or guessing
+- *"break at `login.robot:42` and show me the variables there"*
 - *"what tests and tags exist?"* — resolves the real set with `discover` (paths, profiles, variables, pre-run modifiers), not a file scan
 - *"what arguments does our `Create Order` keyword take?"* — looks it up against the installed libraries and local resources
 - *"is there already a keyword for waiting until the spinner is gone?"*
-- *"try the new login flow against the real app, with a visible browser"* — drives it live in the REPL
+- *"try the new login flow against the real app, with a visible browser"* — drives it live in the REPL (no test written yet)
 - *"set up a `prod` profile with `BASE_URL=https://prod.example.com`"*
 - *"lint only the files I changed today"*
 
@@ -92,7 +94,8 @@ The plugin currently contains a single *skill* — a set of instructions the age
 
 - **Inventory via [`discover`](discovering-tests.md), never by grepping files.** Which tests, tasks, suites, and tags exist is resolved at runtime — Robot's parsing rules, `robot.toml` paths, profiles, variables, and pre-run modifiers that add, remove, rename, or retag tests. A static file scan gets it wrong; `discover` runs the real resolution with the installed Robot Framework.
 - **Keyword and library lookup via [`libdoc`](cli.md#libdoc), before generic knowledge.** `libdoc` reflects the *installed* library versions, the project's import arguments, the Python path, and local `.resource` files — things external documentation can't see.
-- **Live exploration via the [REPL](repl.md).** Uncertain locators, keyword sequencing, or "does this work against the running app?" get verified interactively instead of guessed, optionally with a visible browser to watch.
+- **Debugging a failing test via [`robot-debug`](robot-debug.md), the agent's primary tool for any real test.** Whenever an existing test fails, won't run, or needs stepping through, the agent runs it under the command-line debugger. The debugger pauses the real suite at a breakpoint (a `file:line`, a keyword, or the first uncaught failure); the agent then reads the live call stack and per-frame variables and runs keywords in the paused context, capturing the actual state at the point of failure instead of re-running blindly or reasoning from the source. This is the default response to "why does this fail / step through it / what is `${x}` here?", and it is deliberately kept apart from the REPL below: the debugger acts on a test that *already exists*. Reaching for the REPL instead of the debugger to fix a real test is the single most common way the skill misfires.
+- **Live exploration via the [REPL](repl.md) — only when there is *no* test yet.** A narrower tool: uncertain locators or keyword sequencing get tried interactively (optionally against the running app with a visible browser) rather than guessed. The moment a real test is in play, it is the debugger's job, not the REPL's.
 - **Result inspection via [`results`](analyzing-results.md), not raw `output.xml`.** Finished runs are queried for bounded summaries, listings, traces, and diffs — including CI artifacts and a colleague's run — rather than loading a potentially huge XML file into the chat.
 - **Static checks via [`analyze code`](analyzing-code.md)** before a run, and **runs via [`robot`](cli.md#robot)** honoring the active profile.
 
@@ -116,7 +119,7 @@ Keep it short and factual; this is standing context the agent reads on every req
 Independently of the chat plugins, the `robotcode` CLI detects when it is running inside an AI-agent session and adjusts its presentation defaults so captured output stays clean:
 
 - ANSI **colors** and the **pager** are disabled, so escape sequences and paging controls don't leak into the agent's captured stdout.
-- [`robotcode repl`](repl.md) falls back to the **plain input backend**, so completion popups and prompt redraws don't interfere with stdin/stdout.
+- The [`robotcode robot-debug`](robot-debug.md) prompt and the [`robotcode repl`](repl.md) shell fall back to the **plain input backend**, so completion popups and prompt redraws don't interfere with stdin/stdout — captured debug output (`.where`, `.vars`, `.print`) stays clean.
 
 Detection is based on environment markers set by popular tools — Claude Code, Cursor, GitHub Copilot (CLI and VS Code agent flow), Codex, OpenCode, Gemini CLI, and others — plus the generic `AI_AGENT` and `AGENT` conventions. A marker counts as active when it is present with any value other than empty or `0`.
 
@@ -128,7 +131,7 @@ You rarely need to touch this, but every default can be overridden:
 | `ROBOTCODE_NO_AI_AGENT=1` | Force detection **off**. Wins over tool markers, loses to `ROBOTCODE_FORCE_AI_AGENT`. |
 | `--color` / `--no-color`, `NO_COLOR`, `FORCE_COLOR` | Decide coloring explicitly, regardless of detection. |
 | `--pager` / `--no-pager` | Decide paging explicitly. |
-| `--plain` / `--backend`, `ROBOTCODE_REPL_PLAIN`, `ROBOTCODE_REPL_BACKEND` | Decide the REPL backend explicitly. |
+| `--plain` / `--backend`, `ROBOTCODE_REPL_PLAIN`, `ROBOTCODE_REPL_BACKEND` | Decide the prompt backend explicitly — applies to both `robot-debug` and the REPL. |
 
 Explicit flags and environment variables always win over auto-detection, so you can opt back into colored, paged, or full-featured output inside an agent session when you want it.
 
@@ -138,7 +141,9 @@ Explicit flags and environment variables always win over auto-detection, so you
 
 **The agent's answers don't match your project** — libraries or keywords it should see are reported as missing, or argument lists look wrong. It is most likely driving a `robotcode` from the wrong environment rather than the project's (see [Prerequisites](#prerequisites)). Have the agent run `robotcode discover info` to show which interpreter and versions it is using, and compare that against the project's environment.
 
-**The agent writes a `.robot` file when you only wanted it to *do* something** — try a keyword, check a locator, watch a flow against the running app. This is the skill's most common misfire. Tell it not to write a test and to use the REPL instead ("don't write a test, just run it live"); it can save the session as a test afterwards if you ask. See [Interactive Robot Framework REPL](repl.md).
+**The agent re-runs a failing test over and over (or pastes it into the REPL) instead of debugging it.** This is the skill's single most common misfire. When a *real* test fails, the right tool is the [debugger](robot-debug.md), not blind re-runs or the REPL: the debugger pauses the actual run at the failure and exposes the live stack and variables. Tell the agent to debug the test ("step through it with `robot-debug`", "break where it fails and show me the variables"). The REPL is for exploring when there's no test yet; the debugger is for a test that already exists.
+
+**The agent writes a `.robot` file when you only wanted it to *do* something** — try a keyword or check a locator with no test yet. Tell it not to write a test and to use the REPL instead ("don't write a test, just run it live"); it can save the session as a test afterwards if you ask. See [Interactive Robot Framework REPL](repl.md). (If a test already exists, you want the debugger above, not the REPL.)
 
 **A marketplace install behaves differently from the bundled VS Code plugin.** The two copies are versioned independently: the bundled one ships with the extension, the marketplace one updates through your agent's `plugin marketplace update`. If behavior diverges, update the marketplace copy — and make sure you aren't running both at once (see [Avoiding duplicates](#avoiding-duplicates)).
 
@@ -147,5 +152,6 @@ Explicit flags and environment variables always win over auto-detection, so you
 - [Command Line Interface](cli.md) — every `robotcode` command the agent drives.
 - [Discovering Tests, Tasks and Suites](discovering-tests.md) — how project inventory is resolved.
 - [Analyzing Run Results](analyzing-results.md) — inspecting finished runs.
+- [Command-line debugging with `robotcode robot-debug`](robot-debug.md) — pausing a real run at a breakpoint to inspect the live stack and variables.
 - [Interactive Robot Framework REPL](repl.md) — the live keyword shell.
 - [`robotframework-agent-plugins`](https://github.com/robotcodedev/robotframework-agent-plugins) — the marketplace and plugin source.
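
Reviewer note: the agent-detection rule and override precedence that the touched section describes (a marker is active when set to anything other than empty or `0`; `ROBOTCODE_NO_AI_AGENT` beats markers and loses to `ROBOTCODE_FORCE_AI_AGENT`) can be sketched roughly as below. This is an illustration only, not RobotCode's actual implementation. The function names are hypothetical, only the generic `AI_AGENT` and `AGENT` markers are checked because the tool-specific marker names are not given in the text, and it is an assumption that the two override variables follow the same "non-empty and not `0`" rule and that `ROBOTCODE_FORCE_AI_AGENT` forces detection on.

```python
import os
from typing import Mapping

# Generic markers named in the documentation text. Tool-specific markers are
# deliberately omitted here because their exact names are not stated there.
GENERIC_MARKERS = ("AI_AGENT", "AGENT")


def is_active(env: Mapping[str, str], name: str) -> bool:
    """A marker counts as active when present with a value other than '' or '0'."""
    return env.get(name, "") not in ("", "0")


def agent_session_detected(env: Mapping[str, str] = os.environ) -> bool:
    """Hypothetical precedence: force-on, then force-off, then markers."""
    # Assumption: the override variables use the same activation rule as markers.
    if is_active(env, "ROBOTCODE_FORCE_AI_AGENT"):
        return True
    if is_active(env, "ROBOTCODE_NO_AI_AGENT"):
        return False
    return any(is_active(env, marker) for marker in GENERIC_MARKERS)
```

Passing the environment as a plain mapping keeps the precedence easy to check in isolation, for example `agent_session_detected({"AI_AGENT": "1", "ROBOTCODE_NO_AI_AGENT": "1"})` evaluates the force-off case without touching the real process environment.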