
Add structured output support to ComputerAgent#1276

Open
r33drichards wants to merge 1 commit into main from claude/add-structured-outputs-Yckd3

Conversation


@r33drichards r33drichards commented Apr 7, 2026

Summary

This PR adds support for structured outputs to the ComputerAgent, allowing users to specify a Pydantic BaseModel class to constrain LLM responses to a specific JSON schema. The agent will validate and parse the LLM's response into an instance of the provided model.

Key Changes

  • ComputerAgent initialization: Added optional output_type parameter to accept a Pydantic BaseModel class for structured outputs
  • ComputerAgent.run(): Added optional output_type parameter that overrides the constructor-level setting, with automatic conversion to LLM API response_format
  • Helper functions:
    • _pydantic_model_to_response_format(): Converts a Pydantic model to the standard json_schema response format
    • _extract_text_from_output_item(): Extracts text content from response items to enable parsing
  • Response parsing: Final assistant message is parsed into the specified output model and yielded as output_parsed in the response
  • Loop-specific handling:
    • OpenAI loop: Translates response_format to the Responses API text.format parameter
    • Gemini loop: Pops response_format since structured outputs are not yet supported
  • Comprehensive tests: Added test suite covering initialization, conversion, text extraction, and edge cases

Implementation Details

  • The output_type parameter is optional and defaults to None for backward compatibility
  • Run-level output_type takes precedence over constructor-level setting
  • Structured output parsing only occurs when an output_type is specified and valid JSON is found in the final assistant message
  • The implementation gracefully handles both dict-based and object-based content blocks in responses
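
The schema-conversion helper named above can be sketched as follows. The function name and the `json_schema` response-format shape follow the PR description, but the body is an assumption rather than the repository's code, and `Invoice` is a made-up example model:

```python
from pydantic import BaseModel


def _pydantic_model_to_response_format(model: type[BaseModel]) -> dict:
    """Convert a Pydantic model into the standard json_schema response format.

    Sketch only: the actual implementation in agent.py may differ.
    """
    return {
        "type": "json_schema",
        "json_schema": {
            "name": model.__name__,
            "schema": model.model_json_schema(),
            "strict": True,
        },
    }


class Invoice(BaseModel):
    vendor: str
    total: float


fmt = _pydantic_model_to_response_format(Invoice)
```

A dict of this shape is what LiteLLM and the OpenAI Chat Completions API accept as `response_format` for structured outputs.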

https://claude.ai/code/session_018BuifRZx8Ht5YNtr5NAh8r

Summary by CodeRabbit

Release Notes

  • New Features
    • Structured output support: Agent now accepts optional output type specifications to validate and automatically parse API responses against defined Pydantic schemas.
    • Flexible configuration: Output types can be specified during agent initialization or overridden at runtime for each request.
    • Improved reliability: Parsed responses are automatically validated, ensuring data integrity and type safety.

Add `output_type` parameter (Pydantic BaseModel) to ComputerAgent constructor
and run() method. When set, the LLM is instructed to return JSON conforming to
the schema, and the final text response is automatically parsed and validated
into a model instance, yielded as `output_parsed` in the response.

- Anthropic: response_format flows through litellm.acompletion automatically
- OpenAI: translate response_format to text.format for Responses API
- Gemini: pop response_format to prevent errors (unsupported for now)
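
The parse-and-validate step can be illustrated in isolation. `FormResult` and `parse_structured` are hypothetical names for this sketch, not code from the PR; in Pydantic v2, `model_validate_json` raises `ValidationError` for both malformed JSON and schema mismatches:

```python
from pydantic import BaseModel, ValidationError


class FormResult(BaseModel):
    submitted: bool
    confirmation_id: str


def parse_structured(text: str, output_type: type[BaseModel]):
    """Validate assistant text against the schema; return None on failure."""
    try:
        return output_type.model_validate_json(text)
    except ValidationError:
        return None


ok = parse_structured('{"submitted": true, "confirmation_id": "abc-123"}', FormResult)
bad = parse_structured("not json at all", FormResult)
```

Returning `None` (rather than raising) on failure mirrors the guard behavior the review below recommends for the agent's generator.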

https://claude.ai/code/session_018BuifRZx8Ht5YNtr5NAh8r

vercel bot commented Apr 7, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: docs · Deployment: Ready · Actions: Preview, Comment · Updated (UTC): Apr 7, 2026 7:51pm


github-actions bot commented Apr 7, 2026

📦 Publishable packages changed

  • pypi/agent

Add release:<service> labels to auto-release on merge (+ optional bump:minor or bump:major, default is patch).
Or add no-release to skip.


coderabbitai bot commented Apr 7, 2026

📝 Walkthrough

Walkthrough

Adds Pydantic structured-output support to ComputerAgent by introducing an optional output_type parameter. The agent injects response format metadata into provider-specific requests, parses assistant message output against the specified model, and yields structured results. Provider implementations handle response format cleanup appropriately.

Changes

Cohort / File(s) Summary
Core Agent Structured Output
libs/python/agent/agent/agent.py
Added output_type parameter to __init__ and run methods. Implements response format injection via _pydantic_model_to_response_format(), JSON extraction via _extract_text_from_output_item(), and structured result parsing with model_validate_json(). Yields additional structured result on successful parse.
Provider Loop Implementations
libs/python/agent/agent/loops/gemini.py, libs/python/agent/agent/loops/openai.py
Gemini: removes response_format from kwargs before building request. OpenAI: converts response_format to Responses API text configuration and injects into api_kwargs.
Test Coverage
libs/python/agent/tests/test_computer_agent.py
Added TestComputerAgentStructuredOutputs class with tests for output_type initialization, helper function _pydantic_model_to_response_format(), and helper function _extract_text_from_output_item().

Sequence Diagram

sequenceDiagram
    actor User
    participant Agent as ComputerAgent
    participant Format as Response Format<br/>Handler
    participant Provider as Provider API<br/>(Gemini/OpenAI)
    participant Parser as Pydantic<br/>Model Parser
    
    User->>Agent: run(output_type=MyModel)
    activate Agent
    Agent->>Format: _pydantic_model_to_response_format(MyModel)
    activate Format
    Format->>Format: Extract JSON schema<br/>from model
    Format-->>Agent: response_format dict
    deactivate Format
    Agent->>Agent: Inject response_format<br/>into merged_kwargs
    Agent->>Provider: predict_step(..., response_format)
    activate Provider
    Provider->>Provider: Remove/convert response_format<br/>per provider specs
    Provider->>Provider: Call generate_content/<br/>aresponses with formatted request
    Provider-->>Agent: response with structured output
    deactivate Provider
    Agent->>Agent: Scan new_items for<br/>final assistant message
    Agent->>Format: _extract_text_from_output_item(message)
    activate Format
    Format->>Format: Extract JSON text content
    Format-->>Agent: extracted JSON string
    deactivate Format
    Agent->>Parser: MyModel.model_validate_json(json_str)
    activate Parser
    Parser->>Parser: Parse & validate against schema
    Parser-->>Agent: parsed MyModel instance
    deactivate Parser
    Agent->>Agent: Yield structured result<br/>{output_parsed, output: [], Usage}
    Agent-->>User: AsyncGenerator with structured result
    deactivate Agent

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 Structured outputs hop into the fray,
With Pydantic schemas that light the way,
JSON parsed cleanly from Gemini's call,
OpenAI's responses, we format them all!
Hop hop, the agent now speaks with grace,
Typed models bounding through cyberspace! 🎯

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed (check skipped because CodeRabbit’s high-level summary is enabled)
  • Title Check: ✅ Passed (the title accurately and concisely summarizes the main change, adding structured output support to ComputerAgent, which is the primary objective of the PR)
  • Docstring Coverage: ✅ Passed (coverage is 87.50%, above the required 80.00% threshold)




sentry bot commented Apr 7, 2026

Codecov Report

❌ Patch coverage is 70.58824% with 20 lines in your changes missing coverage. Please review.

  • libs/python/agent/agent/agent.py: 43.47% patch coverage, 13 lines missing ⚠️
  • libs/python/agent/agent/loops/openai.py: 0.00% patch coverage, 6 lines missing ⚠️
  • libs/python/agent/agent/loops/gemini.py: 0.00% patch coverage, 1 line missing ⚠️



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
libs/python/agent/agent/agent.py (1)

12-30: ⚠️ Potential issue | 🟡 Minor

Re-run import formatting here.

The Python lint job is currently red on isort/black, and this import block changed in the same diff. Please format this file before merge.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libs/python/agent/agent/agent.py` around lines 12 - 30, Re-run the import
formatter (isort/black or your project's formatter) on the import block so
imports are correctly grouped and ordered (stdlib, third-party, local) and
formatted; specifically ensure the current imports (typing symbols, litellm and
litellm.utils, core.telemetry's is_telemetry_enabled/record_event,
litellm.responses.utils Usage, and pydantic BaseModel) are reordered and
formatted to satisfy isort/black lint rules.
libs/python/agent/agent/loops/openai.py (1)

227-248: ⚠️ Potential issue | 🟡 Minor

Merge response_format into existing text config instead of overwriting it.

When response_format is set, line 248 replaces any caller-supplied text parameter that was included in **kwargs. This silently drops other Responses API text options the caller may have configured. Extract the existing text dict from kwargs and merge the new format key into it.

Proposed fix
-        response_format = kwargs.pop("response_format", None)
-        text_format = None
-        if response_format is not None:
-            text_format = {"format": response_format}
+        response_format = kwargs.pop("response_format", None)
+        text_config = dict(kwargs.pop("text", {}) or {})
+        if response_format is not None:
+            text_config["format"] = response_format
@@
-        if text_format is not None:
-            api_kwargs["text"] = text_format
+        if text_config:
+            api_kwargs["text"] = text_config
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libs/python/agent/agent/loops/openai.py` around lines 227 - 248, The current
logic overwrites any caller-provided "text" options when response_format is set;
update the handling in openai.py so that instead of replacing
api_kwargs["text"], you extract any existing text dict from kwargs (or from
api_kwargs after merging **kwargs), merge/assign the new "format":
response_format into that dict (preserving other keys), and then set
api_kwargs["text"] to the merged dict; adjust the code paths around
response_format, text_format, and api_kwargs to ensure callers' text options are
not lost.
🧹 Nitpick comments (1)
libs/python/agent/agent/agent.py (1)

970-975: run(output_type=None) can't clear a constructor-level schema.

effective_output_type = output_type if output_type is not None else self.output_type treats None as “inherit”, so the new run-level override is one-way only. If callers should be able to opt out for a single run, this needs a sentinel rather than None.
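
The sentinel approach this nitpick suggests can be sketched in isolation. `AgentSketch`, `_OUTPUT_TYPE_UNSET`, and `resolve_output_type` are illustrative names, not existing code:

```python
# Unique sentinel: distinguishes "argument not passed" from an explicit None.
_OUTPUT_TYPE_UNSET = object()


class AgentSketch:
    def __init__(self, output_type=None):
        self.output_type = output_type

    def resolve_output_type(self, output_type=_OUTPUT_TYPE_UNSET):
        # Identity check against the sentinel: passing None explicitly clears
        # the constructor-level schema, while omitting the argument inherits it.
        if output_type is not _OUTPUT_TYPE_UNSET:
            return output_type
        return self.output_type


class MyModel:
    pass


agent = AgentSketch(output_type=MyModel)
```

The `is not` identity test is what makes the sentinel safe: no caller-supplied value can compare equal to a fresh `object()`.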

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libs/python/agent/agent/agent.py` around lines 970 - 975, The run-level
parameter `output_type` currently uses None to mean "inherit" which prevents
callers from clearing a constructor-level schema; introduce a unique sentinel
(e.g., OUTPUT_TYPE_UNSET = object()) and change the run signature to use that
sentinel as the default, then compute effective_output_type with an identity
check (if output_type is not OUTPUT_TYPE_UNSET then effective_output_type =
output_type else effective_output_type = self.output_type) so callers can pass
None to explicitly clear the constructor-level schema; ensure the subsequent
logic that sets merged_kwargs["response_format"] via
_pydantic_model_to_response_format(effective_output_type) still only runs when
effective_output_type is not None.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@libs/python/agent/agent/agent.py`:
- Around line 1102-1118: The final-assistant structured parsing calls
effective_output_type.model_validate_json(...) without handling failures; wrap
that call in a try/except so a malformed/non-JSON assistant text does not raise
out of the generator—catch the validation/parsing exception (e.g.,
pydantic.ValidationError or JSON/ValueError) around the model_validate_json call
in the block that iterates reversed(new_items) and, on exception, skip yielding
"output_parsed" for that item (allow the function to continue/return normally);
you can optionally log the error but must not re-raise it so previous yields
remain valid.

---

Outside diff comments:
In `@libs/python/agent/agent/agent.py`:
- Around line 12-30: Re-run the import formatter (isort/black or your project's
formatter) on the import block so imports are correctly grouped and ordered
(stdlib, third-party, local) and formatted; specifically ensure the current
imports (typing symbols, litellm and litellm.utils, core.telemetry's
is_telemetry_enabled/record_event, litellm.responses.utils Usage, and pydantic
BaseModel) are reordered and formatted to satisfy isort/black lint rules.

In `@libs/python/agent/agent/loops/openai.py`:
- Around line 227-248: The current logic overwrites any caller-provided "text"
options when response_format is set; update the handling in openai.py so that
instead of replacing api_kwargs["text"], you extract any existing text dict from
kwargs (or from api_kwargs after merging **kwargs), merge/assign the new
"format": response_format into that dict (preserving other keys), and then set
api_kwargs["text"] to the merged dict; adjust the code paths around
response_format, text_format, and api_kwargs to ensure callers' text options are
not lost.

---

Nitpick comments:
In `@libs/python/agent/agent/agent.py`:
- Around line 970-975: The run-level parameter `output_type` currently uses None
to mean "inherit" which prevents callers from clearing a constructor-level
schema; introduce a unique sentinel (e.g., OUTPUT_TYPE_UNSET = object()) and
change the run signature to use that sentinel as the default, then compute
effective_output_type with an identity check (if output_type is not
OUTPUT_TYPE_UNSET then effective_output_type = output_type else
effective_output_type = self.output_type) so callers can pass None to explicitly
clear the constructor-level schema; ensure the subsequent logic that sets
merged_kwargs["response_format"] via
_pydantic_model_to_response_format(effective_output_type) still only runs when
effective_output_type is not None.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 427e6dc6-50f4-484c-af7d-75a5c836f5a3

📥 Commits

Reviewing files that changed from the base of the PR and between 1e62756 and ebcde47.

📒 Files selected for processing (4)
  • libs/python/agent/agent/agent.py
  • libs/python/agent/agent/loops/gemini.py
  • libs/python/agent/agent/loops/openai.py
  • libs/python/agent/tests/test_computer_agent.py

Comment on lines +1102 to +1118
# Parse structured output from the final assistant message
if effective_output_type is not None and new_items:
    for item in reversed(new_items):
        if item.get("role") == "assistant":
            text_content = _extract_text_from_output_item(item)
            if text_content:
                parsed = effective_output_type.model_validate_json(text_content)
                yield {
                    "output_parsed": parsed,
                    "output": [],
                    "usage": Usage(
                        prompt_tokens=0,
                        completion_tokens=0,
                        total_tokens=0,
                    ),
                }
            break


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's locate and examine the main file
find . -type f -name "agent.py" | grep -E "agent/agent\.py$"

Repository: trycua/cua

Length of output: 88


🏁 Script executed:

# Check if the file exists and examine the specific lines
if [ -f "libs/python/agent/agent/agent.py" ]; then
  echo "=== Lines 1102-1118 in agent.py ==="
  sed -n '1102,1118p' "libs/python/agent/agent/agent.py"
  echo -e "\n=== Context: imports at top of file ==="
  head -50 "libs/python/agent/agent/agent.py" | grep -E "^from|^import"
fi

Repository: trycua/cua

Length of output: 1433


🏁 Script executed:

# Also check the gemini.py file around lines 712-713
if [ -f "libs/python/agent/agent/loops/gemini.py" ]; then
  echo "=== Lines 710-715 in gemini.py ==="
  sed -n '710,715p' "libs/python/agent/agent/loops/gemini.py"
fi

Repository: trycua/cua

Length of output: 329


🏁 Script executed:

# Search for model_validate_json usage in the agent.py file
rg "model_validate_json" "libs/python/agent/agent/agent.py" -B 3 -A 3

Repository: trycua/cua

Length of output: 430


🏁 Script executed:

# Check what ValidationError is being imported (if at all)
rg "ValidationError" "libs/python/agent/agent/agent.py"

Repository: trycua/cua

Length of output: 36


🏁 Script executed:

# Check the full context around the problematic code
if [ -f "libs/python/agent/agent/agent.py" ]; then
  echo "=== Broader context (lines 1090-1130) ==="
  sed -n '1090,1130p' "libs/python/agent/agent/agent.py"
fi

Repository: trycua/cua

Length of output: 1717


🏁 Script executed:

# Check what exceptions are imported/used in the file
rg "except|from pydantic|import.*Error" "libs/python/agent/agent/agent.py" | head -20

Repository: trycua/cua

Length of output: 724


🏁 Script executed:

# Check if there's a pattern of error handling elsewhere in the file for model_validate_json
rg "model_validate_json" "libs/python/agent/agent/" -B 2 -A 5

Repository: trycua/cua

Length of output: 735


🏁 Script executed:

# Verify the method signature of _extract_text_from_output_item to understand what it returns
rg "_extract_text_from_output_item" "libs/python/agent/agent/agent.py" -B 2 -A 10

Repository: trycua/cua

Length of output: 1172


🏁 Script executed:

# Let's verify what exception pydantic's model_validate_json raises
python3 << 'EOF'
from pydantic import BaseModel, ValidationError

class TestModel(BaseModel):
    name: str
    age: int

# Try invalid JSON
try:
    TestModel.model_validate_json("not valid json")
except Exception as e:
    print(f"Exception type: {type(e).__name__}")
    print(f"Exception module: {type(e).__module__}")
    print(f"Is ValidationError: {isinstance(e, ValidationError)}")
EOF

Repository: trycua/cua

Length of output: 157


🏁 Script executed:

# Check if there are any tests that verify this behavior
rg "model_validate_json" "libs/python/agent/" --type py -l

Repository: trycua/cua

Length of output: 86


🏁 Script executed:

# Look for any test files that test the structured output parsing
find "libs/python/agent" -type f -name "*test*.py" | xargs grep -l "output_parsed\|effective_output_type" 2>/dev/null

Repository: trycua/cua

Length of output: 38


🏁 Script executed:

# Check the function signature and broader error handling context
rg "async def.*predict" "libs/python/agent/agent/agent.py" -A 100 | head -150

Repository: trycua/cua

Length of output: 6508


🏁 Script executed:

# Look specifically for the function containing lines 1102-1118
sed -n '1050,1130p' "libs/python/agent/agent/agent.py"

Repository: trycua/cua

Length of output: 3191


🏁 Script executed:

# Check if there's a try/except wrapping the structured output parsing
sed -n '1000,1130p' "libs/python/agent/agent/agent.py" | grep -E "try:|except|def "

Repository: trycua/cua

Length of output: 132


🏁 Script executed:

# Verify what specific exception is raised and check imports more carefully
python3 << 'EOF'
from pydantic import ValidationError, BaseModel

class TestModel(BaseModel):
    name: str
    age: int

# Try invalid JSON to see exception details
try:
    TestModel.model_validate_json("not valid json")
except ValidationError as e:
    print(f"ValidationError imported from: pydantic")
    print(f"Exception path: {e.__class__.__module__}.{e.__class__.__name__}")
EOF

Repository: trycua/cua

Length of output: 154


🏁 Script executed:

# Check if ValidationError is imported in agent.py
grep -n "ValidationError" "libs/python/agent/agent/agent.py"

Repository: trycua/cua

Length of output: 36


🏁 Script executed:

# Check what happens when non-JSON text is sent to model_validate_json
python3 << 'EOF'
from pydantic import BaseModel

class TestModel(BaseModel):
    name: str
    age: int

# Try with plain text
try:
    result = TestModel.model_validate_json("this is not json")
except Exception as e:
    print(f"Exception raised: {type(e).__name__}")
    print(f"Can it be caught with 'except ValidationError'? {e.__class__.__name__}")
EOF

Repository: trycua/cua

Length of output: 151


🏁 Script executed:

# Look at the entire function to understand the control flow better
sed -n '980,1120p' "libs/python/agent/agent/agent.py"

Repository: trycua/cua

Length of output: 6016


Guard structured parsing when the final assistant text isn't valid JSON.

This block unconditionally calls model_validate_json(...) without error handling. When a provider like Gemini ignores or drops the schema request (as confirmed in libs/python/agent/agent/loops/gemini.py at lines 712-713), the run fails at the very end with an unhandled ValidationError, even though prior results have already been yielded to the caller. Catch the validation failure and skip parsing when the last assistant message is not valid JSON.

Proposed fix
-from pydantic import BaseModel
+from pydantic import BaseModel, ValidationError
@@
         if effective_output_type is not None and new_items:
             for item in reversed(new_items):
                 if item.get("role") == "assistant":
                     text_content = _extract_text_from_output_item(item)
-                    if text_content:
-                        parsed = effective_output_type.model_validate_json(text_content)
-                        yield {
-                            "output_parsed": parsed,
-                            "output": [],
-                            "usage": Usage(
-                                prompt_tokens=0,
-                                completion_tokens=0,
-                                total_tokens=0,
-                            ),
-                        }
+                    if not text_content:
+                        break
+                    try:
+                        parsed = effective_output_type.model_validate_json(text_content)
+                    except ValidationError:
+                        break
+                    yield {
+                        "output_parsed": parsed,
+                        "output": [],
+                        "usage": Usage(
+                            prompt_tokens=0,
+                            completion_tokens=0,
+                            total_tokens=0,
+                        ),
+                    }
                     break
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
Original:

# Parse structured output from the final assistant message
if effective_output_type is not None and new_items:
    for item in reversed(new_items):
        if item.get("role") == "assistant":
            text_content = _extract_text_from_output_item(item)
            if text_content:
                parsed = effective_output_type.model_validate_json(text_content)
                yield {
                    "output_parsed": parsed,
                    "output": [],
                    "usage": Usage(
                        prompt_tokens=0,
                        completion_tokens=0,
                        total_tokens=0,
                    ),
                }
            break

Suggested:

# Parse structured output from the final assistant message
if effective_output_type is not None and new_items:
    for item in reversed(new_items):
        if item.get("role") == "assistant":
            text_content = _extract_text_from_output_item(item)
            if not text_content:
                break
            try:
                parsed = effective_output_type.model_validate_json(text_content)
            except ValidationError:
                break
            yield {
                "output_parsed": parsed,
                "output": [],
                "usage": Usage(
                    prompt_tokens=0,
                    completion_tokens=0,
                    total_tokens=0,
                ),
            }
            break
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libs/python/agent/agent/agent.py` around lines 1102 - 1118, The
final-assistant structured parsing calls
effective_output_type.model_validate_json(...) without handling failures; wrap
that call in a try/except so a malformed/non-JSON assistant text does not raise
out of the generator—catch the validation/parsing exception (e.g.,
pydantic.ValidationError or JSON/ValueError) around the model_validate_json call
in the block that iterates reversed(new_items) and, on exception, skip yielding
"output_parsed" for that item (allow the function to continue/return normally);
you can optionally log the error but must not re-raise it so previous yields
remain valid.
