Add structured output support to ComputerAgent #1276
r33drichards wants to merge 1 commit into main
Conversation
Add an `output_type` parameter (a Pydantic `BaseModel` subclass) to the `ComputerAgent` constructor and `run()` method. When set, the LLM is instructed to return JSON conforming to the model's schema, and the final text response is automatically parsed and validated into a model instance, yielded as `output_parsed` in the response.

- Anthropic: `response_format` flows through `litellm.acompletion` automatically
- OpenAI: `response_format` is translated to `text.format` for the Responses API
- Gemini: `response_format` is popped to prevent errors (unsupported for now)

https://claude.ai/code/session_018BuifRZx8Ht5YNtr5NAh8r
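As a rough illustration of what the feature wires up, this is the general shape of a `json_schema` response format payload that a helper could derive from a model such as `class TaskResult(BaseModel): title: str; done: bool`. The payload is built by hand with the standard library here; the `TaskResult` model and the exact dict layout are illustrative, not the PR's actual output.

```python
import json

# Hand-written JSON schema standing in for TaskResult.model_json_schema();
# the model name and fields are illustrative.
schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "done": {"type": "boolean"},
    },
    "required": ["title", "done"],
}

# Shape of the standard `json_schema` response format an agent could
# inject into the LLM call when `output_type` is set.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "TaskResult",
        "schema": schema,
        "strict": True,
    },
}

print(json.dumps(response_format, indent=2))
```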
📝 Walkthrough
Adds Pydantic structured-output support to ComputerAgent by introducing an optional `output_type` parameter.
Sequence Diagram

```mermaid
sequenceDiagram
    actor User
    participant Agent as ComputerAgent
    participant Format as Response Format<br/>Handler
    participant Provider as Provider API<br/>(Gemini/OpenAI)
    participant Parser as Pydantic<br/>Model Parser
    User->>Agent: run(output_type=MyModel)
    activate Agent
    Agent->>Format: _pydantic_model_to_response_format(MyModel)
    activate Format
    Format->>Format: Extract JSON schema<br/>from model
    Format-->>Agent: response_format dict
    deactivate Format
    Agent->>Agent: Inject response_format<br/>into merged_kwargs
    Agent->>Provider: predict_step(..., response_format)
    activate Provider
    Provider->>Provider: Remove/convert response_format<br/>per provider specs
    Provider->>Provider: Call generate_content/<br/>aresponses with formatted request
    Provider-->>Agent: response with structured output
    deactivate Provider
    Agent->>Agent: Scan new_items for<br/>final assistant message
    Agent->>Format: _extract_text_from_output_item(message)
    activate Format
    Format->>Format: Extract JSON text content
    Format-->>Agent: extracted JSON string
    deactivate Format
    Agent->>Parser: MyModel.model_validate_json(json_str)
    activate Parser
    Parser->>Parser: Parse & validate against schema
    Parser-->>Agent: parsed MyModel instance
    deactivate Parser
    Agent->>Agent: Yield structured result<br/>{output_parsed, output: [], Usage}
    Agent-->>User: AsyncGenerator with structured result
    deactivate Agent
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks: ✅ Passed (3 of 3)
Codecov Report: ❌ Patch coverage is …
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
libs/python/agent/agent/agent.py (1)
12-30: ⚠️ Potential issue | 🟡 Minor — Re-run import formatting here.

The Python lint job is currently red on isort/black, and this import block changed in the same diff. Please format this file before merge.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@libs/python/agent/agent/agent.py` around lines 12-30: re-run the import formatter (isort/black or your project's formatter) on the import block so imports are correctly grouped and ordered (stdlib, third-party, local) and formatted; specifically ensure the current imports (typing symbols, litellm and litellm.utils, core.telemetry's is_telemetry_enabled/record_event, litellm.responses.utils Usage, and pydantic BaseModel) are reordered and formatted to satisfy isort/black lint rules.

libs/python/agent/agent/loops/openai.py (1)
227-248: ⚠️ Potential issue | 🟡 Minor — Merge `response_format` into the existing `text` config instead of overwriting it.

When `response_format` is set, line 248 replaces any caller-supplied `text` parameter that was included in `**kwargs`. This silently drops other Responses API text options the caller may have configured. Extract the existing `text` dict from `kwargs` and merge the new `format` key into it.

Proposed fix

```diff
-    response_format = kwargs.pop("response_format", None)
-    text_format = None
-    if response_format is not None:
-        text_format = {"format": response_format}
+    response_format = kwargs.pop("response_format", None)
+    text_config = dict(kwargs.pop("text", {}) or {})
+    if response_format is not None:
+        text_config["format"] = response_format
@@
-    if text_format is not None:
-        api_kwargs["text"] = text_format
+    if text_config:
+        api_kwargs["text"] = text_config
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@libs/python/agent/agent/loops/openai.py` around lines 227-248: the current logic overwrites any caller-provided "text" options when response_format is set; extract any existing text dict from kwargs (or from api_kwargs after merging **kwargs), merge the new "format": response_format into that dict (preserving other keys), and then set api_kwargs["text"] to the merged dict so callers' text options are not lost.
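The reviewer's merge can be sketched as a standalone function; `merge_text_config` and the sample kwargs below are illustrative names for this sketch, not the actual code in `openai.py`.

```python
def merge_text_config(kwargs: dict) -> dict:
    """Preserve caller-supplied Responses API `text` options while
    injecting `format` from `response_format`, per the review comment."""
    response_format = kwargs.pop("response_format", None)
    # Start from the caller's existing text config, if any
    text_config = dict(kwargs.pop("text", {}) or {})
    if response_format is not None:
        text_config["format"] = response_format
    return text_config

kwargs = {"text": {"verbosity": "low"}, "response_format": {"type": "json_schema"}}
merged = merge_text_config(kwargs)
# Both the caller's option and the injected format survive
assert merged == {"verbosity": "low", "format": {"type": "json_schema"}}
```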
🧹 Nitpick comments (1)
libs/python/agent/agent/agent.py (1)

970-975: `run(output_type=None)` can't clear a constructor-level schema.

`effective_output_type = output_type if output_type is not None else self.output_type` treats `None` as "inherit", so the new run-level override is one-way only. If callers should be able to opt out for a single run, this needs a sentinel rather than `None`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@libs/python/agent/agent/agent.py` around lines 970-975: the run-level parameter `output_type` currently uses None to mean "inherit", which prevents callers from clearing a constructor-level schema; introduce a unique sentinel (e.g., OUTPUT_TYPE_UNSET = object()), change the run signature to use that sentinel as the default, then compute effective_output_type with an identity check (if output_type is not OUTPUT_TYPE_UNSET, use output_type; else use self.output_type) so callers can pass None to explicitly clear the constructor-level schema; ensure the subsequent logic that sets merged_kwargs["response_format"] via _pydantic_model_to_response_format(effective_output_type) still only runs when effective_output_type is not None.
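The sentinel pattern suggested above can be sketched in isolation; `OUTPUT_TYPE_UNSET`, `AgentSketch`, and `resolve_output_type` are illustrative names, not the real API.

```python
# Unique sentinel so that an explicit None can be told apart from "not passed"
OUTPUT_TYPE_UNSET = object()

class AgentSketch:
    def __init__(self, output_type=None):
        self.output_type = output_type

    def resolve_output_type(self, output_type=OUTPUT_TYPE_UNSET):
        # Identity check distinguishes "argument not passed" from an explicit None
        if output_type is not OUTPUT_TYPE_UNSET:
            return output_type
        return self.output_type

agent = AgentSketch(output_type="MySchema")
assert agent.resolve_output_type() == "MySchema"      # inherit constructor value
assert agent.resolve_output_type(None) is None        # explicitly clear for one run
assert agent.resolve_output_type("Other") == "Other"  # per-run override
```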
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@libs/python/agent/agent/agent.py`:
- Around line 1102-1118: The final-assistant structured parsing calls
effective_output_type.model_validate_json(...) without handling failures; wrap
that call in a try/except so a malformed/non-JSON assistant text does not raise
out of the generator—catch the validation/parsing exception (e.g.,
pydantic.ValidationError or JSON/ValueError) around the model_validate_json call
in the block that iterates reversed(new_items) and, on exception, skip yielding
"output_parsed" for that item (allow the function to continue/return normally);
you can optionally log the error but must not re-raise it so previous yields
remain valid.
📒 Files selected for processing (4)
- libs/python/agent/agent/agent.py
- libs/python/agent/agent/loops/gemini.py
- libs/python/agent/agent/loops/openai.py
- libs/python/agent/tests/test_computer_agent.py
```python
# Parse structured output from the final assistant message
if effective_output_type is not None and new_items:
    for item in reversed(new_items):
        if item.get("role") == "assistant":
            text_content = _extract_text_from_output_item(item)
            if text_content:
                parsed = effective_output_type.model_validate_json(text_content)
                yield {
                    "output_parsed": parsed,
                    "output": [],
                    "usage": Usage(
                        prompt_tokens=0,
                        completion_tokens=0,
                        total_tokens=0,
                    ),
                }
            break
```
🧩 Analysis chain

Repository: trycua/cua

🏁 Scripts executed:

```shell
# Locate the file and inspect the flagged block and its imports
find . -type f -name "agent.py" | grep -E "agent/agent\.py$"
sed -n '1102,1118p' libs/python/agent/agent/agent.py
head -50 libs/python/agent/agent/agent.py | grep -E "^from|^import"

# Check the Gemini loop where response_format is dropped
sed -n '710,715p' libs/python/agent/agent/loops/gemini.py

# Confirm there is no error handling around model_validate_json
rg "model_validate_json" libs/python/agent/agent/agent.py -B 3 -A 3
rg "ValidationError" libs/python/agent/agent/agent.py
sed -n '1000,1130p' libs/python/agent/agent/agent.py | grep -E "try:|except|def "

# Look for tests covering the structured output parsing
find libs/python/agent -type f -name "*test*.py" \
  | xargs grep -l "output_parsed\|effective_output_type" 2>/dev/null
```

```python
# Verify which exception pydantic raises for invalid (non-JSON) input
from pydantic import BaseModel, ValidationError

class TestModel(BaseModel):
    name: str
    age: int

try:
    TestModel.model_validate_json("not valid json")
except Exception as e:
    print(type(e).__name__)                # ValidationError
    print(isinstance(e, ValidationError))  # True
```
Guard structured parsing when the final assistant text isn't valid JSON.
This block unconditionally calls model_validate_json(...) without error handling. When a provider like Gemini ignores or drops the schema request (as confirmed in libs/python/agent/agent/loops/gemini.py at lines 712-713), the run fails at the very end with an unhandled ValidationError, even though prior results have already been yielded to the caller. Catch the validation failure and skip parsing when the last assistant message is not valid JSON.
Proposed fix

```diff
-from pydantic import BaseModel
+from pydantic import BaseModel, ValidationError
@@
         if effective_output_type is not None and new_items:
             for item in reversed(new_items):
                 if item.get("role") == "assistant":
                     text_content = _extract_text_from_output_item(item)
-                    if text_content:
-                        parsed = effective_output_type.model_validate_json(text_content)
-                        yield {
-                            "output_parsed": parsed,
-                            "output": [],
-                            "usage": Usage(
-                                prompt_tokens=0,
-                                completion_tokens=0,
-                                total_tokens=0,
-                            ),
-                        }
+                    if not text_content:
+                        break
+                    try:
+                        parsed = effective_output_type.model_validate_json(text_content)
+                    except ValidationError:
+                        break
+                    yield {
+                        "output_parsed": parsed,
+                        "output": [],
+                        "usage": Usage(
+                            prompt_tokens=0,
+                            completion_tokens=0,
+                            total_tokens=0,
+                        ),
+                    }
                     break
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Original:

```python
# Parse structured output from the final assistant message
if effective_output_type is not None and new_items:
    for item in reversed(new_items):
        if item.get("role") == "assistant":
            text_content = _extract_text_from_output_item(item)
            if text_content:
                parsed = effective_output_type.model_validate_json(text_content)
                yield {
                    "output_parsed": parsed,
                    "output": [],
                    "usage": Usage(
                        prompt_tokens=0,
                        completion_tokens=0,
                        total_tokens=0,
                    ),
                }
            break
```

Suggested:

```python
# Parse structured output from the final assistant message
if effective_output_type is not None and new_items:
    for item in reversed(new_items):
        if item.get("role") == "assistant":
            text_content = _extract_text_from_output_item(item)
            if not text_content:
                break
            try:
                parsed = effective_output_type.model_validate_json(text_content)
            except ValidationError:
                break
            yield {
                "output_parsed": parsed,
                "output": [],
                "usage": Usage(
                    prompt_tokens=0,
                    completion_tokens=0,
                    total_tokens=0,
                ),
            }
            break
```
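The guarded behavior the suggestion aims for can be sketched outside the agent with only the standard library; `try_parse_structured` is an illustrative helper, with `json.loads` standing in for Pydantic's `model_validate_json` (which raises `pydantic.ValidationError`).

```python
import json

def try_parse_structured(text: str):
    """Return parsed data, or None when the text is not valid JSON.

    Sketch of the guarded parsing the review asks for: a malformed final
    assistant message yields no `output_parsed` instead of raising out of
    the generator after earlier results were already yielded.
    """
    try:
        return json.loads(text)
    except (json.JSONDecodeError, ValueError):
        return None

assert try_parse_structured('{"title": "ok", "done": true}') == {"title": "ok", "done": True}
assert try_parse_structured("this is not json") is None
```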
Summary
This PR adds support for structured outputs to the ComputerAgent, allowing users to specify a Pydantic BaseModel class to constrain LLM responses to a specific JSON schema. The agent will validate and parse the LLM's response into an instance of the provided model.
Key Changes
- Add an `output_type` parameter to the `ComputerAgent` constructor to accept a Pydantic BaseModel class for structured outputs
- Add a run-level `output_type` parameter that overrides the constructor-level setting, with automatic conversion to the LLM API `response_format`
- `_pydantic_model_to_response_format()`: converts a Pydantic model to the standard `json_schema` response format
- `_extract_text_from_output_item()`: extracts text content from response items to enable parsing
- The parsed model instance is yielded as `output_parsed` in the response
- OpenAI: translate `response_format` to the Responses API `text.format` parameter
- Gemini: drop `response_format` since structured outputs are not yet supported

Implementation Details
- The `output_type` parameter is optional and defaults to `None` for backward compatibility
- The run-level `output_type` takes precedence over the constructor-level setting
- Parsing occurs only when `output_type` is specified and valid JSON is found in the final assistant message

https://claude.ai/code/session_018BuifRZx8Ht5YNtr5NAh8r