ci(framework): Add e2e agentapp test#7467
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 11cee779b6
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Pull request overview
Adds a new end-to-end (E2E) “AgentApp” example app under framework/e2e/ and wires it into CI via a dedicated workflow plus a new test_superlink.sh branch to exercise SuperLink/SuperExec + flwr run execution.
Changes:
- Add a new E2E app
framework/e2e/e2e-agentapp(AgentApp FAB example + docs/config). - Extend
framework/e2e/test_superlink.shwith ane2e-agentappexecution path. - Add a GitHub Actions workflow to run the AgentApp E2E job.
Critical issues
framework/e2e/test_superlink.sh(AgentApp branch): PID handling/cleanup is unreliable ($!capturestimeout,killunderset -ecan abort cleanup, no trap), which can leak background processes and/or fail cleanup (comments 001–004).framework/e2e/e2e-agentapp/README.md:flwr runexamples omit the required positional SuperLink connection name, so the commands likely won’t target the SuperLink started in the guide (comments 008–009).
Simplicity/readability suggestions
- N/A beyond the concrete fixes suggested in comments.
Consistency concerns
framework/e2e/e2e-agentapp/pyproject.tomllacks[tool.flwr.federations]/e2edefinitions, unlike other E2E apps (e.g.framework/e2e/e2e-serverapp-heartbeat/pyproject.toml:27-33), which forces runtime patching and makes local usage harder (comment 006).
Whether the PR should be split
No.
Brief overall verdict
Changes are directionally fine, but the E2E harness needs fixes to be reliable and the README examples need to be corrected before this is safe to rely on.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| framework/e2e/test_superlink.sh | Adds an e2e-agentapp run path; needs more robust process management/cleanup. |
| framework/e2e/e2e-agentapp/README.md | Documents local execution; currently missing required flwr run superlink argument and working-dir clarity. |
| framework/e2e/e2e-agentapp/pyproject.toml | Defines the new E2E AgentApp; should include tool.flwr.federations like other E2E apps. |
| framework/e2e/e2e-agentapp/e2e_agentapp/agent.py | Implements the AgentApp main entrypoint using AgentSession.responses.create. |
| framework/e2e/e2e-agentapp/e2e_agentapp/init.py | Package marker/docstring. |
| .github/workflows/framework-agentapp-e2e.yml | New workflow to run the AgentApp E2E job. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| timeout 5m flower-superlink \ | ||
| $server_arg $db_arg $runtime_dependency_install_arg \ | ||
| --control-api-address 127.0.0.1:9093 \ | ||
| --serverappio-api-address "$server_app_address" & | ||
| sl_pid=$! | ||
| sleep 3 |
| timeout 5m flower-superexec \ | ||
| $server_arg \ | ||
| --appio-api-address "$server_app_address" \ | ||
| --plugin-type serverapp & | ||
| sx_pid=$! | ||
| sleep 3 |
| cleanup_and_exit() { | ||
| kill $sx_pid; | ||
| sleep 1; kill $sl_pid; | ||
| exit $1 | ||
| } | ||
|
|
| if [ "$status" == "finished:completed" ]; then | ||
| found_success=true | ||
| echo "AgentApp worked correctly!" | ||
| cleanup_and_exit 0 | ||
| else | ||
| echo "⏳ Not completed yet, retrying in 2s..." | ||
| sleep 2 | ||
| elapsed=$((elapsed + 2)) | ||
| fi |
| # Install Flower app | ||
| pip install -e . --no-deps | ||
|
|
||
| # Remove any duplicates | ||
| sed -i '/^\[tool\.flwr\.federations\.e2e\]/,/^$/d' pyproject.toml | ||
|
|
||
| echo -e $"\n[tool.flwr.federations.e2e]\naddress = \"127.0.0.1:9093\"\ninsecure = true" >> pyproject.toml |
| [tool.flwr.app] | ||
| publisher = "flwrlabs" | ||
| fab-include = ["e2e_agentapp/**/*.py"] | ||
|
|
||
| [tool.flwr.app.components] | ||
| agentapp = "e2e_agentapp.agent:app" | ||
|
|
||
| [tool.flwr.app.config.agent] | ||
| input = "" | ||
| instructions = "Use web search before answering, then answer in one short sentence." | ||
| model = "openai/gpt-5.5" | ||
| max-output-tokens = 96 | ||
| web-search = true |
| ## Run | ||
|
|
||
| Start a SuperLink: | ||
|
|
| ```bash | ||
| uv run --no-sync --python=3.11.14 flwr run e2e/e2e-agentapp \ | ||
| --run-config 'agent.input="What is the Flower federated learning framework? Answer in one sentence."' | ||
| ``` |
| ```bash | ||
| uv run --no-sync --python=3.11.14 flwr run e2e/e2e-agentapp \ | ||
| --run-config 'agent.input="Say hello in one short sentence." agent.web-search=false' | ||
| ``` |
There was a problem hiding this comment.
I’m a bit concerned about the amount of duplication introduced here. Do we need to duplicate the main body of the test code in this file, or could we refactor it so the shared logic is reused?
Another option might be to move the shell script for the AgentApp E2E test into a dedicated test file. Right now, test_superlink.sh feels like it is serving two quite different purposes: it adds a large AgentApp E2E-specific block and then exits early. To me, that makes the structure a little surprising and harder to follow.
There was a problem hiding this comment.
The README files for E2E tests are usually quite short, since we don’t typically expect people to run them manually. Is there a specific reason this README needs to be this detailed, for example, because the AgentApp setup is different from the other E2E tests?
If not, could we keep this README closer to the usual E2E test README style?
Issue
Description
Related issues/PRs
Proposal
Explanation
Checklist
#contributions)Any other comments?