
Add automated visual testing with Claude vision validation#8

Merged
Coldaine merged 5 commits into master from claude/review-project-status-mAV7o
Feb 16, 2026

Conversation

@Coldaine
Owner

Summary

This PR introduces an automated visual testing system for the overlay UI that captures screenshots and validates them using Claude's vision capabilities. The system integrates with GitHub Actions to run on every PR that touches overlay code, providing continuous visual regression detection.

Key Changes

  • scripts/visual_test_overlay.py: New test harness that launches the overlay in demo mode and captures screenshots across 5 test scenarios:

    • Empty overlay (no suggestions)
    • Overlay with demo suggestions
    • Single suggestion
    • Maximum 3 suggestions
    • Cleared suggestions state
  • scripts/validate_screenshots.py: New validation script that sends screenshots to Claude API for automated visual validation against defined criteria:

    • Checks for required visual elements (chips, text, layout)
    • Verifies absence of unwanted elements
    • Generates JSON reports with pass/fail results and confidence levels
    • Supports both CI and local usage
  • .github/workflows/visual-tests.yml: New GitHub Actions workflow that:

    • Runs on PRs touching overlay code
    • Sets up virtual display (xvfb) for headless screenshot capture
    • Executes visual tests and Claude validation
    • Uploads screenshots as artifacts for manual review
    • Generates detailed summary in PR checks
    • Supports manual triggering with configurable options
  • docs/plans/visual-test-checklist.md: Documentation covering:

    • Setup instructions for the ANTHROPIC_API_KEY secret
    • Expected test scenarios and validation criteria
    • Manual testing procedures
    • Troubleshooting guide
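As described above, the validation criteria are defined declaratively per scenario. A minimal sketch of what that structure and prompt assembly could look like (field names and scenario IDs here are illustrative, not necessarily the exact ones in `scripts/validate_screenshots.py`):

```python
# Illustrative criteria; the real VALIDATION_CRITERIA in
# scripts/validate_screenshots.py may use different fields/IDs.
VALIDATION_CRITERIA = {
    "01_empty": {
        "must_have": ["empty overlay region"],
        "must_not_have": ["suggestion chips", "error dialogs"],
    },
    "02_suggestions": {
        "must_have": ["two suggestion chips", "shortcut key labels"],
        "must_not_have": ["more than three chips"],
    },
}


def build_prompt(test_id: str) -> str:
    """Turn one scenario's criteria into a vision-validation prompt."""
    c = VALIDATION_CRITERIA[test_id]
    return (
        f"Validate this screenshot for scenario {test_id}.\n"
        f"Must contain: {', '.join(c['must_have'])}.\n"
        f"Must NOT contain: {', '.join(c['must_not_have'])}.\n"
        'Respond with JSON: {"passed": bool, "confidence": "high|medium|low", "reasoning": str}'
    )
```

Requesting a structured JSON response like this is what lets the validator parse pass/fail results reliably.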

Implementation Details

  • Screenshots are captured using available tools (spectacle, scrot, gnome-screenshot, ImageMagick) with fallback support
  • Claude validation uses structured JSON responses for reliable parsing
  • Validation criteria are defined declaratively for each test scenario
  • The workflow gracefully handles missing API keys (skips validation but still captures screenshots)
  • Exit codes properly indicate success/failure for CI integration
  • All scripts include comprehensive error handling and user-friendly output
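The multi-tool screenshot fallback described above could be sketched roughly like this (the tool ordering and command-line flags are assumptions, not the script's exact choices):

```python
import shutil
import subprocess
from pathlib import Path

# Assumed ordering and flags; the real script may differ.
SCREENSHOT_TOOLS = [
    ("spectacle", ["spectacle", "-b", "-n", "-o"]),
    ("scrot", ["scrot"]),
    ("gnome-screenshot", ["gnome-screenshot", "-f"]),
    ("import", ["import", "-window", "root"]),  # ImageMagick
]


def find_screenshot_tool():
    """Return the first installed (name, argv) pair, or None if none found."""
    for name, argv in SCREENSHOT_TOOLS:
        if shutil.which(name):
            return name, argv
    return None


def take_screenshot(output: Path) -> bool:
    """Capture the screen with the first available tool; False on failure."""
    tool = find_screenshot_tool()
    if tool is None:
        return False
    _, argv = tool
    result = subprocess.run([*argv, str(output)], capture_output=True)
    return result.returncode == 0 and output.exists()
```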

https://claude.ai/code/session_01JuVGRqFXwACVZKofjYoJin

claude added 3 commits January 6, 2026 23:14
- Create scripts/visual_test_overlay.py for automated screenshot capture
- Add docs/plans/visual-test-checklist.md with validation criteria
- Enables screenshot-based testing on KDE Plasma environments
- Screenshots can be reviewed with AI vision for PR-05 validation

Test workflow:
1. Run visual_test_overlay.py on graphical KDE session
2. Captures 5 test scenarios (empty, suggestions, max, cleared)
3. Share screenshots for visual review

Co-Authored-By: Claude <noreply@anthropic.com>
- Runs visual_test_overlay.py under xvfb with full graphics stack
- Uploads screenshots as downloadable artifacts
- Triggers on manual dispatch or overlay code changes
- Creates montage image for easy review
- Includes overlay unit tests as separate job

Usage:
  - Go to Actions tab → Visual Tests → Run workflow
  - Download screenshots artifact after completion
  - Review screenshots for visual validation

Co-Authored-By: Claude <noreply@anthropic.com>
This enables fully automated visual validation of the overlay UI in CI:

- scripts/validate_screenshots.py: Sends screenshots to Claude API for
  vision-based validation against defined criteria
- Updated visual-tests.yml workflow to:
  - Capture screenshots under xvfb
  - Call Claude API to validate each screenshot
  - Generate pass/fail report with detailed reasoning
  - Upload artifacts for manual review
  - Show summary in GitHub Actions UI
- Updated docs with complete workflow instructions

To enable:
1. Add ANTHROPIC_API_KEY secret in repo settings
2. Run "Visual Tests" workflow from Actions tab
3. Review automated validation results

The workflow validates:
- Empty overlay state
- Demo suggestions display (2 chips)
- Single suggestion display
- Maximum 3 chips limit
- Cleared suggestions state

Co-Authored-By: Claude <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings February 12, 2026 14:56
@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@qodo-free-for-open-source-projects

Review Summary by Qodo

Add automated visual testing with Claude vision validation

✨ Enhancement 🧪 Tests


Walkthroughs

Description
• Automated visual testing system for overlay UI with Claude vision validation
• Screenshot capture harness for 5 test scenarios (empty, suggestions, single, max, cleared)
• Claude API integration for automated visual validation against defined criteria
• GitHub Actions workflow with xvfb for headless screenshot capture and validation
• Comprehensive documentation with setup instructions and troubleshooting guide
Diagram
flowchart LR
  A["Test Scenarios"] -->|"Capture"| B["visual_test_overlay.py"]
  B -->|"Generate"| C["Screenshots"]
  C -->|"Validate"| D["validate_screenshots.py"]
  D -->|"Send to API"| E["Claude Vision"]
  E -->|"Return Results"| F["Validation Report"]
  G["GitHub Actions"] -->|"Run"| B
  G -->|"Run"| D
  G -->|"Upload"| H["Artifacts"]
  F -->|"Display"| H


File Changes

1. scripts/visual_test_overlay.py 🧪 Tests +211/-0

Overlay screenshot capture test harness

• New test harness that launches overlay in demo mode and captures screenshots
• Implements 5 test scenarios: empty overlay, demo suggestions, single suggestion, maximum 3 suggestions, and cleared state
• Uses PySide6 to create the overlay window and multiple screenshot tools (spectacle, scrot, gnome-screenshot, ImageMagick) with fallback support
• Saves timestamped PNG files to screenshots/ directory for visual review

scripts/visual_test_overlay.py


2. scripts/validate_screenshots.py ✨ Enhancement +408/-0

Claude vision-based screenshot validation script

• New validation script that sends screenshots to Claude API for automated visual validation
• Defines declarative validation criteria for each test scenario (must_have and must_not_have lists)
• Parses Claude's JSON responses to extract pass/fail results, confidence levels, and detailed reasoning
• Generates JSON report with summary statistics and per-test validation details
• Includes comprehensive error handling for missing API keys, files, and parsing failures

scripts/validate_screenshots.py


3. .github/workflows/visual-tests.yml ⚙️ Configuration changes +259/-0

GitHub Actions workflow for visual testing

• New GitHub Actions workflow triggered on PRs touching overlay code or manual dispatch
• Sets up virtual display (xvfb) with full graphics stack for headless screenshot capture
• Installs system dependencies (libxcb, libgl, scrot, imagemagick) required for rendering and screenshots
• Executes visual tests and optional Claude validation with configurable inputs
• Generates screenshot montage and detailed summary in PR checks with validation report
• Uploads screenshots as artifacts for 30 days and gracefully handles missing API key

.github/workflows/visual-tests.yml


4. docs/plans/visual-test-checklist.md 📝 Documentation +145/-0

Visual testing setup and troubleshooting documentation

• Comprehensive documentation for the automated visual testing workflow
• Includes quick start guide for enabling ANTHROPIC_API_KEY secret and running workflow
• Documents validation criteria table for each test scenario with must-have and must-not-have requirements
• Provides manual testing instructions for local KDE Plasma environments
• Includes troubleshooting guide for common issues (missing display, API key, dependencies)

docs/plans/visual-test-checklist.md



@qodo-free-for-open-source-projects

CI Feedback 🧐

A test triggered by this PR failed. Here is an AI-generated analysis of the failure:

Action: Capture & Validate Screenshots

Failed stage: Install Python dependencies [❌]

Failed test name: ""

Failure summary:

The action failed during the dependency installation step when pip tried to build dbus-python (pulled in via python-dbusmock>=0.30).
- dbus-python metadata generation failed (Preparing metadata (pyproject.toml)), because the Meson build could not find the system dependency dbus-1.
- Meson error: ../subprojects/dbus-gmain/meson.build:107:11: ERROR: Dependency "dbus-1" not found, tried pkgconfig and cmake.
- This indicates the runner image is missing the required DBus development package / pkg-config entry (e.g., libdbus-1-dev), so the wheel cannot be built and pip exits with code 1.

Relevant error logs:
1:  ##[group]Runner Image Provisioner
2:  Hosted Compute Agent
...

666:  Collecting packaging>=22 (from pytest>=7.4->shortcut-sage==0.1.0)
667:  Using cached packaging-26.0-py3-none-any.whl.metadata (3.3 kB)
668:  Collecting pluggy<2,>=1.5 (from pytest>=7.4->shortcut-sage==0.1.0)
669:  Downloading pluggy-1.6.0-py3-none-any.whl.metadata (4.8 kB)
670:  Collecting pygments>=2.7.2 (from pytest>=7.4->shortcut-sage==0.1.0)
671:  Downloading pygments-2.19.2-py3-none-any.whl.metadata (2.5 kB)
672:  Collecting coverage>=7.10.6 (from coverage[toml]>=7.10.6->pytest-cov>=4.1->shortcut-sage==0.1.0)
673:  Downloading coverage-7.13.4-cp311-cp311-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl.metadata (8.5 kB)
674:  Collecting dbus-python (from python-dbusmock>=0.30->shortcut-sage==0.1.0)
675:  Downloading dbus-python-1.4.0.tar.gz (232 kB)
676:  Installing build dependencies: started
677:  Installing build dependencies: finished with status 'done'
678:  Getting requirements to build wheel: started
679:  Getting requirements to build wheel: finished with status 'done'
680:  Preparing metadata (pyproject.toml): started
681:  Preparing metadata (pyproject.toml): finished with status 'error'
682:  error: subprocess-exited-with-error
683:  × Preparing metadata (pyproject.toml) did not run successfully.
...

758:  dbus-gmain| Compiler for C supports arguments -Wswitch-default: YES
759:  dbus-gmain| Compiler for C supports arguments -Wswitch-enum: YES (cached)
760:  dbus-gmain| Compiler for C supports arguments -Wundef: YES (cached)
761:  dbus-gmain| Compiler for C supports arguments -Wunused-but-set-variable: YES (cached)
762:  dbus-gmain| Compiler for C supports arguments -Wwrite-strings: YES
763:  dbus-gmain| Compiler for C supports arguments -Wdeclaration-after-statement: YES
764:  dbus-gmain| Compiler for C supports arguments -Wjump-misses-init: YES (cached)
765:  dbus-gmain| Compiler for C supports arguments -Wmissing-prototypes: YES (cached)
766:  dbus-gmain| Compiler for C supports arguments -Wnested-externs: YES (cached)
767:  dbus-gmain| Compiler for C supports arguments -Wold-style-definition: YES (cached)
768:  dbus-gmain| Compiler for C supports arguments -Wpointer-sign: YES (cached)
769:  dbus-gmain| Compiler for C supports arguments -Wstrict-prototypes: YES (cached)
770:  dbus-gmain| Found pkg-config: YES (/usr/bin/pkg-config) 1.8.1
771:  dbus-gmain| Found CMake: /usr/local/bin/cmake (3.31.6)
772:  dbus-gmain| Run-time dependency dbus-1 found: NO (tried pkgconfig and cmake)
773:  ../subprojects/dbus-gmain/meson.build:107:11: ERROR: Dependency "dbus-1" not found, tried pkgconfig and cmake
774:  A full log can be found at /tmp/pip-install-x0msg8zf/dbus-python_e15e1f55741f4c37b4ccd8efa261193f/.mesonpy-4cfn4mf1/meson-logs/meson-log.txt
...

1448:  CMake TRACE: /tmp/pip-install-x0msg8zf/dbus-python_e15e1f55741f4c37b4ccd8efa261193f/.mesonpy-4cfn4mf1/meson-private/__CMake_compiler_info__/CMakeFiles/CMakeScratch/TryCompile-mvBvOd/CMakeLists.txt:28 target_link_libraries(['cmTC_9315a', ''])
1449:  !meson_ci!/ci_include "/tmp/pip-install-x0msg8zf/dbus-python_e15e1f55741f4c37b4ccd8efa261193f/.mesonpy-4cfn4mf1/meson-private/cmake_dbus-1/CMakeMesonToolchainFile.cmake"
1450:  Try CMake generator: auto
1451:  !meson_ci!/ci_include "/tmp/pip-install-x0msg8zf/dbus-python_e15e1f55741f4c37b4ccd8efa261193f/.mesonpy-4cfn4mf1/meson-private/cmake_dbus-1/CMakeLists.txt"
1452:  Calling CMake (['/usr/local/bin/cmake']) in /tmp/pip-install-x0msg8zf/dbus-python_e15e1f55741f4c37b4ccd8efa261193f/.mesonpy-4cfn4mf1/meson-private/cmake_dbus-1 with:
1453:  - "--trace-expand"
1454:  - "--trace-format=json-v1"
1455:  - "--no-warn-unused-cli"
1456:  - "--trace-redirect=cmake_trace.txt"
1457:  - "-DCMAKE_TOOLCHAIN_FILE=/tmp/pip-install-x0msg8zf/dbus-python_e15e1f55741f4c37b4ccd8efa261193f/.mesonpy-4cfn4mf1/meson-private/cmake_dbus-1/CMakeMesonToolchainFile.cmake"
1458:  - "."
1459:  -- Module search paths:    ['/', '/opt', '/usr', '/usr/local']
1460:  -- CMake root:             /usr/local/share/cmake-3.31
1461:  -- CMake architectures:    ['x86_64-linux-gnu']
1462:  -- CMake lib search paths: ['lib', 'lib32', 'lib64', 'libx32', 'share', '', 'lib/x86_64-linux-gnu']
1463:  Preliminary CMake check failed. Aborting.
1464:  Run-time dependency dbus-1 found: NO (tried pkgconfig and cmake)
1465:  ../subprojects/dbus-gmain/meson.build:107:11: ERROR: Dependency "dbus-1" not found, tried pkgconfig and cmake
1466:  ##[endgroup]
1467:  [end of output]
1468:  note: This error originates from a subprocess, and is likely not a problem with pip.
1469:  error: metadata-generation-failed
1470:  × Encountered error while generating package metadata.
1471:  ╰─> dbus-python
1472:  note: This is an issue with the package mentioned above, not pip.
1473:  hint: See above for details.
1474:  ##[error]Process completed with exit code 1.
1475:  Post job cleanup.

@kilo-code-bot

kilo-code-bot bot commented Feb 12, 2026

Code Review Summary

Status: No Issues Found | Recommendation: Merge

Files Reviewed (4 files)
  • .github/workflows/visual-tests.yml
  • docs/plans/visual-test-checklist.md
  • scripts/validate_screenshots.py
  • scripts/visual_test_overlay.py


Copilot AI left a comment


Pull request overview

Adds an automated visual-regression workflow for the PySide6 overlay UI by capturing screenshots in CI and (optionally) validating them with Claude vision, with supporting scripts and documentation.

Changes:

  • Added a screenshot capture harness for the overlay across multiple UI states.
  • Added a Claude-vision validator that scores screenshots against scenario-specific criteria and emits a JSON report.
  • Added a GitHub Actions workflow to run screenshot capture + optional validation and publish artifacts/summaries.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 7 comments.

File Description
scripts/visual_test_overlay.py Launches the overlay in fallback/demo states and captures scenario screenshots.
scripts/validate_screenshots.py Sends captured screenshots to Claude and parses structured pass/fail JSON results.
.github/workflows/visual-tests.yml CI workflow to run capture under Xvfb, optionally validate via Claude, and upload artifacts/summary.
docs/plans/visual-test-checklist.md Setup + usage + troubleshooting documentation for the visual test workflow.


Comment on lines +100 to +103
- name: Validate screenshots with Claude
id: validate
if: ${{ (inputs.run_claude_validation == true || inputs.run_claude_validation == '') && env.HAS_API_KEY == 'true' }}
run: |

Copilot AI Feb 12, 2026


The validation step’s if: condition relies on inputs.run_claude_validation == '' to mean “default enabled”, but on pull_request runs inputs.* is not provided (it evaluates as null/undefined, not an empty string), so this condition can evaluate to false and skip validation even when the API key is configured. Consider making the default explicit for non-workflow_dispatch events (e.g., github.event_name != 'workflow_dispatch' || inputs.run_claude_validation) and check the secret directly (secrets.ANTHROPIC_API_KEY != '') instead of piping through env.

Copilot uses AI. Check for mistakes.
Comment thread scripts/visual_test_overlay.py Outdated
Comment on lines +120 to +133
# Test 4: Three suggestions (max)
print("\n5. Test: Maximum suggestions (3)")
three_suggestions = DEMO_SUGGESTIONS + [
    {
        "action": "tile_right",
        "key": "Meta+Right",
        "description": "Tile window to right half",
        "priority": 50,
    }
]
overlay.set_suggestions_fallback(three_suggestions)
app.processEvents()
time.sleep(0.5)
results["04_max_three"] = take_screenshot("04_max_three", screenshots_dir)

Copilot AI Feb 12, 2026


This “maximum suggestions” scenario only passes exactly 3 suggestions. That doesn’t actually test the max-3 truncation behavior implemented in OverlayWindow.update_suggestions (it slices to suggestions[:3]). To validate truncation, pass 4+ suggestions and ensure the 4th is not rendered (and update Claude criteria to assert it’s absent).

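In plain Python, the stronger check this comment asks for might look like the following (`truncate_suggestions` is a hypothetical stand-in for the slice inside `OverlayWindow.update_suggestions`):

```python
def truncate_suggestions(suggestions, limit=3):
    """Mirror the overlay's suggestions[:3] behavior noted in the review."""
    return suggestions[:limit]


# Pass four suggestions and assert the fourth is not rendered.
four = [{"key": f"Meta+{i}"} for i in range(4)]
shown = truncate_suggestions(four)
assert len(shown) == 3
assert {"key": "Meta+3"} not in shown
```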
Comment thread scripts/visual_test_overlay.py Outdated
overlay.set_suggestions_fallback(DEMO_SUGGESTIONS)
app.processEvents()
time.sleep(0.5)
results["02_with_suggestions"] = take_screenshot("02_suggestions", screenshots_dir)

Copilot AI Feb 12, 2026


Result key results["02_with_suggestions"] doesn’t match the test ID/filename (02_suggestions) used elsewhere (validation criteria, screenshot name). This makes the summary output harder to correlate and can confuse consumers if results keys are later reused. Prefer using the same stable test ID as the key (e.g., 02_suggestions).

Suggested change
results["02_with_suggestions"] = take_screenshot("02_suggestions", screenshots_dir)
results["02_suggestions"] = take_screenshot("02_suggestions", screenshots_dir)

Comment thread scripts/validate_screenshots.py Outdated
Comment on lines +263 to +269
for png_file in screenshots_dir.glob("*.png"):
    # Extract test ID from filename like "overlay_test_01_empty_20250107_123456.png"
    name = png_file.stem
    for test_id in VALIDATION_CRITERIA:
        if test_id in name:
            screenshots[test_id] = png_file
            break

Copilot AI Feb 12, 2026


find_screenshots() will pick an arbitrary file when multiple screenshots for the same test ID exist (glob order is not guaranteed), which can make local runs nondeterministic if the screenshots directory contains older images. Prefer selecting the most recent match per test ID (e.g., by parsing the timestamp in the filename or using mtime) and/or erroring on duplicates.

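A possible fix along the lines the comment suggests, selecting the newest match per test ID by modification time (a sketch, not the script's actual implementation):

```python
from pathlib import Path


def find_screenshots(screenshots_dir: Path, criteria_ids) -> dict:
    """Map each test ID to its most recent matching screenshot."""
    screenshots = {}
    for test_id in criteria_ids:
        matches = [p for p in screenshots_dir.glob("*.png") if test_id in p.stem]
        if matches:
            # Newest by mtime avoids nondeterministically picking stale local runs.
            screenshots[test_id] = max(matches, key=lambda p: p.stat().st_mtime)
    return screenshots
```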
Comment thread scripts/validate_screenshots.py Outdated
Comment on lines +225 to +227
# Parse response
response_text = message.content[0].text


Copilot AI Feb 12, 2026


Response parsing assumes the first content block is text (message.content[0].text). Anthropic responses can include multiple content blocks; if the first block isn’t text or JSON is split across blocks, this will raise/parse incorrectly. Make this more robust by concatenating all text blocks (or otherwise extracting text blocks) before JSON extraction.

Suggested change
# Parse response
response_text = message.content[0].text
# Parse response: concatenate all text content blocks
text_blocks: list[str] = []
for block in message.content:
    # Support both SDK ContentBlock objects and plain dicts
    block_type = getattr(block, "type", None) or (block.get("type") if isinstance(block, dict) else None)
    if block_type == "text":
        text = getattr(block, "text", None)
        if text is None and isinstance(block, dict):
            text = block.get("text")
        if isinstance(text, str):
            text_blocks.append(text)
if not text_blocks:
    raise ValueError("No text content blocks found in Claude response")
response_text = "\n".join(text_blocks)

Comment thread docs/plans/visual-test-checklist.md Outdated

### GitHub Actions fails to capture screenshots
- Check that `xvfb-run` is installed (should be automatic)
- Verify `QT_QPA_PLATFORM=offscreen` is set

Copilot AI Feb 12, 2026


Troubleshooting recommends setting QT_QPA_PLATFORM=offscreen, but the visual test workflow relies on screenshot tools capturing an X11 display. With offscreen, Qt often won’t create a visible window to capture. Update this guidance to reflect the workflow’s actual requirements (xvfb + xcb/normal platform).

Suggested change
- Verify `QT_QPA_PLATFORM=offscreen` is set
- Verify `QT_QPA_PLATFORM` is **not** set to `offscreen` (leave it unset or set to `xcb` so Qt uses the X11 display)

Comment thread .github/workflows/visual-tests.yml Outdated
Comment on lines +87 to +88
QT_QPA_PLATFORM: offscreen
DISPLAY: ':99'

Copilot AI Feb 12, 2026


In the capture step, setting QT_QPA_PLATFORM=offscreen will typically prevent Qt from creating an on-screen X11 window, so scrot/import may capture a blank/root image. For screenshot-based testing under xvfb, run with the normal xcb platform (e.g., unset QT_QPA_PLATFORM or set it to xcb) and rely on xvfb-run to provide DISPLAY.

Suggested change
QT_QPA_PLATFORM: offscreen
DISPLAY: ':99'
QT_QPA_PLATFORM: xcb

@qodo-free-for-open-source-projects

qodo-free-for-open-source-projects bot commented Feb 12, 2026

Code Review by Qodo

🐞 Bugs (6) 📘 Rule violations (2) 📎 Requirement gaps (0)



Action required

✅ 1. Claude API call unhandled 📘 Rule violation ⛯ Reliability
Description
validate_screenshot_with_claude() calls the Anthropic API without handling network/API exceptions,
so transient failures can crash the script and produce stack traces in CI logs. This reduces
reliability and may expose internal error details to end users of CI output.
Code

scripts/validate_screenshots.py[R198-224]

+    # Call Claude API
+    client = anthropic.Anthropic(api_key=api_key)
+
+    message = client.messages.create(
+        model="claude-sonnet-4-20250514",
+        max_tokens=1024,
+        messages=[
+            {
+                "role": "user",
+                "content": [
+                    {
+                        "type": "image",
+                        "source": {
+                            "type": "base64",
+                            "media_type": media_type,
+                            "data": image_data,
+                        },
+                    },
+                    {
+                        "type": "text",
+                        "text": prompt,
+                    },
+                ],
+            }
+        ],
+    )
+
Evidence
PR Compliance ID 3 requires handling external dependency failures with actionable context, and PR
Compliance ID 4 requires avoiding exposing internal details like stack traces to end users. The new
code makes the Claude API call without any surrounding exception handling, and the main loop also
does not catch API-related exceptions, so errors can bubble up unhandled.

Rule 3: Generic: Robust Error Handling and Edge Case Management
Rule 4: Generic: Secure Error Handling
scripts/validate_screenshots.py[198-224]
scripts/validate_screenshots.py[378-384]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`scripts/validate_screenshots.py` calls the Anthropic API without exception handling. Network/API errors can crash the script and print stack traces in CI logs.
## Issue Context
This script is intended for CI usage; failures should degrade gracefully with actionable error messages and controlled exit codes.
## Fix Focus Areas
- scripts/validate_screenshots.py[198-224]
- scripts/validate_screenshots.py[378-384]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
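A generic way to degrade gracefully, as the issue recommends; in the real script the except clause would name the Anthropic SDK's specific error classes (e.g. `anthropic.APIError`) rather than bare `Exception`:

```python
import sys


def call_with_handling(fn, *args, **kwargs):
    """Run an API call, returning (ok, result) instead of crashing CI
    with a stack trace. A sketch; production code should catch the
    SDK's specific exception types, not bare Exception."""
    try:
        return True, fn(*args, **kwargs)
    except Exception as exc:
        print(f"error: Claude API call failed: {exc}", file=sys.stderr)
        return False, None
```

The main loop could then call `call_with_handling(client.messages.create, ...)` and record a controlled per-screenshot failure with an actionable message and a clean exit code.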


✅ 2. Validation skipped on PRs 🐞 Bug ✓ Correctness
Description
The Claude validation step is gated on inputs.run_claude_validation, but inputs is only defined
for workflow_dispatch, so on pull_request runs the condition evaluates false and validation is
skipped even when an API key is configured.
Code

.github/workflows/visual-tests.yml[R100-103]

+      - name: Validate screenshots with Claude
+        id: validate
+        if: ${{ (inputs.run_claude_validation == true || inputs.run_claude_validation == '') && env.HAS_API_KEY == 'true' }}
+        run: |
Evidence
The workflow is configured to run on pull_request, but the validation if: uses inputs.* (only
present for manual dispatch), causing the validation step to not run on PRs.

.github/workflows/visual-tests.yml[3-25]
.github/workflows/visual-tests.yml[100-126]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Claude validation is unintentionally skipped on `pull_request` runs because the step condition depends on `inputs.run_claude_validation`, which is only populated for `workflow_dispatch`. This defeats the stated goal of running visual validation on PRs.
### Issue Context
- The workflow triggers on `pull_request`.
- The validation step’s `if:` condition uses `inputs.run_claude_validation == true || inputs.run_claude_validation == ''`.
### Fix Focus Areas
- .github/workflows/visual-tests.yml[3-25]
- .github/workflows/visual-tests.yml[100-130]
### Suggested change (direction)
- Update the validation step `if:` to something like:
- `if: ${{ (github.event_name != 'workflow_dispatch' || inputs.run_claude_validation) && secrets.ANTHROPIC_API_KEY != '' }}`
- Update the “Skip validation notice” step condition to mirror the new logic (skip when validation disabled on dispatch OR when the secret is missing).
- Prefer checking `secrets.ANTHROPIC_API_KEY != ''` in the `if:` rather than relying on step-local env variables.



✅ 3. HAS_API_KEY used in if 🐞 Bug ⛯ Reliability
Description
The workflow’s if: conditions reference env.HAS_API_KEY, but HAS_API_KEY is only defined
inside step env: blocks; this is fragile and can cause unexpected skipping depending on evaluation
timing. Gate directly on secrets.ANTHROPIC_API_KEY != '' or set HAS_API_KEY at job level.
Code

.github/workflows/visual-tests.yml[R100-131]

+      - name: Validate screenshots with Claude
+        id: validate
+        if: ${{ (inputs.run_claude_validation == true || inputs.run_claude_validation == '') && env.HAS_API_KEY == 'true' }}
+        run: |
+          echo "Running Claude vision validation..."
+
+          # Run validation and capture exit code
+          set +e
+          python scripts/validate_screenshots.py screenshots/ \
+            --output screenshots/validation_report.json
+          VALIDATION_EXIT_CODE=$?
+          set -e
+
+          # Store result for later steps
+          if [ $VALIDATION_EXIT_CODE -eq 0 ]; then
+            echo "validation_passed=true" >> $GITHUB_OUTPUT
+            echo "✅ All visual validations passed!"
+          else
+            echo "validation_passed=false" >> $GITHUB_OUTPUT
+            echo "❌ Some visual validations failed"
+          fi
+
+          exit 0  # Don't fail here, we'll check in a later step
+        env:
+          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
+          HAS_API_KEY: ${{ secrets.ANTHROPIC_API_KEY != '' }}
+
+      - name: Skip validation notice
+        if: ${{ inputs.run_claude_validation == false || env.HAS_API_KEY != 'true' }}
+        env:
+          HAS_API_KEY: ${{ secrets.ANTHROPIC_API_KEY != '' }}
+        run: |
Evidence
env.HAS_API_KEY is used in step if: expressions, but the workflow defines HAS_API_KEY only
inside the step env: sections; this makes gating dependent on how GitHub evaluates expressions vs
step environment setup.

.github/workflows/visual-tests.yml[100-131]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`env.HAS_API_KEY` is referenced in `if:` expressions, but `HAS_API_KEY` is only defined inside the same step’s `env:` block. This is fragile and can lead to the validation step unexpectedly skipping.
### Issue Context
The intention is: run validation only when an API key exists (and optionally when enabled by workflow input).
### Fix Focus Areas
- .github/workflows/visual-tests.yml[100-131]
### Suggested change (direction)
- Prefer:
- `if: ${{ secrets.ANTHROPIC_API_KEY != '' && (github.event_name != 'workflow_dispatch' || inputs.run_claude_validation) }}`
- If you still want `HAS_API_KEY`, define it at the job level (`jobs.visual-test.env`) so it is available in `if:` contexts consistently.



✅ 4. Partial capture can pass 🐞 Bug ✓ Correctness
Description
visual_test_overlay.py exits success if it captures at least one screenshot, and
validate_screenshots.py validates whatever subset it finds. CI can go green while missing
scenarios, undermining regression detection.
Code

scripts/visual_test_overlay.py[R192-199]

+        # Return success if at least some screenshots were taken
+        successful = sum(1 for r in results.values() if r is not None)
+        if successful > 0:
+            print(f"\n✓ {successful}/{len(results)} screenshots captured successfully")
+            return 0
+        else:
+            print("\n✗ No screenshots were captured")
+            return 1
Evidence
The harness treats partial success as overall success, and the validator only requires “some
screenshots” rather than all expected scenarios, allowing false-green runs if capture fails
intermittently for one or more scenarios.

scripts/visual_test_overlay.py[192-199]
scripts/validate_screenshots.py[368-372]
scripts/validate_screenshots.py[29-101]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The visual test harness and validator currently allow partial success:
- Screenshot capture exits 0 if *any* screenshot exists.
- Validation proceeds with whatever subset it finds.
This can produce false-green CI runs that don’t actually validate all 5 scenarios.
### Issue Context
The workflow’s stated goal is continuous regression detection across 5 scenarios.
### Fix Focus Areas
- scripts/visual_test_overlay.py[192-199]
- scripts/validate_screenshots.py[29-101]
- scripts/validate_screenshots.py[363-373]
- .github/workflows/visual-tests.yml[69-89]
### Suggested change (direction)
- In `visual_test_overlay.py`, return non-zero unless `successful == len(results)` (or unless all required test IDs succeeded).
- In `validate_screenshots.py`, verify that all keys in `VALIDATION_CRITERIA` exist in `screenshots` and treat missing IDs as failures (exit 1).
- In the workflow, optionally assert `steps.capture.outputs.screenshot_count == 5` before running validation and fail early if not.
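The all-or-nothing exit logic suggested above can be sketched as follows (a minimal illustration; the function name and the shape of `results` are assumptions, not the PR's actual code):

```python
def capture_exit_code(results: dict[str, object]) -> int:
    """Return 0 only when every expected scenario produced a screenshot."""
    successful = sum(1 for r in results.values() if r is not None)
    if successful == len(results):
        print(f"\n✓ All {successful} screenshots captured")
        return 0
    # Name the missing scenarios so CI logs explain the failure
    missing = [test_id for test_id, r in results.items() if r is None]
    print(f"\n✗ Missing screenshots: {', '.join(missing)}")
    return 1
```

With this shape, a single intermittent capture failure fails the run instead of silently shrinking the validated set.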




Remediation recommended

✅ 5. Inconsistent 02_with_suggestions key 📘 Rule violation ✓ Correctness
Description
The results dictionary uses 02_with_suggestions while the test ID and screenshot name are
02_suggestions, making the code less self-documenting and potentially confusing when reading
summaries. This violates the requirement for meaningful, consistent identifier naming.
Code

scripts/visual_test_overlay.py[R106-112]

+        # Test 2: With demo suggestions
+        print("\n3. Test: Overlay with demo suggestions")
+        overlay.set_suggestions_fallback(DEMO_SUGGESTIONS)
+        app.processEvents()
+        time.sleep(0.5)
+        results["02_with_suggestions"] = take_screenshot("02_suggestions", screenshots_dir)
+
Evidence
PR Compliance ID 2 requires identifiers to clearly express intent; using a different key name than
the actual test ID (02_suggestions) reduces clarity and increases the chance of misinterpretation.

Rule 2: Generic: Meaningful Naming and Self-Documenting Code
scripts/visual_test_overlay.py[106-112]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The results map uses an inconsistent key name (`02_with_suggestions`) compared to the test ID / filename segment (`02_suggestions`).
## Issue Context
Consistency in identifiers improves readability and reduces confusion when interpreting printed summaries.
## Fix Focus Areas
- scripts/visual_test_overlay.py[106-112]


✅ 6. Offscreen may hide overlay 🐞 Bug ⛯ Reliability
Description
The capture step sets QT_QPA_PLATFORM=offscreen while using X11 screenshot tools (scrot/import)
that capture the X11 root window; this combination is likely to produce screenshots missing the
overlay (or inconsistent rendering), reducing test reliability.
Code

.github/workflows/visual-tests.yml[R69-89]

+      - name: Run visual tests (capture screenshots)
+        id: capture
+        run: |
+          # Create screenshots directory
+          mkdir -p screenshots
+
+          # Run visual test script under xvfb
+          xvfb-run -a --server-args="-screen 0 1920x1080x24" \
+            python scripts/visual_test_overlay.py
+
+          # List captured screenshots
+          echo "Captured screenshots:"
+          ls -la screenshots/
+
+          # Count screenshots for summary
+          SCREENSHOT_COUNT=$(ls -1 screenshots/*.png 2>/dev/null | wc -l)
+          echo "screenshot_count=$SCREENSHOT_COUNT" >> $GITHUB_OUTPUT
+        env:
+          QT_QPA_PLATFORM: offscreen
+          DISPLAY: ':99'
+
Evidence
The workflow explicitly forces Qt into offscreen mode for the screenshot capture step, while the
harness relies on external screenshot tools that capture from the display server. This is a mismatch
in how the pixels are produced vs captured.

.github/workflows/visual-tests.yml[69-89]
scripts/visual_test_overlay.py[46-52]
docs/plans/visual-test-checklist.md[111-114]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The workflow forces `QT_QPA_PLATFORM=offscreen` during screenshot capture, but the harness captures the X11 root window via scrot/import. This can lead to screenshots that don’t contain the overlay (or contain inconsistent rendering), making the regression signal unreliable.
### Issue Context
- Offscreen Qt is useful for headless *widget tests* (as in existing CI), but screenshot capture relies on display server pixels.
### Fix Focus Areas
- .github/workflows/visual-tests.yml[69-89]
- scripts/visual_test_overlay.py[46-52]
- docs/plans/visual-test-checklist.md[111-114]
### Suggested change (direction)
- Remove `QT_QPA_PLATFORM: offscreen` from the capture step env.
- If needed, explicitly set `QT_QPA_PLATFORM: xcb` for capture.
- Update the troubleshooting docs accordingly.


7. Screenshots not gitignored 🐞 Bug ⛯ Reliability
Description
The new harness writes binary PNGs to a repo-root screenshots/ directory, but .gitignore does
not ignore it, increasing the risk of accidental commits and repo bloat.
Code

scripts/visual_test_overlay.py[R34-38]

+def ensure_screenshot_dir() -> Path:
+    """Create screenshots directory if needed."""
+    screenshots_dir = PROJECT_ROOT / "screenshots"
+    screenshots_dir.mkdir(exist_ok=True)
+    return screenshots_dir
Evidence
The script creates/writes to PROJECT_ROOT/screenshots, and the repository ignore rules currently
do not exclude that directory.

scripts/visual_test_overlay.py[34-38]
.gitignore[1-64]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The visual harness generates `screenshots/*.png` under the repository root, but the repo does not ignore this directory. Developers can accidentally commit large binary artifacts.
### Issue Context
The docs already mention deleting screenshots manually; a gitignore rule prevents mistakes.
### Fix Focus Areas
- scripts/visual_test_overlay.py[34-38]
- .gitignore[1-80]
### Suggested change (direction)
- Add a `.gitignore` entry:
- `screenshots/`
- (Optional) Add a note in docs that screenshots are ignored by git and will be uploaded as CI artifacts when run in Actions.


Advisory comments

✅ 8. Nondeterministic screenshot pick 🐞 Bug ⛯ Reliability
Description
find_screenshots() overwrites entries when multiple PNGs match the same test ID, so validation may
run on an arbitrary file if the directory contains leftovers (common in local runs).
Code

scripts/validate_screenshots.py[R259-270]

+def find_screenshots(screenshots_dir: Path) -> dict[str, Path]:
+    """Find and categorize screenshots by test ID."""
+    screenshots: dict[str, Path] = {}
+
+    for png_file in screenshots_dir.glob("*.png"):
+        # Extract test ID from filename like "overlay_test_01_empty_20250107_123456.png"
+        name = png_file.stem
+        for test_id in VALIDATION_CRITERIA:
+            if test_id in name:
+                screenshots[test_id] = png_file
+                break
+
Evidence
The code stores exactly one Path per test_id and overwrites it on later matches without selecting
the newest or warning about duplicates.

scripts/validate_screenshots.py[259-270]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
When multiple screenshots exist for the same test ID, the validator silently chooses whichever file happens to be processed last. This can validate stale or unintended screenshots (especially locally).
### Issue Context
`find_screenshots()` maps `{test_id -> Path}` and overwrites on duplicate matches.
### Fix Focus Areas
- scripts/validate_screenshots.py[259-270]
### Suggested change (direction)
- Track a list per test_id and then:
- pick the newest by `stat().st_mtime`, or
- parse the timestamp from the filename and pick the latest, or
- error out if duplicates are found (recommended for CI mode).



Comment thread scripts/validate_screenshots.py Outdated
Comment on lines +198 to +224
# Call Claude API
client = anthropic.Anthropic(api_key=api_key)

message = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=1024,
messages=[
{
"role": "user",
"content": [
{
"type": "image",
"source": {
"type": "base64",
"media_type": media_type,
"data": image_data,
},
},
{
"type": "text",
"text": prompt,
},
],
}
],
)


Action required

1. Claude api call unhandled 📘 Rule violation ⛯ Reliability

validate_screenshot_with_claude() calls the Anthropic API without handling network/API exceptions,
so transient failures can crash the script and produce stack traces in CI logs. This reduces
reliability and may expose internal error details to end users of CI output.
Agent Prompt
## Issue description
`scripts/validate_screenshots.py` calls the Anthropic API without exception handling. Network/API errors can crash the script and print stack traces in CI logs.

## Issue Context
This script is intended for CI usage; failures should degrade gracefully with actionable error messages and controlled exit codes.

## Fix Focus Areas
- scripts/validate_screenshots.py[198-224]
- scripts/validate_screenshots.py[378-384]
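A transient-failure wrapper along these lines would address this (a sketch only; in the validator, `retryable` would be the anthropic SDK's `APIConnectionError` and `RateLimitError`, with `APIStatusError` surfaced as a non-retryable, actionable error — plain built-in exceptions are used here so the example stays self-contained):

```python
import time

def call_with_retries(fn, max_attempts=3, base_delay=1.0,
                      retryable=(ConnectionError,)):
    """Call fn(), retrying transient errors with exponential backoff.

    Raises the last error if all attempts fail, so the caller can print
    an actionable message and exit with a controlled code.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retryable as exc:
            if attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)  # 1s, 2s, 4s, ...
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay:.0f}s")
            time.sleep(delay)
```

Wrapping the `client.messages.create(...)` call this way keeps CI logs free of raw stack traces while still failing loudly after the retry budget is exhausted.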

Comment on lines +100 to +103
- name: Validate screenshots with Claude
id: validate
if: ${{ (inputs.run_claude_validation == true || inputs.run_claude_validation == '') && env.HAS_API_KEY == 'true' }}
run: |

Action required

2. Validation skipped on prs 🐞 Bug ✓ Correctness

The Claude validation step is gated on inputs.run_claude_validation, but inputs is only defined
for workflow_dispatch, so on pull_request runs the condition evaluates false and validation is
skipped even when an API key is configured.
Agent Prompt
### Issue description
Claude validation is unintentionally skipped on `pull_request` runs because the step condition depends on `inputs.run_claude_validation`, which is only populated for `workflow_dispatch`. This defeats the stated goal of running visual validation on PRs.

### Issue Context
- The workflow triggers on `pull_request`.
- The validation step’s `if:` condition uses `inputs.run_claude_validation == true || inputs.run_claude_validation == ''`.

### Fix Focus Areas
- .github/workflows/visual-tests.yml[3-25]
- .github/workflows/visual-tests.yml[100-130]

### Suggested change (direction)
- Update the validation step `if:` to something like:
  - `if: ${{ (github.event_name != 'workflow_dispatch' || inputs.run_claude_validation) && secrets.ANTHROPIC_API_KEY != '' }}`
- Update the “Skip validation notice” step condition to mirror the new logic (skip when validation disabled on dispatch OR when the secret is missing).
- Prefer checking `secrets.ANTHROPIC_API_KEY != ''` in the `if:` rather than relying on step-local env variables.

Comment on lines +100 to +131
- name: Validate screenshots with Claude
id: validate
if: ${{ (inputs.run_claude_validation == true || inputs.run_claude_validation == '') && env.HAS_API_KEY == 'true' }}
run: |
echo "Running Claude vision validation..."

# Run validation and capture exit code
set +e
python scripts/validate_screenshots.py screenshots/ \
--output screenshots/validation_report.json
VALIDATION_EXIT_CODE=$?
set -e

# Store result for later steps
if [ $VALIDATION_EXIT_CODE -eq 0 ]; then
echo "validation_passed=true" >> $GITHUB_OUTPUT
echo "✅ All visual validations passed!"
else
echo "validation_passed=false" >> $GITHUB_OUTPUT
echo "❌ Some visual validations failed"
fi

exit 0 # Don't fail here, we'll check in a later step
env:
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
HAS_API_KEY: ${{ secrets.ANTHROPIC_API_KEY != '' }}

- name: Skip validation notice
if: ${{ inputs.run_claude_validation == false || env.HAS_API_KEY != 'true' }}
env:
HAS_API_KEY: ${{ secrets.ANTHROPIC_API_KEY != '' }}
run: |

Action required

3. Has_api_key used in if 🐞 Bug ⛯ Reliability

The workflow’s if: conditions reference env.HAS_API_KEY, but HAS_API_KEY is only defined
inside step env: blocks; this is fragile and can cause unexpected skipping depending on evaluation
timing. Gate directly on secrets.ANTHROPIC_API_KEY != '' or set HAS_API_KEY at job level.
Agent Prompt
### Issue description
`env.HAS_API_KEY` is referenced in `if:` expressions, but `HAS_API_KEY` is only defined inside the same step’s `env:` block. This is fragile and can lead to the validation step unexpectedly skipping.

### Issue Context
The intention is: run validation only when an API key exists (and optionally when enabled by workflow input).

### Fix Focus Areas
- .github/workflows/visual-tests.yml[100-131]

### Suggested change (direction)
- Prefer:
  - `if: ${{ secrets.ANTHROPIC_API_KEY != '' && (github.event_name != 'workflow_dispatch' || inputs.run_claude_validation) }}`
- If you still want `HAS_API_KEY`, define it at the job level (`jobs.visual-test.env`) so it is available in `if:` contexts consistently.

Comment thread scripts/visual_test_overlay.py Outdated
Comment on lines +192 to +199
# Return success if at least some screenshots were taken
successful = sum(1 for r in results.values() if r is not None)
if successful > 0:
print(f"\n✓ {successful}/{len(results)} screenshots captured successfully")
return 0
else:
print("\n✗ No screenshots were captured")
return 1

Action required

4. Partial capture can pass 🐞 Bug ✓ Correctness

visual_test_overlay.py exits success if it captures at least one screenshot, and
validate_screenshots.py validates whatever subset it finds. CI can go green while missing
scenarios, undermining regression detection.
Agent Prompt
### Issue description
The visual test harness and validator currently allow partial success:
- Screenshot capture exits 0 if *any* screenshot exists.
- Validation proceeds with whatever subset it finds.
This can produce false-green CI runs that don’t actually validate all 5 scenarios.

### Issue Context
The workflow’s stated goal is continuous regression detection across 5 scenarios.

### Fix Focus Areas
- scripts/visual_test_overlay.py[192-199]
- scripts/validate_screenshots.py[29-101]
- scripts/validate_screenshots.py[363-373]
- .github/workflows/visual-tests.yml[69-89]

### Suggested change (direction)
- In `visual_test_overlay.py`, return non-zero unless `successful == len(results)` (or unless all required test IDs succeeded).
- In `validate_screenshots.py`, verify that all keys in `VALIDATION_CRITERIA` exist in `screenshots` and treat missing IDs as failures (exit 1).
- In the workflow, optionally assert `steps.capture.outputs.screenshot_count == 5` before running validation and fail early if not.

Fixes from PR #8 review:

CI/Workflow fixes:
- Add libdbus-1-dev to system dependencies
- Install Python deps without dbus extra to avoid build failures
- Change QT_QPA_PLATFORM from offscreen to xcb for xvfb compatibility
- Fix workflow condition to check secrets directly instead of env.HAS_API_KEY
- Require all 5 screenshots (fail if incomplete)
- Upload artifacts even on failure (if: always())

Screenshot capture fixes:
- Fix key naming inconsistency (02_suggestions consistently)
- Add 4th suggestion to test truncation (4 input → 3 displayed)
- Require ALL screenshots for success exit code
- Reorder screenshot tools (scrot first for xvfb compatibility)

Claude validation fixes:
- Add proper error handling with retries and exponential backoff
- Handle APIConnectionError, RateLimitError, APIStatusError separately
- Extract text from all content blocks (not just first)
- Select most recent screenshot when duplicates exist
- Add app context to validation prompt
- Update truncation test to check for Meta+Down absence

Development tooling:
- Add justfile with common dev commands
- Add screenshots/ to .gitignore
- Create docs/HOOKS.md with proposed Claude Code hooks

Co-Authored-By: Claude <noreply@anthropic.com>

Copilot AI left a comment


Pull request overview

Copilot reviewed 6 out of 7 changed files in this pull request and generated 3 comments.




---

**Last Updated**: 2025-01-07

Copilot AI Feb 14, 2026


The "Last Updated" date is set to "2025-01-07", which is in the past relative to the current date (February 14, 2026). Consider updating this to the current date or removing the static date in favor of using Git commit timestamps for tracking when documentation was last modified.

Suggested change
**Last Updated**: 2025-01-07
**Last Updated**: See Git history for modification dates.

Copilot uses AI. Check for mistakes.

- name: Validate screenshots with Claude
id: validate
if: ${{ (github.event_name == 'workflow_dispatch' && inputs.run_claude_validation == true) || (github.event_name == 'pull_request' && secrets.ANTHROPIC_API_KEY != '') }}

Copilot AI Feb 14, 2026


GitHub Actions treats non-existent secrets as empty strings, so the expression secrets.ANTHROPIC_API_KEY != '' correctly checks whether the secret has a non-empty value, which is the intended behavior. For clarity and better practice, however, consider using secrets.ANTHROPIC_API_KEY alone, which evaluates to true if the secret exists and has a value.

Suggested change
if: ${{ (github.event_name == 'workflow_dispatch' && inputs.run_claude_validation == true) || (github.event_name == 'pull_request' && secrets.ANTHROPIC_API_KEY != '') }}
if: ${{ (github.event_name == 'workflow_dispatch' && inputs.run_claude_validation == true) || (github.event_name == 'pull_request' && secrets.ANTHROPIC_API_KEY) }}

if: always()
uses: actions/upload-artifact@v4
with:
name: overlay-screenshots-${{ github.sha }}

Copilot AI Feb 14, 2026


The artifact name uses ${{ github.sha }} which will create a very long artifact name (e.g., "overlay-screenshots-abc123def456..."). GitHub Actions allows artifact names up to 256 characters, so this should work, but consider using a shorter suffix like ${{ github.run_number }} or ${{ github.run_id }} for better readability in the artifacts list, or simply "overlay-screenshots" if uniqueness per run isn't critical (newer uploads will overwrite older ones with the same name).

Suggested change
name: overlay-screenshots-${{ github.sha }}
name: overlay-screenshots-${{ github.run_number }}

Documentation updates:
- README.md: Add visual testing section, justfile commands, workflow badge
- visual-test-checklist.md: Fix QT_QPA_PLATFORM from offscreen→xcb, clarify truncation test
- PR-05-overlay-checklist.md: Add automated testing deliverables, update task status
- docs/updates/README.md: Add 2026-01-07 status entry for visual testing work

Consistency fixes:
- Changed all references from offscreen to xcb platform
- Clarified test 04_max_three tests truncation (4 input → 3 displayed)
- Added missing scrot/imagemagick requirements to troubleshooting
- Updated project status to reflect completed automated testing

No functionality changes - documentation only.
@Coldaine Coldaine merged commit 4d46915 into master Feb 16, 2026
2 of 4 checks passed
Coldaine pushed a commit that referenced this pull request Feb 16, 2026