Skip to content

Keep esphome.components out of the dashboard's main process#1493

Merged
bdraco merged 13 commits into
mainfrom
drop-esp32-cold-start-import
Jun 16, 2026
Merged

Keep esphome.components out of the dashboard's main process#1493
bdraco merged 13 commits into
mainfrom
drop-esp32-cold-start-import

Conversation

@bdraco

@bdraco bdraco commented Jun 16, 2026

Copy link
Copy Markdown
Member

What does this implement/fix?

The dashboard process eagerly imported esphome.components.esp32, which pulls in espidf, requests and esphome.config; on a slow SBC like the HA Green that was around 9s of cold start before the first log line, and those modules stayed resident after the first firmware download.

Static platform data (esp32 variants, native wifi capability, download routing, and the download types for esp32, esp8266 and rp2040) now ships in a nightly generated platform_capabilities.index.json that the dashboard reads at runtime with no esphome import. The platforms that read the build dir at download time (libretiny, nrf52) are answered by a new device-builder-helper subprocess, so the dashboard process never imports esphome.components. A ported import time budget check (script/check_import_time.py) fails CI if a fresh eager import creeps back.

Behaviour is unchanged; tests pin that the index and the helper produce the same output as the in-process esphome calls they replace, across every platform.

Related issue or feature (if applicable):

  • N/A

Types of changes

  • Bugfix (non-breaking change which fixes an issue) — bugfix
  • New feature (non-breaking change which adds functionality) — new-feature
  • Enhancement to an existing feature — enhancement
  • Breaking change (fix or feature that would cause existing functionality to not work as expected) — breaking-change
  • Refactor (no behaviour change) — refactor
  • Documentation only — docs
  • Maintenance / chore — maintenance
  • CI / workflow change — ci
  • Dependencies bump — dependencies

Frontend coordination

  • No frontend change needed
  • Companion frontend PR: esphome/device-builder-frontend#

Checklist

  • The code change is tested and works locally.
  • Pre-commit hooks pass (ruff, codespell, yaml/json/python checks).
  • Tests have been added or updated under tests/ where applicable.
  • components.index.json / definitions/components/*.json have not been hand-edited (regenerate via script/sync_components.py if a sync is needed).
  • Architecture-level changes are reflected in docs/ARCHITECTURE.md and/or docs/API.md.

The dashboard process eagerly imported esphome.components.esp32, which pulls
in espidf, requests and esphome.config; on a slow SBC like the HA Green that
was roughly 0.5 to 1.5s of cold start before the first log line, and the
modules stayed resident after the first firmware download.

Static platform data (esp32 variants, native wifi capability, download
routing, and the download types for esp32, esp8266 and rp2040) now ships in a
nightly generated platform_capabilities.index.json that the dashboard reads
at runtime with no esphome import. The platforms that read the build dir at
download time (libretiny, nrf52) are answered by a new device-builder-helper
subprocess, so the dashboard process never imports esphome.components. A
ported import time budget check guards against a fresh eager import creeping
back.
@bdraco bdraco added the enhancement Improvement to an existing feature label Jun 16, 2026
@codspeed-hq

codspeed-hq Bot commented Jun 16, 2026

Copy link
Copy Markdown

Merging this PR will not alter performance

✅ 27 untouched benchmarks


Comparing drop-esp32-cold-start-import (4d429bc) with main (a703d69)

Open in CodSpeed

@codecov

codecov Bot commented Jun 16, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.54%. Comparing base (a703d69) to head (4d429bc).

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1493      +/-   ##
==========================================
+ Coverage   99.50%   99.54%   +0.03%     
==========================================
  Files         222      223       +1     
  Lines       17318    17418     +100     
==========================================
+ Hits        17233    17338     +105     
+ Misses         85       80       -5     
Flag Coverage Δ
py3.12 99.51% <100.00%> (+0.03%) ⬆️
py3.14 99.54% <100.00%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...me_device_builder/controllers/firmware/download.py 100.00% <100.00%> (ø)
...ome_device_builder/controllers/firmware/helpers.py 96.29% <100.00%> (ø)
...ollers/remote_build/artifact_platforms/__init__.py 100.00% <100.00%> (ø)
...lder/controllers/remote_build/artifacts_tarball.py 100.00% <100.00%> (ø)
esphome_device_builder/definitions/__init__.py 98.66% <100.00%> (+0.31%) ⬆️
esphome_device_builder/helper_cli.py 100.00% <100.00%> (ø)
...ome_device_builder/helpers/device_yaml/__init__.py 100.00% <ø> (ø)
..._device_builder/helpers/device_yaml/_generation.py 100.00% <100.00%> (+4.67%) ⬆️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@esphbot

esphbot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Previous review — superseded by a newer review below.

@esphbot esphbot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking issues found — see the review comment above.

bdraco added 2 commits June 16, 2026 10:56
The CI matrix runs newer esphome (beta / dev) with extra esp32 variants, so
asserting the committed index equals the installed esphome failed. Routing
tests now drive off the index (what routing actually uses), and the index test
asserts the committed data is a subset of the installed esphome rather than
equal; exact parity stays the sync workflow's job.

Review cleanups: cache load_platform_capabilities_index so the several import
time callers share one parse; pass close_fds=False to the helper subprocess to
match helpers.subprocess; reuse _find_sibling_cli for the helper argv (it
resolves sys.executable); read/write helper JSON through helpers.json (orjson)
instead of stdlib json; return a _DownloadRouting dataclass from _platform_sets
instead of a positional tuple. Add in process helper_cli coverage, loader error
branch tests, a wifi parity check that our inference agrees with upstream on the
inputs the index covers, and run the golden helper e2e through _helper_cmd so
the installed console-script entry point is exercised under CI.
Keep the helper's stdout pure JSON: route anything esphome prints to stdout
during the component import / get_download_types to stderr, so a banner or
deprecation notice can't make the parent's parse choke and silently drop
artifacts.

Split the parent-side error handling so an infrastructure failure (helper not
installed, timeout, esphome import error) logs the child's stderr and is
diagnosable, distinct from a non-JSON reply; both still degrade to [] so the
device listing keeps rendering. Note in check_import_time that the absolute
budget should be regenerated on the CI runner class, with the structural
cold-import test as the authoritative guard.
@bdraco

bdraco commented Jun 16, 2026

Copy link
Copy Markdown
Member Author

Thanks, addressed in the follow-up commits:

  • Stdout pollution / fragile parse: the helper now wraps the component import + get_download_types in contextlib.redirect_stdout(sys.stderr), so its stdout is pure JSON; a banner or deprecation notice on stdout can't corrupt the parent's parse.
  • Helper failure indistinguishable from "no artifacts": parent-side handling is split so an infrastructure failure (spawn / timeout / import error) logs the child's stderr and is diagnosable, separate from a non-JSON reply. Both still degrade to [] on purpose; get_binaries must keep rendering the rest of the listing, which the existing tests pin.
  • Tarball severity: behaviour is preserved; the prior in-process path also caught the failure and returned [], shipping the tarball with its BUILD_FILES (the firmware binaries travel regardless). The shared helper logs at warning with exc_info + child stderr, so it's still visible. Raising would be new behaviour that fails an otherwise-valid offload.
  • Per-call subprocess for libretiny/nrf52: intentional trade and limited to those two build-dir-dependent platforms; esp32 / esp8266 / rp2040 are answered from the precomputed index with no spawn, and the call runs in the executor, off the loop. No cache since get_download_types reads the build dir.
  • Absolute-us budget flake: documented in check_import_time.py that the budget is regenerated on the CI runner class, with test_cold_import_floor.py (the heavy modules stay out of sys.modules) as the authoritative structural guard.
  • Dropped download key: verified against the frontend; FirmwareBinary is {title, file, description?, type?} and the UI keys on file (the saved filename comes from the download endpoint's Content-Disposition), so download was never consumed.

Also note the index-vs-esphome test is now a subset check rather than equality, since the CI matrix runs newer esphome (beta / dev) with extra esp32 variants.

@esphbot

esphbot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Previous review — superseded by a newer review below.

@esphbot esphbot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No blocking issues found.

Run the device-builder-helper JSON reply through the same coercion the generated
index uses (coerce_download_entries), so a malformed payload (non-list, entries
without a string file) can't reach a downstream entry["file"]; it drops to [] or
the well-shaped subset. With coercion total, the parse guard narrows from a
catch-all to JSONDecodeError.
@bdraco

bdraco commented Jun 16, 2026

Copy link
Copy Markdown
Member Author

Thanks. This review is against cbdc547; the two suggestions were already in 487990ca, and the deserialization point is now handled too:

  • Suggestion 1 (stdout pollution): done in 487990ca; the helper wraps the import + get_download_types in contextlib.redirect_stdout(sys.stderr), so stdout is reserved for JSON.
  • Suggestion 2 (discarded diagnostics): done in 487990ca; the spawn/timeout branch logs getattr(err, "stderr", None) and the parse branch logs result.stdout/result.stderr, so a broken helper is diagnosable rather than a bare [].
  • Silent Failure 2 (unvalidated deserialization): fixed in eb7154f7; the helper reply now runs through coerce_download_entries (the same validator the index payload uses), so a non-list or an entry without a string file is dropped at the boundary and can never reach a downstream entry["file"]. With coercion total, the parse guard narrowed from a catch-all to JSONDecodeError.
  • Silent Failure 1 (transient vs genuine-empty): kept as [] on purpose. get_binaries feeds the device listing, which must keep rendering for the other devices; the prior in-process path degraded the same way. Transient failures are now logged with the child's stderr, so they're visible to an operator without changing the response contract.

CI is green-pending on eb7154f7.

The sys.modules invariant check has to run in a fresh interpreter, but the
inline textwrap script wasn't linted or type-checked. Move it to
tests/_probe_download_no_components.py and run it with the repo root on the
child's PYTHONPATH so it imports this checkout's source.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR removes esphome.components.* imports from the dashboard’s main process by shifting platform metadata to a generated JSON index and routing build-dir-dependent download-type resolution through a short-lived helper subprocess, plus adds CI guardrails to prevent import-time regressions.

Changes:

  • Add a generated platform_capabilities.index.json and runtime loader to resolve ESP32 variants, Wi-Fi capability, and static download types without importing ESPHome components.
  • Introduce device-builder-helper subprocess for download-type queries that depend on the build directory (e.g., LibreTiny / nrf52).
  • Add import-time budget checking in CI and update/extend tests to pin both behavior and “no heavy imports in the main process” invariants.

Reviewed changes

Copilot reviewed 23 out of 23 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
tests/test_resolve_download_component.py Stops importing ESPHome platform constants; derives routing sets from the generated index.
tests/test_remote_build_artifacts_download.py Updates artifact-pack tests to exercise helper/subprocess failure paths and new _download_type_files signature.
tests/test_platform_capabilities.py New tests validating the generated platform-capabilities index parsing and containment vs installed ESPHome.
tests/test_helper_cli.py New tests for device-builder-helper and for ensuring download resolution doesn’t import esphome.components.* in the parent process.
tests/test_device_yaml.py Updates Wi-Fi capability inference tests to use index-based _has_native_wifi and adds parity assertions vs upstream for indexed inputs.
tests/test_cold_import_floor.py Extends “cold import” module allowlist to include esp32/espidf/wifi modules.
tests/e2e/slow/esp32/test_esp_idf_compile_download.py Updates calls to collect_download_entries to pass storage_path.
tests/controllers/firmware/test_get_binaries.py Refactors tests to cover index-vs-helper split (_download_types_for) and updates signatures/behavior expectations.
script/sync_components.py Adds emission of platform_capabilities.index.json during sync.
script/import_time_budget.json Adds baseline import-time budget for CI gating.
script/check_import_time.py New script to measure and enforce import-time budget using importtime_waterfall.
pyproject.toml Adds importtime-waterfall test dep and registers device-builder-helper console script.
esphome_device_builder/helpers/device_yaml/_generation.py Switches native-Wi-Fi inference to index-based data (no esphome.components.wifi import).
esphome_device_builder/helpers/device_yaml/init.py Removes exports for now-deleted upstream/fallback Wi-Fi helper selection internals.
esphome_device_builder/helper_cli.py New helper CLI module that imports ESPHome components in a subprocess and prints JSON to stdout.
esphome_device_builder/definitions/platform_capabilities.index.json New committed platform-capabilities index artifact.
esphome_device_builder/definitions/init.py Adds PlatformCapabilities type + loader/coercers for the new index.
esphome_device_builder/controllers/remote_build/artifacts_tarball.py Routes download-type file inclusion through firmware download resolver (index/helper), removing ESPHome component imports.
esphome_device_builder/controllers/remote_build/artifact_platforms/init.py Resolves ESP32 variants via the generated index (no esphome.components.esp32 import).
esphome_device_builder/controllers/firmware/helpers.py Extends sibling-CLI discovery to support differing console-script name vs module path.
esphome_device_builder/controllers/firmware/download.py Implements index-backed + helper-backed download-type resolution; updates APIs to accept storage_path.
.github/workflows/test.yml Adds import-time budget check step and uploads the HAR waterfall artifact.
Comments suppressed due to low confidence (1)

tests/test_helper_cli.py:157

  • In the embedded subprocess script, StorageJSON(firmware_bin_path=...) is given a Path object, but other fixtures in this file pass str(...). Keeping it as a string avoids potential JSON-serialization issues inside StorageJSON.save() and keeps the fixture consistent.
    )
    assert result.returncode == 0, f"leaked:\n{result.stdout}\nstderr:\n{result.stderr}"

Comment thread tests/controllers/firmware/test_get_binaries.py Outdated
Comment thread script/sync_components.py
_parse_download_types collapses to a dict comprehension with the non-list
filter. The artifacts_tarball -> firmware.download import stays function-local;
a top-level import is a circular import (firmware/__init__ -> controller ->
remote_runner -> helpers.remote_artifacts_materialise -> back into
artifacts_tarball), now noted in a comment so it isn't re-attempted.
@esphbot

esphbot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Previous review — superseded by a newer review below.

@esphbot esphbot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking issues found — see the review comment above.

bdraco added 2 commits June 16, 2026 11:16
Add the cold-start section to docs/ARCHITECTURE.md and the generated index,
helper entry point, import budget, and the no-esphome.components invariant to
CLAUDE.md.

Hoist the firmware_bin_path-is-None short-circuit above _download_types_for so
an unbuilt libretiny/nrf52 device returns [] without spawning the helper. Shape
the helper's reply through coerce_download_entries (tolerates a malformed entry
instead of a KeyError aborting the whole reply), and log when coercion drops
entries so a silently-dropped artifact is diagnosable.
@bdraco

bdraco commented Jun 16, 2026

Copy link
Copy Markdown
Member Author

Thanks, all addressed in f637ffc4:

  • Architecture docs (gating): added a "Cold-start import discipline" section to docs/ARCHITECTURE.md (the invariant, the snapshot index, and the helper subprocess), plus rows for platform_capabilities.index.json, the device-builder-helper entry point, and check_import_time.py in CLAUDE.md, with a conventions bullet stating the long-lived process never imports esphome.components.*. Docs checkbox ticked.
  • Suggestion 1 (spawn before short-circuit): hoisted the firmware_bin_path is None check above _download_types_for, so an unbuilt libretiny/nrf52 device returns [] without spawning the helper or logging a spurious warning.
  • Silent Failure 2 (unguarded entry["file"] in the child): the helper now shapes its reply through the shared coerce_download_entries, so a single entry missing file is skipped rather than aborting the whole subprocess (this also dedups the shaping the parent already used).
  • Silent Failure 1 (silent drops in coerce_download_entries): it now logs a warning when a non-empty input yields fewer entries, so a dropped artifact is diagnosable.

@bdraco bdraco marked this pull request as ready for review June 16, 2026 16:25
@esphbot

esphbot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Previous review — superseded by a newer review below.

@esphbot esphbot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No blocking issues found.

Validate the helper's component argument against [a-z0-9_]+ before interpolating
it into import_module, so a crafted target_platform can't steer the import to a
dotted sub-path (defence in depth; the name is already confined to the
esphome.components. prefix and import_module is not eval).

Fold any esp32-prefixed target_platform to the esp32 component even when the
index is degraded (empty variants), so a missing index makes an ESP32 variant
download slow rather than broken. Log the non-list branch of
coerce_download_entries so a wholesale-malformed helper reply is diagnosable.
@bdraco

bdraco commented Jun 16, 2026

Copy link
Copy Markdown
Member Author

Thanks. Addressed in 93bcdc7f:

  • Suggestion 1 (ESP32 variants break on a degraded index): _resolve_download_component now folds any esp32-prefixed target_platform to the esp32 component as a fallback, so a missing index makes an ESP32-S3/C3/... download slow (helper spawn that imports esphome.components.esp32) rather than broken (importing a nonexistent esphome.components.esp32s3). Symmetric with the base chip now.
  • Silent Failure 3 (non-list branch unlogged): coerce_download_entries logs a warning on the non-list branch too, so a structurally-wrong helper reply is diagnosable, consistent with the per-entry drop logging.
  • Silent Failure 1 (corrupt index fail-open): kept as graceful degradation. The index ships generated inside the wheel, so "corrupt" means a broken deploy; raising at startup would brick the whole dashboard over a download-listing concern. Missing logs a warning, unparseable logs an exception (already louder).
  • Silent Failure 2 (infra failure → []): kept as [] by design, as in prior rounds. get_binaries feeds the device listing, which must keep rendering for other devices; surfacing a new "unavailable" state is a frontend-coordinated change out of scope here. The child's stderr is logged so it's diagnosable.

Also added defence-in-depth: the helper validates its component argument against [a-z0-9_]+ before import_module, so a crafted target_platform can't steer the import path.

The 403ms baseline was measured on a dev laptop; CI runners read ~3x slower
(1194ms observed), so the 15% margin flaked. Rebaseline to the CI figure with a
50% margin (and bump the script default to match) so unrelated PRs don't trip the
gate; the structural cold-import test stays the precise invariant.
@esphbot

esphbot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Previous review — superseded by a newer review below.

@esphbot esphbot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No blocking issues found.

@esphbot

esphbot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Previous review — superseded by a newer review below.

@esphbot esphbot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No blocking issues found.

@bdraco

bdraco commented Jun 16, 2026

Copy link
Copy Markdown
Member Author

Thanks:

  • Suggestion 1 (import budget flake): already rebaselined in 8ca5c974 to the CI-measured figure (~1194ms) with a 50% margin (and the script default bumped to match), since CI runners read ~3x slower than a laptop. Kept it blocking rather than advisory, with test_cold_import_floor.py documented as the authoritative guard.
  • Suggestion 2 (cached list by reference): fixed in a52936e2; the precomputed branch now returns [dict(entry) for entry in precomputed], so a future mutating caller can't corrupt the cached index, and it matches the helper path which already returns fresh dicts.
  • Silent Failure 1 (empty component returns [] without log): the docstring was stale; the empty-component short-circuit deliberately returns [] with no spawn and no log (it fires on every never-built device, so a warning would be noise). Fixed the docstring to match rather than adding a log.
  • Silent Failure 2 (infra failure vs no-artifacts): kept as [] by design, as in prior rounds; the listing must keep rendering and the child stderr is logged for diagnosis. A typed error to the frontend is a coordinated change out of scope here.
  • Silent Failure 3 (fail-open on missing index): kept as graceful degradation. The index ships generated in the wheel, so absence means a broken build; raising at startup would brick the whole dashboard over a download-listing concern.

@bdraco

bdraco commented Jun 16, 2026

Copy link
Copy Markdown
Member Author

Tested with every board type download so should be good to go now

@esphbot

esphbot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Previous review — superseded by a newer review below.

@esphbot esphbot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No blocking issues found.

build_files_for_platform now folds any esp32-prefixed target_platform to the
esp32 BUILD_FILES even on a degraded index, matching _resolve_download_component
so an offloaded ESP32 variant packs rather than raising on empty build_files.

Make collect_download_entries' storage_path a required positional (every caller
already passes it), so a future omitting caller fails mypy rather than silently
returning [] for libretiny/nrf52.
@bdraco

bdraco commented Jun 16, 2026

Copy link
Copy Markdown
Member Author

Thanks. Both suggestions taken in 4d429bcf:

  • Suggestion 1 (degraded-index esp32 fold divergence): build_files_for_platform now folds any esp32-prefixed target_platform to the esp32 BUILD_FILES even on an empty index, mirroring _resolve_download_component, so an offloaded ESP32 variant packs instead of raising. Added a test.
  • Suggestion 2 (storage_path optional): made it a required positional on collect_download_entries; every caller already passes it, so a future omitting caller now fails mypy rather than silently returning [].

On the silent-failure notes:

  • SF1 (infra failure vs no-artifacts): kept as [] by design, as in prior rounds; the listing must keep rendering and the child stderr is logged. A typed wire state is a frontend-coordinated change out of scope here.
  • SF2 (incomplete tarball): the firmware artifacts (firmware.factory.bin, etc.) are already in each platform's static BUILD_FILES (as your Suggestion 1 note observes), so a [] from _download_type_files can't drop them; this matches the prior in-process path which also returned [] on failure.
  • SF3 (fail-open wifi → invalid config): the wifi inference is only the fallback when a board manifest omits connectivity (curated boards carry it), the index ships generated in the wheel (CI-verified in sync), and crucially the wizard's generated YAML is validated through esphome config before write, so a wrong wifi block would fail at create (INTERNAL_ERROR), not land silently. So "never generate invalid configs" is still backstopped.

@bdraco bdraco merged commit c961ea6 into main Jun 16, 2026
21 checks passed
@bdraco bdraco deleted the drop-esp32-cold-start-import branch June 16, 2026 17:19
@esphbot

esphbot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

PR Review — Keep esphome.components out of the dashboard's main process

Solid, well-scoped change that delivers exactly what the description promises — no blocking issues.

Strengths:

  • The static-vs-dynamic split is correct: esp32/esp8266/rp2040 get_download_types are genuinely build-dir-independent (only the unused download field varies by name), so snapshotting them into platform_capabilities.index.json is safe; libretiny/nrf52 read the build dir and are correctly routed to the helper subprocess.

  • Boundary hardening is thorough: coerce_download_entries validates both the generated index and the helper reply so a malformed payload can't reach a downstream entry["file"]; the helper isolates its stdout via redirect_stdout(sys.stderr) so a banner can't corrupt the parse; spawn/timeout/import failures are logged with child stderr and degrade to [].

  • The degraded-index fallbacks are symmetric and defensible — _resolve_download_component and build_files_for_platform both fold any esp32* target to esp32, and the fail-open wifi inference is genuinely backstopped because template-generated YAML is validated through esphome config (validate_rewritten_yaml_or_raise) before write.

  • Subprocess work runs in run_in_executor, so the 60s helper never blocks the event loop. CI guards (check_import_time.py + test_cold_import_floor.py) and docs (ARCHITECTURE.md, CLAUDE.md) are in place.

  • One non-blocking note: libretiny/nrf52 get_binaries spawns a fresh esphome-importing child per call — fine on-demand, flagged as a possible memoization follow-up.


🟢 Suggestions

1. Each get_binaries on a libretiny/nrf52 device spawns a fresh ~10s esphome-importing subprocess (`esphome_device_builder/controllers/firmware/download.py`, L158-170)

For libretiny/nrf52 devices, _download_types_for spawns the device-builder-helper child on every call, and that child pays the full ESPHome import (the ~9s cold-start cost this PR is removing from the main process) before answering.

get_binaries is on-demand (user opens the download panel), so the latency lands on a user action rather than a hot loop — and it only affects the libretiny/nrf52 minority; the dominant esp32/esp8266/rp2040 path is an instant index read. So this is acceptable as shipped.

Worth confirming it's the intended trade-off: re-opening the download panel on the same libretiny device re-spawns the child each time. If that proves slow in practice, a small memoization keyed on (storage_path, build_dir mtime) would collapse repeats without re-importing ESPHome — but it's a follow-up, not a blocker, since the result legitimately depends on on-disk build state.

cmd = [*_helper_cmd(), "download-types", str(storage_path), component]

Checklist

  • No command/shell injection (subprocess argv internally built, no shell)
  • Unsafe deserialization (JSON parse guarded, coerced at boundary)
  • Error handling degrades safely without swallowing diagnostics
  • No event-loop blocking from subprocess (runs in executor)
  • Backward-compatible get_binaries contract preserved
  • Cached index not mutated by callers (returns dict copies)
  • Test coverage for index/helper parity and failure paths
  • Performance: per-call helper spawn for dynamic platforms — suggestion #1

Silent Failure Analysis

🟡 **MEDIUM** — silent empty return masks incomplete artifact (`esphome_device_builder/controllers/remote_build/artifacts_tarball.py:287-300`)

Risk: When the helper subprocess fails for a build-dir-dependent platform (libretiny/nrf52), _download_types_for returns [], so secondary build files (e.g. factory/OTA bins) are silently dropped from the offload tarball with no error surfaced at the build level — producing a quietly incomplete artifact sent to a remote builder.

entries = _download_types_for(storage, storage_path, label=storage.name)
return [entry["file"] for entry in entries]

Fix: Distinguish 'platform has no download types' from 'helper failed to resolve them' and raise (or at least warn at the tarball boundary) when files can't be resolved for a build-dir-dependent platform, rather than packing a partial tarball.

🟡 **MEDIUM** — fallback value hides failure (`esphome_device_builder/controllers/firmware/download.py:195-210`)

Risk: A helper infrastructure failure (not installed, timeout, import error) for a libretiny/nrf52 device degrades to [], which the listing renders identically to an unbuilt device, so a real failure is invisible to the user even though it is logged server-side.

except Exception as err:  # spawn / nonzero exit / timeout / esphome regression
    _LOGGER.warning("download-types helper failed for %s: %s", ...)
    return []

Fix: Consider surfacing the helper failure to the client (e.g. an explicit error/diagnostic field) instead of collapsing it into the same empty result a genuinely-unbuilt device produces.

🟡 **MEDIUM** — fail-open fallback hides failure (`esphome_device_builder/definitions/__init__.py:480-497`)

Risk: A missing or malformed index degrades silently to empty lists, so esp32_no_wifi_variants / rp2040_no_wifi_boards are empty and no-wifi boards are inferred as wifi-capable — a fail-open that can generate invalid wifi config, contradicting the project's 'never generate invalid configs' invariant (documented as intentional, but the correctness risk is worth a reviewer's eye).

except Exception:
    _LOGGER.exception("Failed to load platform_capabilities.index.json — degraded.")
    return empty

Fix: For the wifi-inference consumers, prefer failing closed (or emitting a louder, user-visible signal) when the capabilities index is unavailable rather than silently assuming wifi support.


Automated review by Kōan (Claude) HEAD=4d429bc 6 min 8s

@esphbot esphbot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No blocking issues found.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement Improvement to an existing feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants