refactor: consolidate duplicate helpers across pkg/workflow, pkg/cli, pkg/parser#39955
refactor: consolidate duplicate helpers across pkg/workflow, pkg/cli, pkg/parser#39955Copilot wants to merge 6 commits into
Conversation
… pkg/parser Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
|
✅ Design Decision Gate 🏗️ completed the design decision gate check. |
|
🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅ |
|
✅ PR Code Quality Reviewer completed the code quality review. |
|
✅ Test Quality Sentinel completed test quality analysis. |
There was a problem hiding this comment.
Pull request overview
This PR refactors gh-aw by consolidating previously duplicated helper logic across pkg/workflow, pkg/cli, and pkg/parser into shared utilities (notably pkg/stringutil and pkg/sliceutil), and by standardizing a common “seen set” pattern (map[string]struct{}) throughout the codebase.
Changes:
- Centralized GitHub Actions expression detection helpers in
pkg/stringutiland updated workflow code to delegate accordingly. - Introduced
sliceutil.DeduplicateTrimmed([]string)and swept many ad-hoc “trim + drop-empty + dedup (stable)” loops to use it (and generally migratedmap[string]bool→map[string]struct{}). - Extracted shared Node.js execution scaffolding for CLI log parsers into
pkg/cli/logs_parsing_runner.go.
Show a summary per file
| File | Description |
|---|---|
| pkg/workflow/template.go | Switch env-var dedup tracking to a struct{} set. |
| pkg/workflow/shell.go | Switch expandable-var dedup tracking to a struct{} set. |
| pkg/workflow/safe_update_manifest.go | Switch multiple dedup sets to struct{} and keep deterministic container/action aggregation. |
| pkg/workflow/runtime_import_validation.go | Switch runtime import-path dedup tracking to a struct{} set. |
| pkg/workflow/run_step_sanitizer.go | Switch expression dedup tracking to a struct{} set. |
| pkg/workflow/publish_assets.go | Remove local ${{ }} detector and reuse workflow’s shared expression helper. |
| pkg/workflow/package_extraction.go | Switch package dedup tracking to a struct{} set across extraction paths. |
| pkg/workflow/expression_patterns.go | Delegate expression helper functions to new stringutil exports. |
| pkg/workflow/expression_extraction.go | Switch sub-expression dedup tracking to a struct{} set. |
| pkg/workflow/config_helpers.go | Replace manual string lookup with typeutil.LookupString delegation. |
| pkg/workflow/compiler_orchestrator_workflow.go | Switch OTLP endpoint URL dedup tracking to a struct{} set. |
| pkg/workflow/compiler_activation_job.go | Switch centralized “on:” event dedup tracking to a struct{} set. |
| pkg/workflow/checkout_manager.go | Replace duplicated “trim + dedup” merges with sliceutil.DeduplicateTrimmed. |
| pkg/workflow/awf_helpers.go | Switch exclude env-var name dedup tracking to a struct{} set. |
| pkg/workflow/awf_config.go | Use sliceutil.DeduplicateTrimmed for domain-list parsing. |
| pkg/stringutil/stringutil.go | Add exported expression marker/expression detection helpers. |
| pkg/sliceutil/sliceutil.go | Add DeduplicateTrimmed([]string) helper (trim, drop empty, stable dedup). |
| pkg/parser/tools_merger.go | Switch allowed-array dedup tracking to a struct{} set. |
| pkg/parser/schema_suggestions.go | Switch schema-field-location dedup tracking to struct{} sets. |
| pkg/parser/schema_errors.go | Use sliceutil.Deduplicate for wantTypes; switch suggestion dedup to struct{} set. |
| pkg/parser/import_field_extractor.go | Switch observability endpoint dedup tracking to a struct{} set. |
| pkg/parser/frontmatter_hash.go | Switch template-expression dedup tracking to a struct{} set. |
| pkg/cli/secrets.go | Switch secret extraction dedup tracking to a struct{} set. |
| pkg/cli/packages.go | Switch include recursion “seen” tracking to map[string]struct{} and update signature. |
| pkg/cli/packages_test.go | Update tests to use map[string]struct{} for include recursion tracking. |
| pkg/cli/outcome_eval_update.go | Extract shared []any→[]string conversion helper and reuse it. |
| pkg/cli/outcome_eval_review.go | Reuse shared anySliceToStrings conversion helper. |
| pkg/cli/logs_parsing_runner.go | New shared Node.js runner (tempdir + write script + exec node + write output). |
| pkg/cli/logs_parsing_javascript.go | Replace inline Node.js execution with the shared runner and shared @actions/core mock. |
| pkg/cli/logs_parsing_firewall.go | Replace inline Node.js execution with the shared runner and shared @actions/core mock. |
| pkg/cli/logs_artifact_set.go | Switch artifact-name dedup tracking to a struct{} set. |
| pkg/cli/includes.go | Switch import/include dedup tracking fields and maps to struct{} sets. |
| pkg/cli/generate_action_metadata_command.go | Switch inputs/outputs/deps dedup tracking to struct{} sets. |
| pkg/cli/experiments_command.go | Switch workflow-ID dedup tracking to a struct{} set. |
| pkg/cli/drain3_integration.go | Switch anomaly-reason dedup tracking to a struct{} set. |
| pkg/cli/compile_workflow_processor.go | Switch safe-output label dedup tracking to a struct{} set. |
| pkg/cli/codemod_steps_run_secrets_env.go | Switch multiple dedup maps to struct{} sets during codemod rewrite. |
| pkg/cli/codemod_effective_tokens_to_ai_credits.go | Replace local expression detection with stringutil.ContainsExpression. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 38/38 changed files
- Comments generated: 1
| // runNodeScript writes script to a temporary parser.js file, executes it with | ||
| // node, and writes the trimmed stdout to outputPath. The temporary directory | ||
| // is cleaned up automatically. |
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
🏗️ Design Decision Gate — ADR RequiredThis PR makes significant changes to core business logic (310 new lines across 📄 Draft ADR committed:
📋 What to do next
Once an ADR is linked in the PR body, this gate will re-run and verify the implementation matches the decision. ❓ Why ADRs Matter
This PR establishes canonical conventions (standardize dedup on 📋 Michael Nygard ADR Format ReferenceAn ADR must contain these four sections to be considered complete:
All ADRs are stored in
|
🧪 Test Quality Sentinel Report✅ Test Quality Score: 100/100 — Excellent
📊 Metrics & Test Classification (2 tests analyzed)
Go: 1 ( Actual change in test file: both modifications change Pre-existing test quality notes (informational):
Verdict
|
There was a problem hiding this comment.
Skills-Based Review 🧠
Applied /tdd, /zoom-out, and /improve-codebase-architecture — requesting changes on test coverage gaps and one undocumented semantic change.
📋 Key Themes & Highlights
Blocking Issues
- Missing tests for hub functions:
ContainsExpression/HasExpressionMarker/IsExpression(new canonical API, 0 tests) andDeduplicateTrimmed(27+ call sites, 0 tests). Each is a load-bearing abstraction; a regression would surface only through distant integration failures. - Undocumented semantic change:
ContainsExpressionis stricter than theisExpressionValueit replaces in the codemod —"${{}}"(empty body) now returnsfalsewhere the old helper returnedtrue. The tightening is arguably more correct, but needs a test and a note in the PR description.
Other Issues (non-blocking)
runNodeScript(new single-point-of-failure for both log parsers) has no tests; the bundled relative-path fix inlogs_parsing_javascript.godeserves a mention in the PR description.splitDomainList: whitespace-only input now returns a non-nil empty slice (vs. oldnil); worth locking in with a test case.mergeSparsePatterns:strings.SplitSeq+ manual collect negates the generator benefit;strings.Splitmakes it a one-liner consistent withmergeFetchRefs.anySliceToStringsis fine where it is; flagged only as a forward-looking note if a third call site appears outsidepkg/cli.
Positive Highlights
- ✅ Excellent PR description with per-category rationale and intentional non-consolidations called out (
ShortenCommand) - ✅
map[string]bool→map[string]struct{}sweep is mechanical and correct across all 27+ sites - ✅
expression_patterns.goprivate helpers now delegate tostringutil(clean layering per ADR-27198) - ✅
mergeFetchRefsandsplitDomainListsimplifications are clean one-liners - ✅
jsCoreMockconst extraction is well-documented and the#nosecannotation is preserved correctly
🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer
| return false | ||
| } | ||
| closeIdx := strings.Index(afterOpen, "}}") | ||
| return closeIdx > 0 |
There was a problem hiding this comment.
[/tdd] ContainsExpression is semantically stricter than the isExpressionValue it replaces in codemod_effective_tokens_to_ai_credits.go — no test captures this edge case.
The old local helper returned true for "${{}}" (empty body):
strings.Contains(value, "${{") && strings.Contains(value, "}}")The new ContainsExpression uses closeIdx > 0, so "${{}}" now returns false. The tightening is arguably more correct, but the behavioral change is unmentioned in the PR description and there is no test to document or protect it.
💡 Suggested test
// In pkg/stringutil/stringutil_test.go
func TestContainsExpression(t *testing.T) {
cases := []struct{ in string; want bool }{
{"${{ secrets.FOO }}", true},
{"prefix ${{ x }} suffix", true},
{"${{}}" , false}, // empty body — distinct from old isExpressionValue
{"no expression", false},
{"}}", false}, // no opening marker
{"}} ${{ foo", false}, // closing marker precedes opening
}
for _, c := range cases {
if got := ContainsExpression(c.in); got != c.want {
t.Errorf("ContainsExpression(%q) = %v, want %v", c.in, got, c.want)
}
}
}Also worth documenting the "${{}}" divergence in the PR description so reviewers understand this is an intentional tightening.
|
|
||
| // ContainsExpression reports whether s contains a complete non-empty GitHub Actions expression. | ||
| // A complete expression has a "${{" marker followed by at least one character before "}}". | ||
| func ContainsExpression(s string) bool { |
There was a problem hiding this comment.
[/tdd] HasExpressionMarker, ContainsExpression, and IsExpression are the new canonical expression-detection API replacing three independent implementations, but pkg/stringutil/stringutil_test.go has no tests for any of them. The adjacent Truncate function has thorough table-driven coverage — these should follow the same pattern.
💡 Minimum coverage to add
HasExpressionMarker: partial/unclosed"${{ incomplete"→ true; no marker → falseContainsExpression: complete expression → true; empty body"${{}}"→ false; closing before opening → falseIsExpression: whole-string expression → true; expression embedded mid-string → false; trailing content → false
Without these, regressions in the canonical API would only surface indirectly through downstream integration tests.
| // leading/trailing whitespace. Empty strings (after trimming) are dropped, and | ||
| // duplicate values are removed preserving the first-occurrence order. | ||
| // This is a pure function that does not modify the input slice. | ||
| func DeduplicateTrimmed(slice []string) []string { |
There was a problem hiding this comment.
[/tdd] DeduplicateTrimmed is the hub abstraction powering 27+ refactored dedup sites across three packages, but no unit tests are added in this PR. The existing TestDeduplicate is a good template.
💡 Key cases to cover
func TestDeduplicateTrimmed(t *testing.T) {
cases := []struct {
name string
in []string
want []string
}{
{"nil input", nil, []string{}},
{"empty strings dropped", []string{"", " ", "a"}, []string{"a"}},
{"trimming then dedup", []string{" a", "a ", "a"}, []string{"a"}},
{"order preserved", []string{"b", "a", "b"}, []string{"b", "a"}},
{"no mutation of input", []string{" x "}, /* verify input unchanged */ []string{"x"}},
}
// ...
}The "trimming then dedup" case is especially important: " a" and "a " are distinct strings in the input but collide after trimming — verifying they produce a single output "a" ensures the trim-before-dedup semantics are correct.
| // runNodeScript writes script to a temporary parser.js file, executes it with | ||
| // node, and writes the trimmed stdout to outputPath. The temporary directory | ||
| // is cleaned up automatically. | ||
| func runNodeScript(script, outputPath string) error { |
There was a problem hiding this comment.
[/tdd] runNodeScript is now a single point of failure for both parseFirewallLogs and parseAgentLog, but the new file has no tests. The PR also bundles a silent correctness fix: logs_parsing_javascript.go previously invoked exec.Command("node", "parser.js") with a relative path; the shared runNodeScript correctly uses the absolute nodeFile path.
💡 Test suggestions
At minimum, test the error path when node is not on PATH or the script fails:
func TestRunNodeScript_ErrorPropagation(t *testing.T) {
err := runNodeScript("process.exit(1);", t.TempDir()+"/out.md")
if err == nil {
t.Fatal("expected error for failing script")
}
if !strings.Contains(err.Error(), "failed to execute node script") {
t.Errorf("unexpected error message: %v", err)
}
}
func TestRunNodeScript_WritesOutput(t *testing.T) {
out := filepath.Join(t.TempDir(), "result.md")
err := runNodeScript(`console.log("hello");`, out)
require.NoError(t, err)
content, _ := os.ReadFile(out)
assert.Equal(t, "hello", string(content))
}The relative-path fix should also be mentioned explicitly in the PR description.
| return nil | ||
| } | ||
| return result | ||
| return sliceutil.DeduplicateTrimmed(strings.Split(domains, ",")) |
There was a problem hiding this comment.
[/zoom-out] Whitespace-only input (e.g. " ") now returns a non-nil empty slice instead of nil. The early-return guard only checks for "", so strings.Split(" ", ",") produces [" "], which DeduplicateTrimmed trims and drops, returning make([]string, 0, 1) (non-nil empty).
The old inline loop returned nil for whitespace-only input because nothing was appended. Callers that distinguish nil from []string{} (e.g. for JSON null vs [] serialization) may be affected. The existing splitDomainList tests at awf_config_test.go:1076 should add a whitespace-only case to lock in the intended behavior.
💡 One-line fix if nil preservation is needed
func splitDomainList(domains string) []string {
result := sliceutil.DeduplicateTrimmed(strings.Split(domains, ","))
if len(result) == 0 {
return nil
}
return result
}Or add a test confirming the empty-slice return is intentional.
| } | ||
| } | ||
|
|
||
| var extra []string |
There was a problem hiding this comment.
[/improve-codebase-architecture] strings.SplitSeq is used as a generator to avoid allocating a []string from strings.Split, but then immediately collected into extra []string, negating the benefit. mergeFetchRefs right below this function shows the idiomatic one-liner form — mergeSparsePatterns can match it:
func mergeSparsePatterns(existing []string, newPatterns string) []string {
return sliceutil.DeduplicateTrimmed(append(existing, strings.Split(newPatterns, "\n")...))
}strings.Split on an empty string returns [""], which DeduplicateTrimmed trims and drops, so the empty-newPatterns case is handled correctly without a guard.
|
|
||
| // anySliceToStrings Sprint-converts each element of values to a string, | ||
| // trims whitespace, and drops empty results. | ||
| func anySliceToStrings(values []any) []string { |
There was a problem hiding this comment.
[/improve-codebase-architecture] anySliceToStrings converts []any → []string with trim-and-drop-empty semantics identical to DeduplicateTrimmed's element-processing step (minus the dedup). It's defined here as a package-private helper used by two sibling files (outcome_eval_review.go and outcome_eval_update.go), which is fine for now.
If a third call site appears outside pkg/cli, it will need re-extraction. Consider promoting it to pkg/sliceutil (e.g. AnySliceToStrings) to keep conversion helpers co-located with DeduplicateTrimmed. Not a blocker, but worth tracking.
There was a problem hiding this comment.
REQUEST_CHANGES — one structural correctness bug and one semantic regression need fixes before merge.
High: runNodeScript creates a fresh temp directory and executes the node process from there, but parseAgentLog writes log_parser_bootstrap.cjs / log_parser_shared.cjs into a different outer tempDir. When those helpers are non-stub, any require('./...') in the parser script will hit MODULE_NOT_FOUND because node's working directory is the inner dir. The PR introduced this split by extracting the helper without threading the prepared workspace through — the abstraction boundary is wrong.
Medium: ContainsExpression requires closeIdx > 0 (at least one character between the markers), whereas the removed isExpressionValue in codemod_effective_tokens_to_ai_credits.go used plain strings.Contains, which accepted ${{}}. On a codemod path this silent tightening can rewrite values that should be left as-is.
Low (non-blocking): append(existing, extra...) in mergeSparsePatterns / mergeFetchRefs may clobber the backing array of existing if it has excess capacity — a classic Go aliased-slice footgun. See inline comment for a one-line clone fix.
🔎 Code quality review by PR Code Quality Reviewer
| } | ||
|
|
||
| return nil | ||
| return runNodeScript(nodeScript, logMdPath) |
There was a problem hiding this comment.
Two-tempDir split will break require() for helper files when scripts are non-stub: parseAgentLog writes log_parser_bootstrap.cjs and log_parser_shared.cjs into its own tempDir, but runNodeScript creates a different temp directory and executes parser.js from there. Any relative require call inside the parser script (e.g. require('./log_parser_shared.cjs')) will fail with MODULE_NOT_FOUND at runtime, because the working directory used by node is the inner tempDir, not the outer one that holds the helper files.
💡 Suggested fix
The root cause is that runNodeScript unconditionally creates its own isolated workspace. The cleanest fix is to accept an optional working directory instead of always creating a fresh one:
// runNodeScript writes parser.js into workDir (which the caller prepared)
// and executes it there. If workDir is empty, a fresh tempDir is created.
func runNodeScript(script, outputPath, workDir string) error {
if workDir == "" {
var err error
workDir, err = os.MkdirTemp("", "node_parser")
if err != nil {
return fmt.Errorf("failed to create temp dir: %w", err)
}
defer os.RemoveAll(workDir)
}
nodeFile := filepath.Join(workDir, "parser.js")
...
}Then parseAgentLog passes its own tempDir as workDir so bootstrap and shared files are co-located with parser.js.
| return false | ||
| } | ||
| closeIdx := strings.Index(afterOpen, "}}") | ||
| return closeIdx > 0 |
There was a problem hiding this comment.
ContainsExpression silently tightens semantics vs the removed isExpressionValue: The old helper checked strings.Contains(value, "${{") AND strings.Contains(value, "}}"), which accepted empty expressions like ${{}} (closeIdx == 0). The new implementation requires closeIdx > 0, so ${{}} now returns false — the value would be treated as a plain literal and potentially rewritten by the codemod instead of being left untouched.
💡 Details and suggested fix
In normalizeLegacyBudgetValue, the guard stringutil.ContainsExpression(trimmed) is supposed to detect any dynamic/expression value and bail early. If ${{}} is an invalid-but-dynamic sentinel in a legacy config, this regression would silently corrupt it instead of preserving it.
If the intent is to match the previous semantics exactly (any string containing both markers, including empty expressions), change to:
func ContainsExpression(s string) bool {
return strings.Contains(s, "${{") && strings.Contains(s, "}}")
}If the stricter check (at least one character between the markers) is intentional for the stringutil canonical version, document it explicitly and verify every call site that was migrated from a looser implementation.
| } | ||
|
|
||
| return result | ||
| return sliceutil.DeduplicateTrimmed(append(existing, extra...)) |
There was a problem hiding this comment.
append(existing, extra...) may silently clobber memory shared with an aliased slice: if existing was created by reslicing a larger backing array (i.e. cap(existing) > len(existing)), Go's append will write the extra elements directly into that backing array beyond len(existing). Any other slice that shares the same array and covers those positions will see its data silently overwritten. The same issue is on line 426 (mergeFetchRefs).
💡 Suggested fix
Clone existing before appending so the caller's backing array is never touched:
func mergeSparsePatterns(existing []string, newPatterns string) []string {
combined := make([]string, len(existing), len(existing)+strings.Count(newPatterns, "\n")+1)
copy(combined, existing)
for p := range strings.SplitSeq(newPatterns, "\n") {
combined = append(combined, p)
}
return sliceutil.DeduplicateTrimmed(combined)
}Or more concisely, pre-clone before delegating:
return sliceutil.DeduplicateTrimmed(append(slices.Clone(existing), extra...))This is a low-probability footgun today but makes the ownership contract explicit.
|
@copilot run pr-finisher skill |
…dOutput) Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Semantic analysis identified 7 categories of duplicate/scattered helpers across ~920 non-test Go files. This PR eliminates them by consolidating into canonical locations.
Expression detection (3 → 1)
Three independent
${{ }}detectors inworkflow/expression_patterns.go,workflow/publish_assets.go, andcli/codemod_effective_tokens_to_ai_credits.go. Added exportedHasExpressionMarker,ContainsExpression,IsExpressiontopkg/stringutil; theexpression_patterns.goprivate helpers now delegate (per ADR-27198); the other two sites removed entirely.Reuse gaps
config_helpers.go:extractStringFromMap→ delegates totypeutil.LookupString(8 lines → 5)workflow/strings.go:ShortenCommandintentionally left distinct (content-cap semantics differ fromstringutil.Truncate)Dedup loops (27+ sites)
Added
sliceutil.DeduplicateTrimmed([]string) []string— trims, drops empties, deduplicates preserving order. Swept allseen := make(map[string]bool)dedup patterns tomap[string]struct{}acrosspkg/workflow,pkg/cli, andpkg/parser, including cascading changes tofrontmatterImportsOpts.seenstruct field andcollectLocalIncludeDependenciesRecursivesignature.splitDomainListand theschema_errors.gowantTypes dedup now usesliceutilhelpers.mergeSparsePatterns/mergeFetchRefsincheckout_manager.go(near-identical bodies) both replaced withsliceutil.DeduplicateTrimmedcalls.Node.js parser runner (pkg/cli)
logs_parsing_firewall.goandlogs_parsing_javascript.goshared ~85% of their Node.js scaffolding. Extracted into newlogs_parsing_runner.go:any→string converters (pkg/cli)
metadataStringSlice(outcome_eval_review) andmutableStringSlice(outcome_eval_update) both had identical[]any→[]stringconversion loops. Extracted asanySliceToStrings([]any) []string; both delegates call it.