Overview
This is a semantic function-clustering analysis of all non-test .go files under pkg/ (913 source files, ~5,432 function definitions), focused on pkg/workflow/ (400 files) and pkg/cli/ (319 files).
Headline: the codebase is unusually well-modularized. No actionable copy-paste duplication was found in pkg/workflow/ or pkg/cli/. The package is strongly concern-partitioned (*_validation.go holds validators, *_parser.go holds parsers, etc.). The findings below are a small number of low-risk cohesion improvements — not bugs, not duplication.
Summary
- Source files analyzed: 913 (non-test, under pkg/)
- Function definitions parsed: ~5,432 (4,234 plain functions, 1,198 methods)
- Actionable duplicate implementations: 0
- Scattered-helper false positives explained: build-tag twins, delegating shims, linter testdata/ fixtures
- Concrete cohesion opportunities: 3 files + 1 naming collision
- Status: ✅ healthy organization
Findings
1. Outlier functions (mild file-cohesion opportunities)
Three files mix a secondary concern with their primary purpose. None are urgent.
a. pkg/workflow/js.go — JS lexer embedded among asset getters
The file is otherwise dedicated to Get*Script() asset accessors, but lines 71–391 contain a full JavaScript comment-stripper / tokenizer:
removeJavaScriptComments (js.go:71), removeJavaScriptCommentsFromLine (js.go:100)
isInsideStringLiteral (js.go:210), isInsideRegexLiteral (js.go:256)
canStartRegexLiteral (js.go:322), canStartRegexLiteralAt (js.go:327)
extractWordBefore (js.go:366), isLetter (js.go:386), isDigit (js.go:391)
Recommendation: extract the lexer helpers into js_lexer.go (or js_minify.go), leaving js.go for the script getters. Improved discoverability; behavior unchanged.
b. pkg/workflow/model_identifier.go — validators in a noun-named file
6 of 14 functions are validate*, in a file named for an identifier type. Sibling model_alias_validation.go already follows the *_validation.go convention:
validateProviderToken (model_identifier.go:231), validateModelToken (:251), validateModelGlobToken (:265), validateBareName (:283), validateParamKey (:327), validateParamValue (:343)
Recommendation: consider moving the validate* group into a model_validation.go (or merge with existing model validation), keeping parsing in model_identifier.go.
c. pkg/workflow/compiler_orchestrator_workflow.go — least cohesive by verb-spread
Mixes validateWorkflow* (4), mergeImported*/mergeWorkflow* (5), plus extract*/build*/parse*/format*Error. Expected for an orchestrator, but the validateWorkflowBuildContext / ...ModelAliasMap / ...EngineSettings / ...ToolConfigurations cluster reads like it could live beside the other validators. Lowest priority.
2. Naming collision (readability/navigation hazard)
extractToolsFromFrontmatter is defined in two packages with different signatures and contracts (so NOT a duplicate — just a grep/navigation hazard):
pkg/parser/content_extractor.go:60 — func(map[string]any) (string, error) (returns merged tools+mcp-servers as a JSON string)
pkg/workflow/frontmatter_extraction_metadata.go:304 — func(map[string]any) map[string]any (returns only the tools field)
Recommendation: rename one (e.g. the parser copy to extractToolsAndMCPAsJSON) to disambiguate.
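To make the contract difference concrete, here is a minimal sketch of how the two functions could look after the rename. The bodies are assumptions written only for illustration: the real implementations were not inspected here, and "merged" is assumed to mean that the entries of the tools and mcp-servers maps are combined into one JSON object.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// extractToolsAndMCPAsJSON is a hypothetical stand-in for the parser-side
// function under its proposed new name. Assumption: the "tools" and
// "mcp-servers" sections are merged into a single object and JSON-encoded.
func extractToolsAndMCPAsJSON(frontmatter map[string]any) (string, error) {
	merged := map[string]any{}
	for _, key := range []string{"tools", "mcp-servers"} {
		section, ok := frontmatter[key].(map[string]any)
		if !ok {
			continue // missing or non-map sections are skipped in this sketch
		}
		for name, cfg := range section {
			merged[name] = cfg
		}
	}
	out, err := json.Marshal(merged)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

// extractToolsFromFrontmatter is a hypothetical stand-in for the
// workflow-side function: it returns only the "tools" field as a map.
func extractToolsFromFrontmatter(frontmatter map[string]any) map[string]any {
	tools, ok := frontmatter["tools"].(map[string]any)
	if !ok {
		return map[string]any{}
	}
	return tools
}

func main() {
	fm := map[string]any{
		"tools":       map[string]any{"bash": true},
		"mcp-servers": map[string]any{"docs": "stdio"},
	}
	merged, err := extractToolsAndMCPAsJSON(fm)
	if err != nil {
		panic(err)
	}
	fmt.Println(merged)                               // JSON string, both sections
	fmt.Println(len(extractToolsFromFrontmatter(fm))) // map, tools only
}
```

With distinct names, a search for either identifier lands in exactly one package, and the return type is hinted at by the name itself.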
Why the "scattered helper" signal was a false positive
Repeated function names explained (all benign)
Within pkg/cli/, every plain function name is unique (zero repeats). Within pkg/workflow/, the only name in 3+ files is init(). The apparent cross-file repeats elsewhere resolve to intentional patterns:
- _wasm.go build-tag twins: same name, real impl + no-op stub, mutually exclusive via //go:build (e.g. RunGH in github_cli.go vs github_cli_wasm.go; validatePipPackages/validateUvPackages real vs stub returning nil). Must NOT be merged.
- Delegating shims (already centralized, which is good): LevenshteinDistance (pkg/parser/schema_suggestions.go:245 is a 1-line delegate to stringutil.LevenshteinDistance); SanitizeName (pkg/workflow/strings.go:98 delegates to pkg/stringutil/sanitize.go:69, with SanitizeOptions a type alias).
- Linter testdata/ fixtures: good/bad/suppressed/doWork repeated across pkg/linters/*/testdata/src/... are deliberate test fixtures.
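The delegating-shim pattern can be sketched in a single file as follows. This is an illustration under assumptions, not the repository's code: sharedSanitizeName and sharedSanitizeOptions are hypothetical names standing in for a shared utility package, and the sanitizing rules are invented for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// --- stands in for a shared utility package (hypothetical contents) ---

// sharedSanitizeOptions is an invented options type for this sketch.
type sharedSanitizeOptions struct {
	Lowercase bool
}

// sharedSanitizeName holds the single real implementation.
func sharedSanitizeName(name string, opts sharedSanitizeOptions) string {
	name = strings.ReplaceAll(strings.TrimSpace(name), " ", "-")
	if opts.Lowercase {
		name = strings.ToLower(name)
	}
	return name
}

// --- stands in for the calling package ---

// SanitizeOptions is a type alias (note the '='), so it is the same type as
// the shared one and values pass through without conversion.
type SanitizeOptions = sharedSanitizeOptions

// SanitizeName is a one-line delegate: same name as the shared function,
// but no duplicated logic.
func SanitizeName(name string, opts SanitizeOptions) string {
	return sharedSanitizeName(name, opts)
}

func main() {
	fmt.Println(SanitizeName("  My Workflow ", SanitizeOptions{Lowercase: true}))
}
```

A name-frequency scan reports SanitizeName twice, but a body comparison shows one implementation and one forwarder, which is why such pairs are not counted as duplicates.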
Verb-prefix clusters in pkg/workflow (well-organized)
Dominant clusters over ~2,471 workflow functions: get (296), build (204), validate (201), parse (200), extract (181), generate (170). These six account for over half of all functions and map cleanly onto the compile pipeline. Accessor family (is 113, has 54, set 39) and pipeline family are the two coherent super-groups. Files like parser.go, tools_parser.go, trigger_parser.go, and compiler_validators.go are cleanly single-verb — no reorganization needed.
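Verb-prefix clustering of this kind can be approximated with a few lines of Go. The sketch below is an assumption about the approach, not the tool that produced the counts above; verbPrefix and countByVerb are hypothetical helper names, and the rule used (the leading lowercase run of a camelCase identifier) mislabels names that begin with an acronym.

```go
package main

import (
	"fmt"
	"sort"
	"unicode"
)

// verbPrefix returns the leading lowercase run of a camelCase identifier,
// after lowercasing the first rune so exported names cluster with
// unexported ones. Limitation: "HTTPClient" yields "h".
func verbPrefix(name string) string {
	runes := []rune(name)
	if len(runes) == 0 {
		return ""
	}
	runes[0] = unicode.ToLower(runes[0])
	end := 1
	for end < len(runes) && unicode.IsLower(runes[end]) {
		end++
	}
	return string(runes[:end])
}

// countByVerb tallies how many function names share each verb prefix.
func countByVerb(names []string) map[string]int {
	counts := make(map[string]int)
	for _, n := range names {
		counts[verbPrefix(n)]++
	}
	return counts
}

func main() {
	names := []string{
		"validateWorkflowBuildContext", "validateModelToken",
		"parseFrontmatter", "GetSetupScript", "isLetter",
	}
	counts := countByVerb(names)

	// Sort the verbs so the output order is stable.
	verbs := make([]string, 0, len(counts))
	for v := range counts {
		verbs = append(verbs, v)
	}
	sort.Strings(verbs)
	for _, v := range verbs {
		fmt.Printf("%s: %d\n", v, counts[v])
	}
}
```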
Recommendations (all optional, low-risk)
- (Priority 1) Extract JS lexer helpers from js.go into js_lexer.go: the clearest cohesion win, and a mechanical move.
- (Priority 2) Rename one extractToolsFromFrontmatter to remove the cross-package name collision.
- (Priority 3) Optionally relocate the validate* group out of model_identifier.go.
No consolidation, generics, or duplicate-merge work is warranted — the codebase already centralizes shared logic via shim delegation to pkg/stringutil, pkg/fileutil, etc.
Analysis Metadata
- Detection method: function-name frequency analysis + verb-prefix clustering + body comparison of repeated names
- Source files analyzed: 913 (non-test, pkg/)
- Function definitions parsed: ~5,432
- Actionable duplicates: 0
- Cohesion opportunities: 3 files + 1 naming collision
- Analysis date: 2026-06-13
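For readers who want to reproduce the function-name frequency step, the following is a minimal sketch using Go's standard go/parser and go/ast packages. It is not the analyzer that produced the figures above. collectFuncStats and funcStats are hypothetical names, and the sketch parses a single in-memory file, whereas a real run would walk pkg/ and skip _test.go files.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// sample is a tiny stand-in source file. The parser checks syntax only,
// so the repeated init declarations are accepted.
const sample = `package demo

type T struct{}

func (T) Name() string { return "t" }
func parseA() {}
func init() {}
func init() {}
`

// funcStats separates plain functions from methods and records how often
// each plain function name occurs.
type funcStats struct {
	Plain   int
	Methods int
	ByName  map[string]int
}

// collectFuncStats parses one Go source file and tallies its top-level
// function declarations.
func collectFuncStats(filename, src string) (funcStats, error) {
	stats := funcStats{ByName: map[string]int{}}
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, parser.SkipObjectResolution)
	if err != nil {
		return stats, err
	}
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok {
			continue // type, var, const, or import declaration
		}
		if fn.Recv != nil {
			stats.Methods++ // a receiver makes it a method
			continue
		}
		stats.Plain++
		stats.ByName[fn.Name.Name]++
	}
	return stats, nil
}

func main() {
	stats, err := collectFuncStats("demo.go", sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("plain=%d methods=%d init=%d\n",
		stats.Plain, stats.Methods, stats.ByName["init"])
}
```

Names whose count exceeds one across files are candidates for body comparison, which is the step that separates true duplicates from build-tag twins and delegating shims.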
References: run 27450245053
Generated by 🔧 Semantic Function Refactoring · 352.4 AIC · ⌖ 15 AIC · ⊞ 9.8K · ◷