Skip to content
42 changes: 34 additions & 8 deletions .github/workflows/copilot-pr-nlp-analysis.md
Original file line number Diff line number Diff line change
Expand Up @@ -206,16 +206,42 @@ For each generated chart:
find /tmp/gh-aw/python/charts/ -maxdepth 1 -ls
```

2. **Upload each chart** using the `upload asset` tool
3. **Collect returned URLs** for embedding in the discussion
2. **Upload each chart** using the `upload asset` MCP tool (call it directly — do NOT wrap in a shell command or use `$()` to capture the URL)

3. **Record the returned URL** from each upload by writing it to a plain text file in `/tmp/gh-aw/agent/` immediately after the MCP tool returns:
- `sentiment_distribution.png` → write URL to `/tmp/gh-aw/agent/url-sentiment-distribution.txt`
- `sentiment_timeline.png` → write URL to `/tmp/gh-aw/agent/url-sentiment-timeline.txt`
- `topic_frequencies.png` → write URL to `/tmp/gh-aw/agent/url-topic-frequencies.txt`
- `topics_wordcloud.png` → write URL to `/tmp/gh-aw/agent/url-topics-wordcloud.txt`
- `keyword_trends.png` → write URL to `/tmp/gh-aw/agent/url-keyword-trends.txt`

For example, after the `upload asset` tool returns `https://github.com/.../chart.png`, write it with:
```bash
echo -n "https://github.com/.../chart.png" > /tmp/gh-aw/agent/url-sentiment-distribution.txt
```

**Do NOT** store URLs in shell variables or use command substitution (`$(...)`) — this triggers the security harness.

### Phase 6: Create Analysis Discussion

Build the discussion body by reading the URL files saved in Phase 5, then post a comprehensive discussion.

**Before constructing the body**, read the uploaded chart URLs:
```bash
SENTIMENT_DIST_URL=$(cat /tmp/gh-aw/agent/url-sentiment-distribution.txt 2>/dev/null || echo "")
SENTIMENT_TIME_URL=$(cat /tmp/gh-aw/agent/url-sentiment-timeline.txt 2>/dev/null || echo "")
TOPIC_FREQ_URL=$(cat /tmp/gh-aw/agent/url-topic-frequencies.txt 2>/dev/null || echo "")
TOPICS_CLOUD_URL=$(cat /tmp/gh-aw/agent/url-topics-wordcloud.txt 2>/dev/null || echo "")
KEYWORD_TRENDS_URL=$(cat /tmp/gh-aw/agent/url-keyword-trends.txt 2>/dev/null || echo "")
```
Comment on lines +209 to +246

Copilot AI Apr 29, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Phase 5 explicitly says not to store URLs in shell variables or use command substitution ($(...)), but Phase 6 immediately uses SENTIMENT_DIST_URL=$(cat ...) (both a shell variable and command substitution). If the intent is "don’t do this inside safe-outputs.steps run scripts" but it’s OK in the agent bash turn, please clarify the wording in Phase 5 (or adjust Phase 6 to avoid $()/shell vars by reading the files directly in Python) to remove the internal contradiction.

See below for a potential fix:

**Before constructing the body**, use Python to read the uploaded chart URLs directly from the files saved in `/tmp/gh-aw/agent/`. Do not use shell variables or command substitution here either; have Python read each file and treat missing files as empty strings.

Use a Python script to read:
- `/tmp/gh-aw/agent/url-sentiment-distribution.txt`
- `/tmp/gh-aw/agent/url-sentiment-timeline.txt`
- `/tmp/gh-aw/agent/url-topic-frequencies.txt`
- `/tmp/gh-aw/agent/url-topics-wordcloud.txt`
- `/tmp/gh-aw/agent/url-keyword-trends.txt`

Then write the fully-substituted discussion body to `/tmp/gh-aw/agent/discussion_body.md`, inserting the literal URL strings directly into the markdown body. After that, pass the body to the `create_discussion` safe-output tool.

Post a comprehensive discussion with the following structure:

**Title**: `Copilot PR Conversation NLP Analysis - [DATE]`

**Content Template** (substitute `[SENTIMENT_DIST_URL]`, `[SENTIMENT_TIME_URL]`, `[TOPIC_FREQ_URL]`, `[TOPICS_CLOUD_URL]`, and `[KEYWORD_TRENDS_URL]` with the literal URL strings read by Python from the files above):

Copilot uses AI. Check for mistakes.

Use a Python script to write the fully-substituted discussion body to `/tmp/gh-aw/agent/discussion_body.md`, inserting the literal URL strings directly (no shell variable expansion in the final body). Then pass the body to the `create_discussion` safe-output tool.

Post a comprehensive discussion with the following structure:

**Title**: `Copilot PR Conversation NLP Analysis - [DATE]`

**Content Template**:
**Content Template** (substitute `[SENTIMENT_DIST_URL]`, `[SENTIMENT_TIME_URL]`, `[TOPIC_FREQ_URL]`, `[TOPICS_CLOUD_URL]`, and `[KEYWORD_TRENDS_URL]` with the literal URL strings read from the files above):
````markdown
# 🤖 Copilot PR Conversation NLP Analysis - [DATE]

Expand All @@ -230,7 +256,7 @@ Post a comprehensive discussion with the following structure:
## Sentiment Analysis

### Overall Sentiment Distribution
![Sentiment Distribution](URL_FROM_UPLOAD_ASSET_sentiment_distribution)
![Sentiment Distribution]([SENTIMENT_DIST_URL])

**Key Findings**:
- **Positive messages**: [count] ([percentage]%)
Expand All @@ -239,7 +265,7 @@ Post a comprehensive discussion with the following structure:
- **Average polarity**: [score] on scale of -1 (very negative) to +1 (very positive)

### Sentiment Over Conversation Timeline
![Sentiment Timeline](URL_FROM_UPLOAD_ASSET_sentiment_timeline)
![Sentiment Timeline]([SENTIMENT_TIME_URL])

**Observations**:
- [e.g., "Conversations typically start neutral and become more positive as issues are resolved"]
Expand All @@ -248,7 +274,7 @@ Post a comprehensive discussion with the following structure:
## Topic Analysis

### Identified Discussion Topics
![Topic Frequencies](URL_FROM_UPLOAD_ASSET_topic_frequencies)
![Topic Frequencies]([TOPIC_FREQ_URL])

**Major Topics Detected**:
1. **[Topic 1 Name]** ([count] messages, [percentage]%): [brief description]
Expand All @@ -257,12 +283,12 @@ Post a comprehensive discussion with the following structure:
4. **[Topic 4 Name]** ([count] messages, [percentage]%): [brief description]

### Topic Word Cloud
![Topics Word Cloud](URL_FROM_UPLOAD_ASSET_topics_wordcloud)
![Topics Word Cloud]([TOPICS_CLOUD_URL])

## Keyword Trends

### Most Common Keywords and Phrases
![Keyword Trends](URL_FROM_UPLOAD_ASSET_keyword_trends)
![Keyword Trends]([KEYWORD_TRENDS_URL])

**Top Recurring Terms**:
- **Technical**: [list top 5 technical terms]
Expand Down
6 changes: 6 additions & 0 deletions pkg/workflow/compiler_validators.go
Original file line number Diff line number Diff line change
Expand Up @@ -195,6 +195,12 @@ func (c *Compiler) validateToolConfiguration(workflowData *WorkflowData, markdow
return formatCompilerError(markdownPath, "error", err.Error(), err)
}

// Validate safe-outputs steps for dangerous shell expansion patterns
log.Printf("Validating safe-outputs steps for shell expansion patterns")
if err := validateSafeOutputsStepsShellExpansion(workflowData.SafeOutputs); err != nil {
return formatCompilerError(markdownPath, "error", err.Error(), err)
}

// Validate safe-outputs allowed-domains configuration
log.Printf("Validating safe-outputs allowed-domains")
if err := c.validateSafeOutputsAllowedDomains(workflowData.SafeOutputs); err != nil {
Expand Down
194 changes: 194 additions & 0 deletions pkg/workflow/safe_outputs_steps_shell_expansion_validation.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,194 @@
// This file validates safe-outputs.steps run scripts for dangerous shell expansion
// patterns that would be blocked at runtime by the safe-outputs security harness.
//
// # Shell Expansion Security in Safe-Outputs Steps
//
// The safe-outputs security harness blocks shell scripts that contain dangerous
// bash expansion patterns. This validator detects those patterns at compile time
// so workflow authors receive a clear error during `gh aw compile` rather than
// a confusing runtime failure.
//
// # Blocked Patterns
//
// The following bash constructs are rejected:
// - ${var@operator}: bash 4.4+ parameter transformation (e.g. ${foo@P}, ${bar@U})
// - ${!var}: bash indirect expansion
// - $(command): command substitution
// - `command`: backtick command substitution
//
// GitHub Actions template expressions (${{ ... }}) are explicitly allowed and are
// excluded from the checks.
//
// # When to Add Validation Here
//
// Add validation to this file when:
// - A new dangerous shell expansion variant must be detected in safe-outputs run scripts
// - The safe-outputs harness blocks a new class of shell pattern at runtime
//
// For general safe-outputs validation, see safe_outputs_validation.go.

package workflow

import (
"fmt"
"regexp"
"strings"
)

var safeOutputsStepsShellExpansionLog = newValidationLogger("safe_outputs_steps_shell_expansion")

// shellExpansionPattern matches dangerous bash expansion constructs inside a run: script.
//
// Captured groups by name:
// - "paramTransform": ${var@operator} — bash 4.4+ parameter transformation
// - "indirectExpand": ${!var} — bash indirect expansion
// - "commandSubst": $(...) — command substitution (any $( sequence)
// - "backtick": `...` — backtick command substitution
//
// After matching, callers must exclude false-positives:
// - "commandSubst" matches starting with "$((" (arithmetic expansion) are ignored.
// - "commandSubst" matches starting with "$({" or "$({{" that form "${{" are ignored
// because "${{" is not a valid shell construct (GitHub Actions uses "${{ }}").

Copilot AI Apr 29, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inline comment claims that command substitution matches starting with "$({" or "$({{" should be ignored because of GitHub Actions "${{ }}" expressions, but the implementation only skips "$((" arithmetic expansion and does not (and probably should not) exempt "$({". Please update the comment (or implement the described exemption if it's truly required) so it matches the actual behavior and GitHub Actions expression syntax ("${{" not "$({{").

Suggested change
// - "commandSubst" matches starting with "$({" or "$({{" that form "${{" are ignored
// because "${{" is not a valid shell construct (GitHub Actions uses "${{ }}").
// - GitHub Actions expressions use "${{ ... }}", not "$(...)", so they do not
// match the "commandSubst" pattern.

Copilot uses AI. Check for mistakes.
//
// Note: Go's regexp/RE2 does not support lookaheads, so post-match filtering is used
// instead of inline negative lookaheads.
var shellExpansionPattern = regexp.MustCompile(
// Parameter transformation: ${var@op} — the @ must follow word characters
`(?P<paramTransform>\$\{[A-Za-z_][A-Za-z0-9_]*@[^}]*\})` +
`|` +
// Indirect expansion: ${!var} — '!' immediately after '{'
`(?P<indirectExpand>\$\{![A-Za-z_])` +
`|` +
// Command substitution: any $( sequence
`(?P<commandSubst>\$\()` +
`|` +
// Backtick command substitution: `...`
"(?P<backtick>`[^`\n]+`)",
)

// dangerousPatternDescription maps a named capture group to a human-readable
// description used in compiler error messages.
var dangerousPatternDescription = map[string]string{
"paramTransform": "parameter transformation (e.g. ${var@P})",
"indirectExpand": "indirect expansion (e.g. ${!var})",
"commandSubst": "command substitution (e.g. $(command))",
"backtick": "backtick command substitution (e.g. `command`)",
}

// validateSafeOutputsStepsShellExpansion checks every run: script in the
// safe-outputs.steps list for dangerous bash expansion patterns that would be
// blocked by the safe-outputs security harness at runtime.
//
// Returning a non-nil error causes compilation to fail with a descriptive message
// that includes the step index, the offending snippet, and the pattern category.
func validateSafeOutputsStepsShellExpansion(config *SafeOutputsConfig) error {
if config == nil || len(config.Steps) == 0 {
return nil
}

safeOutputsStepsShellExpansionLog.Printf("Validating %d safe-outputs steps for dangerous shell expansion patterns", len(config.Steps))

for i, step := range config.Steps {
stepMap, ok := step.(map[string]any)
if !ok {
continue
}
runVal, exists := stepMap["run"]
if !exists {
continue
}
runScript, ok := runVal.(string)
if !ok {
continue
}

if err := validateRunScriptForShellExpansion(i, runScript); err != nil {
return err
}
}

safeOutputsStepsShellExpansionLog.Print("Safe-outputs steps shell expansion validation passed")
return nil
}

// validateRunScriptForShellExpansion checks a single run: script for dangerous
// bash expansion patterns. stepIndex is 0-based and is included in error messages.
func validateRunScriptForShellExpansion(stepIndex int, script string) error {
// Fast path: no '$' or backtick character means no expansion pattern can be present.
if !strings.ContainsAny(script, "$`") {
return nil
}

// Scan all matches so we can skip false positives (e.g. arithmetic $((, GitHub
// Actions ${{ expressions) before reporting the first true violation.
groupNames := shellExpansionPattern.SubexpNames()
allMatches := shellExpansionPattern.FindAllStringSubmatchIndex(script, -1)

for _, match := range allMatches {
snippet := ""
patternDescription := "dangerous shell expansion"
groupName := ""
matchStart := match[0]

for gi, name := range groupNames {
if name == "" {
continue
}
start, end := match[gi*2], match[gi*2+1]
if start < 0 {
continue // group did not participate in this match
}
if _, known := dangerousPatternDescription[name]; known {
raw := script[start:end]
groupName = name
// For short patterns like $( or ${! that don't capture the full construct,
// extend the snippet to include context up to the end of the line or 60 chars.
if len(raw) < 10 {
lineEnd := strings.IndexByte(script[start:], '\n')
if lineEnd < 0 {
lineEnd = len(script) - start
}
raw = script[start : start+lineEnd]
// Remove any trailing control characters.
raw = strings.TrimRight(raw, "\r\t ")
}
// Clip the snippet to 60 characters to keep the error readable.
if len(raw) > 60 {
raw = raw[:57] + "..."
}
snippet = raw
patternDescription = dangerousPatternDescription[name]
break
}
}

if groupName == "" {
continue
}

// Skip arithmetic expansion $((: it uses $(( not $(command).
if groupName == "commandSubst" {
// Check the two characters after $( to detect $((
if matchStart+3 <= len(script) && script[matchStart:matchStart+3] == "$((" {
continue
}
}

safeOutputsStepsShellExpansionLog.Printf("Dangerous pattern found in safe-outputs step %d: %s", stepIndex, patternDescription)

return fmt.Errorf(
"safe-outputs.steps[%d]: run script contains %s, which is blocked by the "+
"safe-outputs security harness at runtime\n\n"+
" Offending snippet: %s\n\n"+
"Avoid shell variable expansion in safe-outputs run scripts. "+
"Write URL values and other dynamic content to files in /tmp/gh-aw/agent/ "+
"during the agent turn, then read the file contents in the safe-outputs step "+
"without using shell expansion (e.g. with 'cat' or a script argument)",

Copilot AI Apr 29, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The remediation text says "Avoid shell variable expansion in safe-outputs run scripts", but this validator explicitly allows simple "$VAR" and "${VAR}" expansions (and tests cover those as allowed). This wording is misleading—please narrow it to the specific blocked constructs (command substitution/backticks/indirect expansion/parameter transformation) or describe the safe alternative without implying all variable expansion is forbidden.

Suggested change
"Avoid shell variable expansion in safe-outputs run scripts. "+
"Write URL values and other dynamic content to files in /tmp/gh-aw/agent/ "+
"during the agent turn, then read the file contents in the safe-outputs step "+
"without using shell expansion (e.g. with 'cat' or a script argument)",
"Avoid command substitution, backticks, indirect expansion, and parameter "+
"transformation in safe-outputs run scripts. Write URL values and other "+
"dynamic content to files in /tmp/gh-aw/agent/ during the agent turn, then "+
"read the file contents in the safe-outputs step (e.g. with 'cat' or by "+
"passing a script argument)",

Copilot uses AI. Check for mistakes.
stepIndex,
patternDescription,
snippet,
)
}

return nil
}
Loading
Loading