Alerting Framework

This document covers the SOC Lab Docker alerting system: rule format, notifier configuration, notification backends, and incident response playbooks for each built-in detection rule.

Overview

The alerting framework consists of three components:

| Component | Location | Purpose |
|---|---|---|
| Alert rules | `alerts/rules/*.yml` | Define what to detect and how to notify |
| Notifier script | `scripts/alert_notifier.sh` | Polls ES, evaluates rules, sends notifications |
| Playbooks | This document (below) | Step-by-step incident response guidance |

Every ALERT_INTERVAL seconds, the notifier runs a count query for each rule over the last LOOKBACK seconds of events in Elasticsearch. When a rule's count crosses its threshold, the notifier sends a notification.
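The sketch below shows one evaluation cycle. It assumes that `es_count` posts to Elasticsearch's `_count` API and adds a range filter on `@timestamp`. `count_body` and `poll` are hypothetical names. `es_count` and `run_evaluation` are the script's own function names, but the bodies shown here are assumptions, not copies of scripts/alert_notifier.sh.

```bash
#!/usr/bin/env bash
# Sketch only: assumed structure of the polling loop. Requires curl and jq.
ES_HOST=${ES_HOST:-http://localhost:9200}
LOOKBACK=${LOOKBACK:-300}
ALERT_INTERVAL=${ALERT_INTERVAL:-60}

# Wrap a rule's query clause in a bool filter limited to the last LOOKBACK seconds.
count_body() {
    printf '{"query":{"bool":{"filter":[%s,{"range":{"@timestamp":{"gte":"now-%ss"}}}]}}}\n' \
        "$1" "$LOOKBACK"
}

# Usage: es_count INDEX_PATTERN QUERY_JSON  -> prints the number of matching documents.
es_count() {
    curl -sS -H 'Content-Type: application/json' \
        "${ES_HOST}/$1/_count" -d "$(count_body "$2")" | jq -r '.count // 0'
}

# Evaluate every rule, wait ALERT_INTERVAL seconds, repeat forever.
poll() {
    while true; do
        run_evaluation          # calls each eval_<name>() function
        sleep "$ALERT_INTERVAL"
    done
}
```

Putting the rule clause and the time range together in `filter` context skips relevance scoring, which a pure count does not need.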

Prerequisites

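The notifier script requires curl and jq. curl ships with macOS and most Linux distributions. jq often does not, so you may need to install it with your package manager (for example apt install jq or brew install jq). On Windows, neither tool is guaranteed to be on your PATH, even if it is installed.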

Checking what you have:

```bash
command -v curl && curl --version | head -1
command -v jq   && jq --version
```
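The same check can be wrapped in a function that reports every missing tool at once, modeled on the script's `[WARN] Missing required tools: jq` message. `check_tools` is a hypothetical name, and the real script's check may be written differently.

```bash
# Hypothetical prerequisite check: warn about every tool not found on PATH.
check_tools() {
    local tool
    local -a missing=()
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || missing+=("$tool")
    done
    if (( ${#missing[@]} > 0 )); then
        echo "[WARN] Missing required tools: ${missing[*]}" >&2
        return 1
    fi
}
```

A script would typically call `check_tools curl jq || exit 1` before its first request.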

If jq is missing or not found on Windows, check these common install locations:

| Installation method | Likely path |
|---|---|
| Anaconda / Miniconda | `C:\Users\<you>\anaconda3\Library\mingw-w64\bin\jq.exe` |
| Scoop | `C:\Users\<you>\scoop\shims\jq.exe` |
| Chocolatey | `C:\ProgramData\chocolatey\bin\jq.exe` |
| Git for Windows | `C:\Program Files\Git\usr\bin\jq.exe` |
| Manual download | Wherever you saved it from jqlang.org |

Once you find it, add the directory to your bash PATH for the session:

```bash
export PATH="/c/Users/<you>/anaconda3/Library/mingw-w64/bin:$PATH"
bash scripts/alert_notifier.sh --dry-run --once
```

To make this permanent, add the export PATH=... line to your ~/.bashrc or ~/.bash_profile so it applies to every new terminal session.
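You can also skip editing the path by hand. A small function can check each candidate directory from the table and prepend the first one that contains jq. `add_jq_to_path` is hypothetical. Filling in `<you>` with `$USERNAME`, which Git Bash normally inherits from Windows, is also an assumption, so adjust the list to match your install.

```bash
# Hypothetical helper: prepend the first candidate directory that holds a jq binary.
add_jq_to_path() {
    local dir
    for dir in "$@"; do
        if [[ -x "$dir/jq.exe" || -x "$dir/jq" ]]; then
            export PATH="$dir:$PATH"
            echo "Using jq from $dir" >&2
            return 0
        fi
    done
    echo "jq not found in any candidate directory" >&2
    return 1
}

# Candidate directories from the table, in Git Bash path form:
# add_jq_to_path \
#     "/c/Users/$USERNAME/anaconda3/Library/mingw-w64/bin" \
#     "/c/Users/$USERNAME/scoop/shims" \
#     "/c/ProgramData/chocolatey/bin" \
#     "/c/Program Files/Git/usr/bin"
```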

Troubleshooting tip: If the script exits immediately with [WARN] Missing required tools: jq, the shell cannot find jq on its PATH. Either jq is not installed, or it is installed in a directory your shell does not search. Use find /c -name "jq.exe" 2>/dev/null in Git Bash (or where /R C:\ jq.exe in Command Prompt) to locate it, then add that directory to PATH.

Quick Start

1. Configure credentials

Copy and populate .env:

```bash
cp .env.example .env
```

Set the following values:

```bash
# Elasticsearch connection
ES_HOST=http://localhost:9200

# Notification backends (set at least one)
SLACK_WEBHOOK_URL=https://hooks.slack.com/services/T.../B.../...
WEBHOOK_URL=https://your-soar-platform.example.com/alerts

# Polling settings
ALERT_INTERVAL=60      # Seconds between evaluation cycles
LOOKBACK=300           # Seconds of history to evaluate (5 minutes)
```
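A common bash pattern for reading these settings is to source `.env` with `set -a`, which exports every assignment, and then fill in defaults for anything left unset. This sketch assumes the notifier loads `.env` from the repository root in roughly this way. `load_config` is a hypothetical name. Sourcing runs the file as shell code, so only load a `.env` you trust.

```bash
# Hypothetical loader: export .env values, then apply defaults for unset settings.
load_config() {
    local env_file=${1:-./.env}     # "./" prevents `source` from searching PATH
    if [[ -f "$env_file" ]]; then
        set -a                      # export every variable assigned while sourcing
        # shellcheck disable=SC1090
        source "$env_file"
        set +a
    fi
    ES_HOST=${ES_HOST:-http://localhost:9200}
    ALERT_INTERVAL=${ALERT_INTERVAL:-60}
    LOOKBACK=${LOOKBACK:-300}
    if [[ -z "${SLACK_WEBHOOK_URL:-}" && -z "${WEBHOOK_URL:-}" ]]; then
        echo "[WARN] No notification backend set (SLACK_WEBHOOK_URL or WEBHOOK_URL)" >&2
    fi
}
```

Bash treats the trailing `# ...` text on lines like `ALERT_INTERVAL=60      # Seconds ...` as a comment, so the inline notes in the example file do not break sourcing.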

2. Run the notifier

```bash
# Continuous polling (recommended for lab use)
bash scripts/alert_notifier.sh

# Single evaluation run (useful for cron integration)
bash scripts/alert_notifier.sh --once

# Dry run: evaluate and print without sending notifications
bash scripts/alert_notifier.sh --dry-run --once
```

3. Generate test events

Run an attack simulation to trigger alerts:

```bash
bash scripts/brute_force_simulation.sh
# Wait 30-60 seconds, then check for notifications
```

Alert Rule Format

Rules are YAML files in alerts/rules/. The notifier currently uses built-in evaluator functions rather than parsing YAML at runtime; rule files serve as human-readable configuration documentation and future integration targets.

```yaml
name: rule_identifier           # Unique snake_case name
enabled: true                   # Set false to suppress without deleting
description: What this detects  # Human-readable description

# Elasticsearch query
query:
  index: "soc-lab-*"
  filter:                       # All conditions must match (AND)
    - term:
        field.name: value
  any_of:                       # At least one must match (OR)
    - term:
        field.name: value

# Threshold configuration
aggregation:
  group_by: source.ip           # Field to aggregate by
  metric: count                 # count | sum | avg
  threshold: 10                 # Alert when metric exceeds this
  timeframe: 5m                 # Evaluation window

# Alert metadata
alert:
  severity: high                # informational | low | medium | high | critical
  mitre:
    - T1110                     # MITRE ATT&CK technique IDs
  runbook: docs/ALERTING.md#runbook-name
  message: |
    Template with {field} substitutions.

# Notification channels
notify:
  - slack
  - webhook
```
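The notifier does not parse these files yet, so each eval function has to hand-write the equivalent Elasticsearch query. One way to express the `filter`/`any_of` semantics in Query DSL is to put `filter` clauses in `bool.filter`, where every clause must match, and `any_of` clauses in `bool.should` with `minimum_should_match: 1`. The sketch below assumes that mapping. `rule_query` is a hypothetical helper that takes pre-rendered JSON clauses; alert_notifier.sh does not provide it.

```bash
# Hypothetical helper: build a bool query from filter (AND) and any_of (OR) clauses.
# Usage: rule_query FILTER_CLAUSES [ANY_OF_CLAUSES]   (comma-separated JSON objects)
rule_query() {
    local filter=$1 any_of=${2:-}
    if [[ -n "$any_of" ]]; then
        printf '{"bool":{"filter":[%s],"should":[%s],"minimum_should_match":1}}\n' \
            "$filter" "$any_of"
    else
        # With no any_of clauses, omit should so minimum_should_match cannot exclude everything.
        printf '{"bool":{"filter":[%s]}}\n' "$filter"
    fi
}
```

The output is a query clause of the same shape that es_count receives in the eval function examples.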

Notification Backends

Slack

Sends a formatted message with color-coded severity to a Slack channel.

Setup:

1. Create a Slack app at https://api.slack.com/apps
2. Enable Incoming Webhooks, then add a new webhook to your workspace and choose the target channel
3. Copy the webhook URL to SLACK_WEBHOOK_URL in .env

Severity colors:

Critical → Red (#FF0000)
High → Orange (#FF6600)
Medium → Yellow (#FFCC00)
Low/Info → Green (#36A64F)
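Slack's legacy message attachments accept a hex `color` field, which is one way to render these severity colors. The helper below is a hypothetical sketch of the mapping, not the script's code. It lower-cases its input with `tr` so that it also runs under the bash 3.2 that ships with macOS.

```bash
# Hypothetical helper: map an alert severity to the Slack colors listed above.
severity_color() {
    local sev
    sev=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$sev" in
        critical) echo "#FF0000" ;;
        high)     echo "#FF6600" ;;
        medium)   echo "#FFCC00" ;;
        *)        echo "#36A64F" ;;   # low, informational, and anything unrecognized
    esac
}
```

For a high-severity alert, the attachment would then carry "color": "#FF6600".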

Generic Webhook

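Posts a JSON payload to any HTTP endpoint that accepts arbitrary JSON, such as a custom SOAR platform. The following services expect their own schema, so they need a transform or relay in between:

PagerDuty Events API
Opsgenie
Microsoft Teams incoming webhooks

Discord can receive Slack-formatted messages if /slack is appended to the Discord webhook URL.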

Payload format:

```json
{
    "alert": {
        "rule": "brute_force_authentication",
        "severity": "high",
        "message": "50 authentication failures...",
        "mitre": ["T1110", "T1110.001"],
        "timestamp": "2026-02-27T14:30:00Z",
        "source": "soc-lab-docker"
    }
}
```
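A payload in this shape can be assembled and posted from bash. To avoid a jq dependency, this sketch uses a minimal `json_escape` that handles only backslashes, double quotes, and newlines. `json_escape`, `build_webhook_payload`, and `send_webhook` are hypothetical names, and the real script may build the payload differently (for example with `jq -n --arg`).

```bash
# Escape backslashes, double quotes, and newlines for a JSON string value.
# Other control characters are not handled; prefer jq -n --arg for untrusted input.
json_escape() {
    local s=$1
    s=${s//\\/\\\\}
    s=${s//\"/\\\"}
    s=${s//$'\n'/\\n}
    printf '%s' "$s"
}

# Usage: build_webhook_payload RULE SEVERITY MESSAGE MITRE_CSV [TIMESTAMP]
build_webhook_payload() {
    local rule=$1 severity=$2 message=$3 mitre_csv=$4
    local ts=${5:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}
    local mitre_json="" id
    local -a ids
    IFS=',' read -ra ids <<< "$mitre_csv"
    for id in "${ids[@]}"; do
        mitre_json+="${mitre_json:+,}\"$(json_escape "$id")\""
    done
    printf '{"alert":{"rule":"%s","severity":"%s","message":"%s","mitre":[%s],"timestamp":"%s","source":"soc-lab-docker"}}\n' \
        "$(json_escape "$rule")" "$(json_escape "$severity")" "$(json_escape "$message")" \
        "$mitre_json" "$ts"
}

# POST the payload to WEBHOOK_URL (set in .env).
send_webhook() {
    local payload
    payload=$(build_webhook_payload "$1" "$2" "$3" "$4")
    curl -sS -X POST -H 'Content-Type: application/json' --data "$payload" "$WEBHOOK_URL"
}
```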

Email (DIY Extension)

Email notifications are not built-in. To add email support, extend the notify() function in scripts/alert_notifier.sh:

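```bash
# Arguments follow notify(): $1 = rule name, $2 = severity, $3 = message.
# ALERT_EMAIL (the recipient address) must be set in .env.
send_email() {
    local rule="$1" severity="$2" message="$3"
    local subject="[SOC Alert] ${severity}: ${rule}"
    printf '%s\n' "$message" | mail -s "$subject" "$ALERT_EMAIL"
}
```

Requires a mail command (for example from mailutils) and a working mail transfer agent such as sendmail or Postfix on the host.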
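One way to wire a new backend in is to dispatch on the comma-separated channel list that notify() receives, as in the `"slack,webhook"` argument of the eval function example. This is a sketch only, and the real notify() may be structured differently. `send_slack`, `send_webhook`, and the `DRY_RUN` variable (assumed to be set by `--dry-run`) are hypothetical names.

```bash
# Sketch of channel dispatch inside notify().
# Usage: notify RULE SEVERITY MESSAGE MITRE CHANNELS   (CHANNELS is comma-separated)
notify() {
    local rule=$1 severity=$2 message=$3 mitre=$4 channels=$5
    local ch
    local -a list
    IFS=',' read -ra list <<< "$channels"

    # In dry-run mode, print what would be sent and stop.
    if [[ "${DRY_RUN:-false}" == true ]]; then
        echo "[DRY-RUN] ${severity} ${rule}: ${message} (${mitre}) -> ${channels}"
        return 0
    fi

    for ch in "${list[@]}"; do
        case "$ch" in
            slack)   send_slack "$rule" "$severity" "$message" "$mitre" ;;
            webhook) send_webhook "$rule" "$severity" "$message" "$mitre" ;;
            email)   send_email "$rule" "$severity" "$message" ;;
            *)       echo "[WARN] Unknown notification channel: $ch" >&2 ;;
        esac
    done
}
```

With this structure, enabling email for a rule only means adding `email` to its channel list.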

Adding a New Alert Rule

To add a new detection to the polling loop:

1. Create alerts/rules/<name>.yml with the rule definition
2. Add an eval_<name>() function in scripts/alert_notifier.sh
3. Call the new function inside run_evaluation()

Example:

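```bash
eval_new_technique() {
    local count
    # Count matching process events in the lookback window
    count=$(es_count "soc-lab-*" \
        '{"bool":{"must":[{"term":{"event.category":"process"}},{"term":{"process.name":"suspicious.exe"}}]}}')
    count=${count:-0}   # treat an empty result (e.g. ES unreachable) as zero

    log "  new_technique: ${count} events in last ${LOOKBACK}s"

    # Fire on any match: notify <rule> <severity> <message> <mitre> <channels>
    if (( count >= 1 )); then
        notify \
            "new_technique" \
            "high" \
            "${count} suspicious.exe events detected." \
            "T1XXX" \
            "slack,webhook"
    fi
}
```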

Incident Response Playbooks

Each section below is a step-by-step response guide for alerts generated by the built-in rules. Playbooks follow the structure:

Triage — determine if the alert is a true positive
Investigation — gather additional context
Containment — stop ongoing harm
Eradication — remove the threat
Recovery — restore normal operations
Documentation — capture findings

Runbook: Brute Force Authentication

Trigger: brute_force_authentication — more than 10 authentication failures from a single source IP within 5 minutes.

Severity: High

MITRE: T1110, T1110.001, T1110.003

Triage

Is the source IP internal or external?
Is the IP a known scanner, jump box, or monitoring system?
Is the targeted account a service account or human account?
Did any authentication succeed from this IP?

KQL Investigation Query (Kibana)

```
event.category: "authentication" AND source.ip: "<ATTACKER_IP>"
```

KQL only filters documents and has no pipe or sort syntax. Sort the Discover results by @timestamp, newest first.

Look for:

event.outcome: success — credential found (critical escalation)
Multiple user.name values — password spray pattern
Geographic anomaly in source.geo.country_name

SPL Investigation (Splunk)

```
index=soc-lab-* event.category=authentication source.ip="<ATTACKER_IP>"
| table _time, user.name, event.outcome, host.name, authentication.service
| sort _time
```

Containment

Block source.ip at the perimeter firewall or WAF
If an account was compromised (success after failures):
- Immediately disable the account
- Force password reset
- Revoke active sessions

Eradication

Review all authentication events from source.ip in the past 7 days
Check for any resources accessed by compromised accounts
Reset all accounts that were targeted (not just the successfully brute-forced one)

Recovery

Re-enable accounts after password resets are confirmed
Enable MFA for targeted accounts if not already active
Monitor for re-attempts from the same IP or IP ranges

Documentation

Record: source IP, targeted accounts, first/last seen timestamps, whether a success event occurred, actions taken, and any indicator of compromise (IOC).

Runbook: PsExec Lateral Movement

Trigger: psexec_lateral_movement — any PSEXESVC.exe process event.

Severity: High

MITRE: T1021.002, T1570, T1543.003

Triage

Is PSEXESVC.exe expected on this host? (authorized IT admin use?)
What account spawned it? Is that account authorized for remote admin?
Are there related SMB connections to port 445 from the same source?
Did PSEXESVC.exe spawn a cmd.exe or powershell.exe child process?

KQL Investigation

```
host.name: "<TARGET_HOST>" AND (process.name: "PSEXESVC.exe" OR process.parent.name: "PSEXESVC.exe")
```

Sort the Discover results by @timestamp, oldest first, to follow the execution chain.

Sequence Analysis

```
event.category: "network" AND destination.port: 445 AND destination.ip: "<TARGET_HOST_IP>"
```

Correlate the SMB connection time with the PSEXESVC.exe start time.

Containment

Isolate the target host from the network immediately
Block the source IP at the internal network level
Disable the account used for remote execution

Eradication

Remove the PSEXESVC.exe binary from the ADMIN$ share
Remove any dropped payloads or persistence mechanisms
Audit all processes started by PSEXESVC.exe in the incident window

Recovery

Reimage the host if unauthorized code executed at SYSTEM level
Restore from backup if files were modified
Re-enable the host after validation

Runbook: Large Data Transfer

Trigger: large_data_transfer — more than 10 outbound connections on non-standard ports to external IPs within 10 minutes.

Severity: Medium

MITRE: T1041, T1048, T1048.003

Triage

What application is making the connection? Is it a known business app?
Is the destination IP a known cloud service, CDN, or corporate system?
Are archiving tools (7z, rar, tar) also running on the same host?
Is the connection count/volume anomalous compared to the baseline?

KQL Investigation

```
event.category: "network" AND host.name: "<HOST>" AND NOT destination.port: (80 OR 443)
```

Sort the Discover results by @timestamp, oldest first, so timing patterns are easy to see.

Check for:

source.bytes — volume of data sent
Consistent destination.ip — possible C2 channel
Regular timing intervals — beaconing pattern
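One rough test for the beaconing pattern is to look at the gaps between connections to a single destination. A C2 beacon tends to produce near-constant intervals, so the coefficient of variation (standard deviation divided by mean) of the gaps is small. This sketch assumes you have exported the connection timestamps for one source.ip/destination.ip pair as epoch seconds, sorted ascending, one per line. `beacon_check` and its 0.1 default cutoff are illustrative choices, not tuned values. Malware that adds random jitter can evade a simple cutoff like this one.

```bash
# Hypothetical helper: reads sorted epoch-second timestamps on stdin and reports the
# mean gap, its standard deviation, the coefficient of variation, and a verdict.
# Usage: beacon_check [MAX_CV]   (default 0.1)
beacon_check() {
    awk -v max_cv="${1:-0.1}" '
        NR > 1 { d = $1 - prev; sum += d; sumsq += d * d; n++ }
        { prev = $1 }
        END {
            if (n < 2) { print "insufficient data"; exit }
            mean = sum / n
            var = sumsq / n - mean * mean
            if (var < 0) var = 0          # guard against rounding error
            sd = sqrt(var)
            cv = (mean > 0) ? sd / mean : 0
            verdict = (cv <= max_cv) ? "beacon-like" : "irregular"
            printf "intervals=%d mean=%.1fs sd=%.1fs cv=%.2f %s\n", n, mean, sd, cv, verdict
        }'
}
```

For example, connections at 0, 60, 120, 180, and 240 seconds give four identical 60-second gaps, so the function reports cv=0.00 and a beacon-like verdict.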

Containment

Block the destination IP at the perimeter
Isolate the host if data exfiltration is confirmed

Eradication

Identify what data was transferred (review file-access logs from the period before the transfer)
Remove the exfiltration mechanism (malware, compromised script)

Recovery

Notify data owner and legal/compliance if PII or regulated data was involved
File an incident report per data breach notification requirements

Runbook: WMI Remote Execution

Trigger: wmi_remote_execution — WmiPrvSE.exe spawning cmd.exe or powershell.exe.

Severity: High

MITRE: T1047, T1021

Triage

Is the shell spawn associated with a known WMI management agent?
What command was executed? (check process.command_line)
Is there a corresponding network connection to port 135 from a remote host?

KQL Investigation

```
process.parent.name: "WmiPrvSE.exe" AND process.name: ("cmd.exe" OR "powershell.exe")
```

Containment

Block DCOM/WMI traffic (TCP 135 plus the dynamic RPC port range, 49152-65535 by default on modern Windows) from unauthorized hosts
Disable remote WMI on affected hosts temporarily

Runbook: Scheduled Task Persistence

Trigger: scheduled_task_persistence — schtasks.exe with creation flags and LOLBin payload references.

Severity: Medium

MITRE: T1053.005

Triage

Is the task created by a software installer or IT automation tool?
Does the task action reference an unusual path (ProgramData, AppData)?
Does the task name mimic a legitimate Windows task?

KQL Investigation

```
process.name: "schtasks.exe" AND process.command_line: "/create"
```

Review the task name (/tn), action (/tr), and schedule type (/sc).

Containment

Delete the malicious scheduled task:
```
schtasks /delete /tn "<TASK_NAME>" /f
```
Remove any payload binaries referenced in the task action

Eradication

Audit all scheduled tasks on the affected host:
```
schtasks /query /fo LIST /v
```
Remove all unauthorized tasks
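If you can run a bash shell on the host (Git Bash, for example), you can narrow the verbose listing to rows worth a closer look. `flag_suspicious_tasks` is a hypothetical filter. Its path keywords come from the Triage questions above, and the LOLBin names are common examples rather than a complete list. Expect false positives, since legitimate tasks also run PowerShell, and review every hit manually.

```bash
# Hypothetical triage filter: reads schtasks CSV rows on stdin and prints only rows
# that mention unusual paths or commonly abused binaries (case-insensitive).
flag_suspicious_tasks() {
    grep -iE 'ProgramData|AppData|mshta|rundll32|regsvr32|certutil|powershell'
}

# On the affected host in Git Bash. MSYS_NO_PATHCONV=1 stops Git Bash from
# rewriting /query and /fo as file paths:
#   MSYS_NO_PATHCONV=1 schtasks /query /fo CSV /v | flag_suspicious_tasks
```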

Tuning Guidance

Reducing False Positives

Build an allow-list for known-good source IPs, accounts, and hosts that trigger rules legitimately (security scanners, IT admin jump boxes)
Increase thresholds if baseline failure rates in your environment are higher than the defaults
Add time-of-day filters to alert only on after-hours activity for rules that primarily catch human attackers (not automated)
Stack rank alerts — correlate multiple low-confidence signals before sending a high-severity notification
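The built-in evaluators count matching events with a single query, so an allow-list is easiest to apply inside the query as a `must_not` clause. A time-of-day check can then gate the call to notify(). Both helpers are hypothetical sketches. `ALERT_ALLOWLIST` and the 08:00-18:00 business window are assumed settings that the script does not currently define.

```bash
# Hypothetical setting: space-separated allow-listed source IPs.
ALERT_ALLOWLIST="${ALERT_ALLOWLIST:-}"

# Print a terms clause matching allow-listed IPs (nothing if the list is empty).
allowlist_clause() {
    local ip list=""
    for ip in $ALERT_ALLOWLIST; do
        list+="${list:+,}\"$ip\""
    done
    if [[ -n "$list" ]]; then
        printf '{"terms":{"source.ip":[%s]}}\n' "$list"
    fi
}

# Succeed outside 08:00-17:59 local time; pass an hour (00-23) to override.
is_after_hours() {
    local hour=${1:-$(date +%H)}
    hour=$((10#$hour))          # force base 10 so "08" and "09" are not read as octal
    (( hour < 8 || hour >= 18 ))
}

# Example: place the clause in a rule query's must_not array before calling es_count:
#   '{"bool":{"filter":[...],"must_not":['"$(allowlist_clause)"']}}'
```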

Reducing False Negatives

Lower thresholds for high-value assets (domain controllers, file servers)
Add geolocation filters to alert on authentications from unexpected countries
Combine network + process events for higher-confidence correlated alerts
Monitor baseline drift — update thresholds quarterly
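One way to lower thresholds for high-value assets is a per-host lookup that an evaluator consults in place of a single fixed number. This assumes the evaluator counts per host, for example by running one query per high-value host. The host names, values, and `DEFAULT_THRESHOLD` variable are placeholders. A `case` statement stands in for an associative array so the sketch also runs under bash 3.2.

```bash
# Hypothetical per-asset thresholds: high-value hosts alert at a lower count.
threshold_for() {
    case "$1" in
        dc01|dc02)    echo 3 ;;    # domain controllers
        fileserver01) echo 5 ;;    # file servers
        *)            echo "${DEFAULT_THRESHOLD:-10}" ;;
    esac
}

# Example inside an eval function:
#   if (( count > $(threshold_for "$host") )); then notify ...; fi
```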

Cron Integration

To run the notifier on a schedule without leaving a persistent process:

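```
# Evaluate alert rules every 5 minutes
*/5 * * * * cd /path/to/soc-lab-docker && bash scripts/alert_notifier.sh --once >> /var/log/soc-lab-alerts.log 2>&1
```

Keep LOOKBACK at its default of 300 seconds so that each run covers roughly the 5 minutes since the previous one. A shorter window leaves gaps between runs. A longer one re-counts events that were already evaluated and can repeat notifications. The cron user also needs write access to /var/log/soc-lab-alerts.log.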
