
[RFE] Add a Cloud Release Command that leverages dayzero plugin structure #688

Description

@sadsfae

Add the ability for users to add and manage their own "cloud release command".

Cloud Release Command - Implementation Plan

Context

After a cloud environment completes the 12-stage move pipeline and is released, users want the ability to automatically run a custom command on the first host in their allocation. This extends the dayzero plugin system with a new per_cloud plugin called cloudcmd that executes a user-provided command via SSH inside a detached tmux session.

The command is stored per-user on their SSO profile page (similar to SSH public keys) and is applied to any assignment where they are the cloud-owner. Output is appended to /root/quads_deployed.txt (the existing moveinfo log) for auditability.

Research Findings

tmux over screen/tmate

  • tmux: Ships in RHEL base repos (AppStream). Red Hat dropped screen from RHEL 8 in favor of tmux. tmate requires network access to a tmate server, which disqualifies it for air-gapped datacenters.
  • tmux syntax: tmux new-session -d -s quads_release 'command'
  • tmux passes a single quoted command string to the shell itself, so no explicit sh -c wrapper is needed, which simplifies quoting.

Sanitization strategy

  • Base64 encode the user command for safe passage through SSH + tmux (established pattern in ssh_helper.py:93).
  • Blocklist dangerous patterns (not allowlist) since users are trusted internal admins.
  • 1024 character limit is appropriate. After base64 expansion (~33%) and tmux wrapping, total stays under 2KB, well within paramiko/SSH limits.
  • Reject commands containing control characters (NULL, ESC, CR, BS) to prevent terminal injection.
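The size estimate above can be checked with a few lines of Python. This is a minimal sketch that assumes the stored command is ASCII-only (one byte per character), which holds if validation rejects everything outside printable ASCII; the 2048-byte budget is simply the "under 2KB" figure from the bullet, not a limit imposed by SSH.

```python
import base64
import math

MAX_LEN = 1024          # character limit from the plan
BUDGET = 2048           # the "under 2KB" target mentioned above


def encoded_length(n_bytes: int) -> int:
    """Length of standard (padded) base64 output for n_bytes of input."""
    return 4 * math.ceil(n_bytes / 3)


# Worst case: a command that uses the full 1024 characters.
worst_case = "x" * MAX_LEN
encoded = base64.b64encode(worst_case.encode("ascii"))

# Bytes left over for the tmux wrapper, echo lines and redirections.
headroom = BUDGET - len(encoded)
print(len(encoded), headroom)
```

A full-length command encodes to 1368 characters, leaving 680 bytes for the tmux and logging wrapper.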

Existing patterns to follow

  • SSH key storage: User.ssh_key = Column(Text, nullable=True) in models.py
  • SSH key profile UI: textarea form in auth/profile.html with server-side validation
  • SSH key validation: prefix-based allowlist in auth.py and users.py
  • SSH key injection: SSHHelper.distribute_ssh_keys() uses base64 encoding via paramiko
  • Host ordering: sorted(hosts, key=lambda k: k.name) - alphabetically first hostname is "first host"
  • Cloud owner: assignment.owner field maps to User via {owner}@{domain} email lookup
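The last two patterns (first-host selection and owner lookup) can be sketched in isolation. The helper names first_host and owner_email are hypothetical, and the sketch assumes hosts arrive either as plain hostname strings or as objects exposing a .name attribute; the plan does not state which form the plugin receives.

```python
from types import SimpleNamespace
from typing import Iterable, Optional


def host_name(host) -> str:
    """Accept either a hostname string or an object with a .name attribute."""
    return host if isinstance(host, str) else host.name


def first_host(hosts: Iterable) -> Optional[str]:
    """Alphabetically first hostname, or None for an empty allocation."""
    names = sorted(host_name(h) for h in hosts)
    return names[0] if names else None


def owner_email(owner: str, domain: str) -> str:
    """Build the {owner}@{domain} address used for the user lookup."""
    return f"{owner}@{domain}"


# Usage with both input shapes (hostnames here are made up).
print(first_host(["b02.example.com", "a01.example.com"]))
print(first_host([SimpleNamespace(name="h2"), SimpleNamespace(name="h1")]))
print(owner_email("jdoe", "example.com"))
```

Because the sort is purely alphabetical, a name like host10 sorts before host2; that only matters if hostnames are not zero-padded.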

Part 1: Data Model

1a. User model

File: src/quads/server/models.py

Add after the existing ssh_key column (~line 279):

release_command = Column(String(1024), nullable=True)

Uses String(1024) (not Text) to enforce the limit at the database level.

1b. Alembic migration

New migration: migrations/versions/<rev>_add_user_release_command.py

op.add_column("users", sa.Column("release_command", sa.String(1024), nullable=True))

Part 2: Command Validation

2a. Validation module

File: src/quads/server/blueprints/users.py

Add validation function alongside the existing _validate_ssh_key():

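import re

RELEASE_CMD_MAX_LENGTH = 1024

# Each pattern is anchored to command position: the start of a line,
# or directly after one of the separators ; & |
BLOCKED_PATTERNS = [
    r'(?:^|[;&|]\s*)rm\s+.*-[rR].*\s+/',
    r':\(\)\s*\{\s*:\|:\s*&\s*\}\s*;',       # fork bomb
    r'(?:^|[;&|]\s*)dd\s+.*of=/dev/',
    r'(?:^|[;&|]\s*)mkfs',
    r'(?:^|[;&|]\s*)shutdown',
    r'(?:^|[;&|]\s*)reboot',
    r'(?:^|[;&|]\s*)halt',
    r'(?:^|[;&|]\s*)poweroff',
    r'(?:^|[;&|]\s*)init\s+[06]',
    r'(?:^|[;&|]\s*)systemctl\s+reboot',
]

def _validate_release_command(command):
    """Return an error message, or None if the command is acceptable."""
    if not command:
        return None
    if len(command) > RELEASE_CMD_MAX_LENGTH:
        return f"Command exceeds {RELEASE_CMD_MAX_LENGTH} character limit."
    # Only printable ASCII and newlines are allowed (tabs and non-ASCII are rejected too).
    cleaned = re.sub(r'[^\x20-\x7e\n]', '', command)
    if cleaned != command:
        return "Command contains invalid control characters."
    for pattern in BLOCKED_PATTERNS:
        # MULTILINE makes ^ match after every newline, so later lines are checked as well.
        if re.search(pattern, command, re.IGNORECASE | re.MULTILINE):
            return "Command contains a blocked operation."
    return None

Command-position anchoring (matching only at the start of a line or after ;, &, |) prevents false positives on strings like echo reboot or grep shutdown. The blocklist guards against accidents rather than acting as a security boundary: forms such as sudo reboot are not matched.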
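Since the validator permits newlines, it is worth checking how the ^ anchor behaves on multi-line commands. The sketch below takes one blocklist entry from the list above and compares re.search with and without re.MULTILINE; the is_blocked helper and the sample commands are illustrative only.

```python
import re

# One blocklist entry from the plan, anchored to command position.
REBOOT = r'(?:^|[;&|]\s*)reboot'


def is_blocked(command: str, flags: int = re.IGNORECASE) -> bool:
    """True if the reboot pattern matches the command under the given flags."""
    return re.search(REBOOT, command, flags) is not None


samples = ["echo reboot", "ls ; reboot", "ls && reboot", "ls\nreboot"]
for sample in samples:
    single_line = is_blocked(sample)
    multi_line = is_blocked(sample, re.IGNORECASE | re.MULTILINE)
    print(repr(sample), single_line, multi_line)
```

Only the last sample differs between the two modes: without re.MULTILINE, ^ matches only at the very start of the string, so a blocked command on the second line goes undetected.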

2b. Web auth blueprint validation

File: src/quads/web/blueprints/auth.py

New route update_release_command following the update_ssh_key pattern. Calls the same _validate_release_command() logic from the API layer (or duplicates the validation inline, matching the existing SSH key pattern where web and API both validate independently).

2c. API users endpoint

File: src/quads/server/blueprints/users.py

The existing PATCH endpoint already uses setattr(user, key, value) for arbitrary fields. Add validation before the update:

if "release_command" in data:
    error = _validate_release_command(data["release_command"])
    if error:
        return make_response(jsonify({"error": "Bad Request", "message": error}), 400)

Part 3: Profile UI

3a. Profile template

File: src/quads/web/templates/auth/profile.html

Add after the SSH Public Key card (after line ~144). Uses a collapsible Bootstrap accordion so the feature does not crowd the page:

<div class="card mb-4">
    <div class="card-header p-0">
        <button class="btn btn-link text-decoration-none w-100 text-start p-3 collapsed"
                type="button"
                data-bs-toggle="collapse"
                data-bs-target="#releaseCommandCollapse"
                aria-expanded="false">
            <span class="d-flex align-items-center">
                <svg class="me-2 collapse-arrow" ...><!-- chevron icon --></svg>
                <h5 class="mb-0">Cloud Release Command</h5>
                <span class="text-muted ms-2 small">(optional)</span>
            </span>
        </button>
    </div>
    <div id="releaseCommandCollapse" class="collapse">
        <div class="card-body">
            <p class="text-muted">
                An optional command to run on the first host of your
                allocations after release. Executes in a detached tmux
                session as root. Maximum 1024 characters.
            </p>
            <form method="POST" action="{{ url_for('auth.update_release_command') }}">
                <input type="hidden" name="csrf_token" value="{{ csrf_token() }}">
                <div class="mb-2">
                    <textarea class="form-control font-monospace"
                              name="release_command"
                              rows="3"
                              maxlength="1024"
                              placeholder="e.g. source /home/user/.env ; start_service">{{ release_command or '' }}</textarea>
                    <div class="form-text">
                        <span id="cmdCharCount">0</span>/1024
                    </div>
                </div>
                <button type="submit" class="btn btn-primary btn-sm">Save Command</button>
            </form>
        </div>
    </div>
</div>

A small JS snippet provides the live character count (following existing profile page patterns).

The collapse arrow rotates on expand/collapse via CSS transition on the .collapse-arrow class, matching Bootstrap 5 accordion conventions.

3b. Profile route

File: src/quads/web/blueprints/auth.py

Add route for saving the release command, following the update_ssh_key pattern:

@auth_bp.route("/profile/release-command", methods=["POST"])
@login_required
def update_release_command():
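    # Browsers submit textarea newlines as CRLF; normalize to LF so the
    # control-character check does not reject multi-line commands.
    command = request.form.get("release_command", "").replace("\r\n", "\n").strip()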
    error = _validate_release_command(command)
    if error:
        flash(error, "danger")
        return redirect(url_for("auth.profile"))
    quads = QuadsApi(Config)
    try:
        quads.update_release_command(current_user.email, command or None)
        flash("Release command saved." if command else "Release command removed.", "success")
    except Exception:
        logger.exception("Failed to update release command")
        flash("Failed to update release command.", "danger")
    return redirect(url_for("auth.profile"))

3c. Profile route data

File: src/quads/web/blueprints/auth.py

In the existing profile() route, pass release_command to the template alongside ssh_key:

release_command = user_data.get("release_command", "")

3d. API client method

File: src/quads/quads_api.py

Add method following the update_ssh_key pattern:

def update_release_command(self, email, command):
    return self.patch(os.path.join("users", email), {"release_command": command})

Part 4: cloudcmd Plugin

4a. Plugin file

File: src/quads/plugins/builtin/dayzero/runonce/cloudcmd.py

class CloudCmdPlugin(DayzeroPlugin):
    name = "cloudcmd"
    version = "1.0.0"
    description = "Run cloud owner release command on first host via tmux"
    author = "QUADS Team"
    run_mode = "per_cloud"

initialize(): Extract domain from global Config (needed for owner->email lookup). Return True.

execute(hosts, cloud, schedule_data_list):

  1. Get the cloud assignment to determine owner:

    quads = QuadsApi(Config)
    cloud_obj = quads.get_cloud(cloud)
    assignment = quads.get_active_cloud_assignment(cloud_obj.name)
    owner = assignment.get("owner")
  2. Look up the owner's release_command:

    domain = Config.get("domain", "")
    user_data = quads.get_user(f"{owner}@{domain}")
    command = user_data.get("release_command")
    if not command:
        self.logger.info(f"No release command set for {owner}, skipping")
        return True
  3. Determine first host (alphabetically sorted, matching quads --cloud-only behavior):

    first_host = sorted(hosts)[0]
  4. Build the tmux SSH command with base64 encoding:

    encoded = base64.b64encode(command.encode("utf-8")).decode("ascii")
    log_file = "/root/quads_deployed.txt"
    tmux_cmd = (
        f"tmux new-session -d -s quads_release "
        f"'echo \"--- Cloud Release Command ---\" >> {log_file} ; "
        f"echo \"$(date -Is)\" >> {log_file} ; "
        f"echo {encoded} | base64 -d | bash >> {log_file} 2>&1'"
    )
  5. SSH to first host using asyncio.create_subprocess_exec (same pattern as moveinfo):

    proc = await asyncio.create_subprocess_exec(
        "ssh", "-o", "StrictHostKeyChecking=no",
        "-o", "ConnectTimeout=10", "-o", "BatchMode=yes",
        f"root@{first_host}", tmux_cmd, ...
    )
  6. Return True/False based on SSH exit code. The tmux session itself runs detached, so SSH returns immediately after spawning it.
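Steps 4 and 5 can be sketched as two small pure functions, which keeps the quoting testable without a live host. The function names build_tmux_command and build_ssh_argv are hypothetical, and using shlex.quote for the outer quoting is a substitution for the hand-written single quotes in step 4, not something the plan prescribes.

```python
import base64
import shlex
from typing import List

LOG_FILE = "/root/quads_deployed.txt"  # log path named in the plan


def build_tmux_command(command: str, session: str = "quads_release") -> str:
    """Wrap the user command in a detached tmux session that logs to LOG_FILE."""
    # Base64 keeps the user's own quotes and metacharacters out of the
    # outer shell layers; it is decoded only on the target host.
    encoded = base64.b64encode(command.encode("utf-8")).decode("ascii")
    inner = (
        f'echo "--- Cloud Release Command ---" >> {LOG_FILE} ; '
        f'echo "$(date -Is)" >> {LOG_FILE} ; '
        f"echo {encoded} | base64 -d | bash >> {LOG_FILE} 2>&1"
    )
    # shlex.quote makes the whole script a single argument to tmux.
    return f"tmux new-session -d -s {shlex.quote(session)} {shlex.quote(inner)}"


def build_ssh_argv(host: str, remote_cmd: str) -> List[str]:
    """Argument vector for the ssh call, using the options listed in step 5."""
    return [
        "ssh", "-o", "StrictHostKeyChecking=no",
        "-o", "ConnectTimeout=10", "-o", "BatchMode=yes",
        f"root@{host}", remote_cmd,
    ]


remote = build_tmux_command("echo 'hi' ; ls")
argv = build_ssh_argv("host01.example.com", remote)
print(argv[-1])
```

Because the session name is fixed, tmux will refuse to create a second session while an earlier quads_release session is still running on the same host; the plugin would see that as a non-zero exit code.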

4b. API endpoint for user data

File: src/quads/server/blueprints/users.py

Ensure the existing GET user endpoint returns release_command. The User model's Serialize mixin auto-serializes all columns, so release_command should be included automatically once the migration has been applied.

If it is not, add release_command to the serialization output.

4c. API client method for user lookup

File: src/quads/quads_api.py

Check if get_user() exists. If not, add:

def get_user(self, email):
    response = self.get(os.path.join("users", email))
    return response.json()

4d. Plugin config

File: conf/plugins.yml

  cloudcmd:
    enabled: true

Part 5: RPM Spec

No changes needed. The cloudcmd.py plugin lives under the existing builtin/dayzero/runonce/ package path which is already packaged as part of the Python site-packages.

tmux should be added as a dependency for the host side, but since QUADS does not manage the host OS packages (Foreman does), this is a documentation note, not a spec change.


Part 6: Tests

6a. Validation tests

File: tests/plugins/test_dayzero.py (or new tests/unit/test_release_command.py)

  • test_validate_release_command_valid - normal command passes
  • test_validate_release_command_empty - empty/None returns no error
  • test_validate_release_command_too_long - 1025 chars rejected
  • test_validate_release_command_control_chars - NULL/ESC rejected
  • test_validate_release_command_blocked_rm_rf - rm -rf / blocked
  • test_validate_release_command_blocked_reboot - reboot blocked
  • test_validate_release_command_allowed_echo_reboot - echo reboot passes (no false positive)
  • test_validate_release_command_blocked_fork_bomb - fork bomb blocked

6b. Plugin tests

File: tests/plugins/test_dayzero.py

  • test_cloudcmd_metadata - name, version, run_mode == "per_cloud"
  • test_cloudcmd_no_command_set - returns True, logs skip
  • test_cloudcmd_first_host_selection - sorted alphabetically
  • test_cloudcmd_ssh_success - mock subprocess, verify tmux command structure
  • test_cloudcmd_ssh_failure - mock failure, verify non-fatal
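The two SSH tests can be written without touching the network by patching asyncio.create_subprocess_exec. In the sketch below, run_over_ssh is a hypothetical stand-in for the plugin's SSH step (the real plugin method and its signature are not shown in this plan), and the hostname is made up.

```python
import asyncio
from unittest.mock import AsyncMock, patch


async def run_over_ssh(host: str, remote_cmd: str) -> bool:
    """Hypothetical stand-in for the plugin's SSH step; True on exit code 0."""
    proc = await asyncio.create_subprocess_exec(
        "ssh", "-o", "BatchMode=yes", f"root@{host}", remote_cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    await proc.communicate()
    return proc.returncode == 0


def check_ssh_call(returncode: int):
    """Run run_over_ssh against a mocked subprocess; return (result, ssh argv)."""
    fake_proc = AsyncMock()
    fake_proc.returncode = returncode
    fake_proc.communicate.return_value = (b"", b"")
    spawn = AsyncMock(return_value=fake_proc)
    with patch("asyncio.create_subprocess_exec", spawn):
        ok = asyncio.run(
            run_over_ssh("host01.example.com",
                         "tmux new-session -d -s quads_release 'true'")
        )
    return ok, spawn.call_args.args


ok, argv = check_ssh_call(0)       # simulated success
failed, _ = check_ssh_call(255)    # simulated ssh connection failure
print(ok, failed, argv[-1])
```

In a real test module the two calls would become separate test functions asserting on ok and on the structure of argv.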

6c. Profile UI tests

  • Test the new route returns 302 redirect
  • Test invalid command gets flash error
  • Test valid command saves successfully

Files Modified

File                                       Change
src/quads/server/models.py                 Add release_command column to User
src/quads/server/blueprints/users.py       Add _validate_release_command(), validate in PATCH
src/quads/web/blueprints/auth.py           Add update_release_command route, pass to template
src/quads/web/templates/auth/profile.html  Add collapsible release command card
src/quads/quads_api.py                     Add update_release_command(), get_user() if missing
conf/plugins.yml                           Add cloudcmd: enabled: true

Files Created

File                                                   Purpose
migrations/versions/<rev>_add_user_release_command.py  Alembic migration
src/quads/plugins/builtin/dayzero/runonce/cloudcmd.py  Cloud release command plugin
tests/unit/test_release_command.py                     Validation tests

Verification

  1. pytest tests/plugins/ tests/unit/ -vv - all tests pass
  2. Start dev server, verify profile page shows collapsible command box
  3. Save a command, verify it persists in the database
  4. Test blocklist: try saving rm -rf /, verify rejection
  5. Test that echo reboot is allowed (no false positive)
  6. Mock the full plugin flow: owner has command set, verify tmux SSH call structure
  7. Verify quads_deployed.txt would receive command output (via tmux redirect)
