GoogleCloudPlatform · kislaykishore · Jun 20, 2026 · Jun 14, 2026 · Jun 14, 2026 · Jun 14, 2026
diff --git a/npi/.gemini/agents/gcsfuse-npi-runner.md b/npi/.gemini/agents/gcsfuse-npi-runner.md
@@ -0,0 +1,57 @@
+---
+name: gcsfuse-npi-runner
+description: Subagent that orchestrates and executes the end-to-end GCSFuse NPI validation pipeline sequentially: Conformance Testing -> Performance Benchmarking -> Analysis & Report -> Remediation.
+enable_write_tools: true
+enable_subagent_tools: true
+enable_mcp_tools: true
+---
+
+# GCSFuse NPI Runner Agent
+
+You are a specialized GCSFuse NPI Runner agent. Your mission is to execute the complete New Product Introduction (NPI) validation workflow sequentially against GCE VM and GKE cluster targets.
+
+## Workflow Sequence
+You must run the workflow stages strictly in the following sequential order:
+1.  **SSH Connection Prep**: Clean up any stale sockets and establish persistent multiplexed SSH connections.
+2.  **Conformance Testing**: Clone the GCSFuse repo and execute the integration test suite on the target VM, producing `conformance_results.json`.
+3.  **Performance Benchmarking**: Build/push benchmarking images, run the benchmark suite via `npi_orchestrator.py`, and upload metrics to BigQuery.
+4.  **Analysis**: Extract metrics, compare throughput/latency against baselines, and compile `npi_validation_report.md`.
+5.  **Remediation**: Analyze conformance failures and configuration mismatches, producing `npi_remediation_plan.md`.
+6.  **Verification**: Execute `verify_agent_workflow.py` to programmatically verify all deliverables are valid.
+
+## Key Constraints
+- **Interactive Plan Summary Checkpoint**: Before executing any high-overhead, long-running, or resource-intensive operations (such as compiling GCSFuse, triggering Cloud Builds via `build_images.py`, launching remote conformance tests, or starting orchestrator runs), you **MUST** present a clear, structured Plan Summary/Proposal to the user in the chat and explicitly wait for their approval. The proposal **MUST** include a detailed technical analysis covering:
+  1. **Storage Buffer Analysis**: Perform an explicit analysis of each target's hardware; specify whether you will construct a RAID0 SSD array (e.g. if local SSDs are present but unmounted) or fallback to memory volumes (`tmpfs` RAM disk) as the performance test buffer.
+  2. **GCS Bucket Details**: Specify which GCS buckets will be used, their type (zonal vs. regional), and whether they already exist or if you will create them (ensuring HNS is enabled and they are correctly colocated with their compute targets).
+  3. **Run Details & Configurations**: Detail the GCSFuse version/branch, Go compilation version, exact scope of the runs (e.g. full suite vs. smoke test), iterations, and whether the FIO performance matrices have been minimized.
+  4. **Target Environment Readiness**: Detail the readiness status of the target VMs (e.g. SSH multiplexing sockets, Go/Docker installation status, Docker group authorization, and GKE cluster node topology).
+  Do not proceed with execution until you receive explicit user confirmation.
+- **Smoke Test Matrix Verification**: If the task or user request specifies a "smoke test" or "minimal" performance run, you **MUST** modify the local FIO matrix files (`fio/read_matrix.csv` and `fio/write_matrix.csv`) to a single, minimal configuration *before* triggering the Docker image build. You must restore the original matrix files via `git restore` immediately after the build is initiated to keep the repository clean.
+- **Sequential Execution**: Do not run conformance testing and performance benchmarking concurrently on target VMs to avoid resource contention.
+- **Linux Environment Only**: The validation runner, scripts, and skills are designed and supported exclusively for Linux operating systems. Do not attempt to run or adapt commands for other environments (e.g., macOS or Windows).
+- **Socket Cleanup**: Stale socket files (`~/.ssh/sockets/<target>.sock`) must be checked and deleted before establishing master SSH connections.
+- **Agnostic Code**: Do not hardcode VM or cluster names in execution scripts. Keep configurations dynamic via targets inputs.
+- **User-Defined Targets**: You must not guess or auto-discover target GCE VM names, GKE cluster names, or GCS bucket names. You must explicitly extract these details from the user's prompt or request and write them to `targets.json`.
+- **Check Active State**: Before executing the SSH connections or starting a benchmark run, check if `~/.npi/npi_run_state.json` exists locally. If it exists and contains active target statuses (e.g. `RUNNING` or `SUCCESS`), notify the user of the active/previous run state, and ask if they would like to re-attach/resume or trigger a clean reset (using `--reset`).
+- **Analyze Permission Failures**: Conformance tests are expected to have failures due to intentionally restricted permissions. Do not block the pipeline trying to resolve these or force all tests to pass. Instead, analyze the failure reasons (e.g., identify which service accounts lack which GCS permissions) and detail them clearly in `npi_validation_report.md`.
+- **Stall Monitoring**: Monitor both conformance tests and performance benchmarks for stalls. For conformance tests, verify that `~/integration_tests.log` size increases. If the log size remains unchanged for more than 5 minutes while the `go test` process is running, consider it stalled, immediately terminate the run, force-unmount leftovers, clean up temp directories to reclaim inodes, and document the details. For performance benchmarks, ensure `npi_orchestrator.py` has `MAX_INACTIVITY_SECS` configured appropriately (typically 14400s or 4 hours for full runs) so it auto-aborts and reports hangs.
+- **No Automated Remediation**: Do not automatically perform or execute any remediation steps on the GCE VMs or GKE nodes. Document findings and suggest remediation recommendations in `npi_remediation_plan.md` as an advisory, but do not apply or execute them.
+- **Independent Target Evaluation**: Unless otherwise specified, multiple benchmark runs executed together are separate and not directly comparable. Do not compare their metrics directly against each other. Present the performance results for each target in separate sections, evaluating each target independently against its own baseline, or performing intra-run comparisons (such as NUMA vs non-NUMA and gRPC vs HTTP/1) if no baseline is available.
+- **RAM Buffer Fallback**: For targets without local SSDs (`has_ssd: false`), verify that the VM host has at least 600GB of RAM (minimum 550GB detected due to kernel overhead). If so, mount a 500GB memory volume (`tmpfs`) at the configured `buffer_mount` directory as the performance test buffer using the setup script. This leaves safe memory headroom for OS and daemon processes.
+- **Host-Info Verification & Reporting**: You **MUST** be aware that the NPI runner automatically executes a `host_info` collector job (using the `host-info-collector` image) as the first step of any performance run to upload target machine specifications (CPU, memory, kernel, disks) to the `host_info` BigQuery table. During the analysis stage, you **MUST** query and verify this table to extract and document the target host's hardware profile (such as exact GCE machine type or GKE node kernel version) in the `System Specifications` section of `npi_validation_report.md`.
+
+
+## Required Input Parameters
+Before starting execution, extract the list of target validation environments from the user's request:
+- **Validation Targets**: A list of one or more targets to run. Each target can be GCE (VM name, zone, bucket, BQ dataset, buffer mount SSD options) or GKE (cluster name, location, VM name, zone, bucket, BQ dataset, node selector, etc.), in any combination (e.g., multiple GCE, multiple GKE, or a mix of both).
+
+If the target configuration is missing or ambiguous in the request, ask the user to specify them. Once collected, write the entire list of targets to `targets.json` to parameterize the execution.
+
+## Skills & Methods
+Refer to the modular skills in the workspace for step-by-step guidance:
+- SSH Connection: `.gemini/skills/ssh-connection-management/SKILL.md`
+- Conformance: `.gemini/skills/conformance-testing/SKILL.md`
+- Build & Setup: `.gemini/skills/benchmark-build-setup/SKILL.md`
+- Benchmarking: `.gemini/skills/benchmark-suite-execution/SKILL.md`
+- Analysis: `.gemini/skills/analysis-report-generation/SKILL.md`
+- Remediation: `.gemini/skills/remediation-advisor/SKILL.md`
diff --git a/npi/.gemini/skills/analysis-report-generation/SKILL.md b/npi/.gemini/skills/analysis-report-generation/SKILL.md
@@ -0,0 +1,166 @@
+---
+name: analysis-report-generation
+description: Guides on querying benchmark results from BigQuery, comparing performance metrics (throughput/latency) against baselines, and generating a structured validation report.
+---
+
+# GCSFuse NPI Analysis & Report Generation
+
+This skill guides you through querying benchmark results from BigQuery tables, performing analysis on throughput and latency trends against historical baselines, verifying machine type configuration optimizations, and compiling the findings into a standard `npi_validation_report.md`.
+
+## Prerequisites
+
+1.  **GCP/BigQuery Access**: The environment must have access to BigQuery dataset containing the benchmark outputs.
+2.  **Baselines Datasets (Optional)**: Ensure you know the baseline dataset ID (e.g. `npi_benchmarks_baseline_lro_on` or similar) and the newly generated run's dataset ID, if available. If no baseline dataset is present, the report must still be generated using intra-run comparisons.
+3.  **GCSFuse Source Code**: Access to GCSFuse code is required to inspect `params.yaml` for machine type verification.
+
+## Step-by-Step Procedure
+
+### Step 1: Query BigQuery Results
+
+Retrieve performance and system metadata from the respective tables:
+*   **`host_info`**: Query to extract target system specs and hardware profiles (e.g. CPU, RAM, kernel version, disk layout).
+*   **`fio_<benchmark>` / `go_client_read_<config>`**: Query to extract raw performance data.
+
+> [!NOTE]
+> **Querying Host Specifications**:
+> To retrieve the host hardware profile for your report, run:
+> ```sql
+> SELECT
+>   run_timestamp,
+>   cpu_arch,
+>   num_cpus,
+>   num_numa_nodes,
+>   kernel_version,
+>   ram_bytes,
+>   num_local_ssds
+> FROM
+>   `<PROJECT_ID>.<DATASET_ID>.host_info`
+> ORDER BY run_timestamp DESC
+> LIMIT 1
+> ```
+
+> [!IMPORTANT]
+> **JSON Key Spacing**: In the FIO JSON output, the version is stored under the key `"fio version"` (with a space). Always query it using the quoted format: `JSON_VALUE(fio_json_output, '$."fio version"')` to avoid returning `NULL`.
+
+Run queries using the `bq` CLI or a python BigQuery client:
+```bash
+bq query --project_id=<PROJECT_ID> --use_legacy_sql=false \
+"SELECT
+  run_timestamp,
+  iteration,
+  JSON_VALUE(fio_json_output, '\$.\"fio version\"') AS fio_version,
+  AVG(SAFE_CAST(JSON_VALUE(job.read.bw) AS FLOAT64)) * 1024.0 / 1000000.0 AS avg_read_bw_mb,
+  AVG(SAFE_CAST(JSON_VALUE(job.write.bw) AS FLOAT64)) * 1024.0 / 1000000.0 AS avg_write_bw_mb,
+  AVG(SAFE_CAST(JSON_VALUE(job.read.clat_ns.mean) AS FLOAT64)) / 1000000.0 AS avg_read_clat_ms
+FROM
+  \`<PROJECT_ID>.<DATASET_ID>.<TABLE_ID>\`,
+  UNNEST(JSON_EXTRACT_ARRAY(fio_json_output.jobs)) AS job
+GROUP BY 1, 2, 3
+ORDER BY run_timestamp DESC"
+```
+
+### Step 2: Compare Against Baselines & Perform Intra-Run Analysis
+
+#### 1. Compare Against Baselines (If Baseline Dataset is Available)
+If a baseline dataset is available, execute comparison scripts (e.g. `query_results.py`) or calculate the percentage difference in throughput/latency between baseline and regression datasets.
+
+> [!IMPORTANT]
+> **No Cross-Target Comparisons (Default)**: Performance results from different targets (e.g., GKE Node runs vs GCE VM runs) represent distinct platforms and are not directly comparable by default. Do not compare them against each other, compute cross-target deltas, or label differences between them as regressions, **unless the user explicitly requests a cross-target platform comparison**. If explicitly requested, you may compare the environments and include a dedicated section in the final report.
+
+Example Comparison Matrix:
+| Protocol | Baseline Throughput (MiB/s) | New Run Throughput (MiB/s) | Delta (%) | Status |
+| :--- | :--- | :--- | :--- | :--- |
+| HTTP/1.1 | 1240.5 | 1235.2 | -0.4% | Neutral |
+| gRPC | 3450.0 | 2890.5 | -16.2% | **REGRESSION** |
+
+#### 2. Perform Intra-Run Comparisons (Always Recommended)
+Even when a baseline dataset is present, or if it is not present, you should perform intra-run comparisons to analyze and highlight the relative performance gains under different configurations:
+
+*   **gRPC vs HTTP/1**:
+    - Compare the performance of gRPC against HTTP/1.1 under the same test workload in the run.
+    - Quantify the throughput gain (or loss) and latency delta when using gRPC compared to HTTP/1.1.
+*   **NUMA binding vs non-NUMA binding analysis**:
+    - Compare performance metrics (throughput and latency) between runs executed with NUMA binding enabled versus runs executed without NUMA binding.
+    - Highlight the percentage improvement or degradation introduced by NUMA binding.
+
+Example Intra-Run Comparison Matrix:
+| Comparison Type | Configuration A | Configuration B | Throughput A (MiB/s) | Throughput B (MiB/s) | Delta (%) | Status / Insight |
+| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
+| Protocol | HTTP/1.1 | gRPC | 1235.2 | 2890.5 | +134.0% | gRPC shows expected scaling |
+| NUMA Binding | Non-NUMA | NUMA-Bound | 2500.0 | 2890.5 | +15.6% | NUMA binding improves throughput |
+
+### Step 3: Verify Machine Type Configuration
+
+Verify if the GCE VM or GKE node machine type (e.g., `c4-standard-96`) is classified under the high-performance machine types in the main GCSFuse repository:
+1.  Locate `params.yaml` in the cloned GCSFuse repository.
+2.  Search for the machine family or type.
+3.  If missing, note it in the validation report as a required follow-up task (PR creation to add family).
+
+### Step 4: Generate `npi_validation_report.md`
+
+Compile the queried results, baselines comparison, and machine family configuration verification into `npi_validation_report.md`.
+
+For each target validation environment executed (GCE VM or GKE Cluster), create a separate section and performance table to isolate their metrics and prevent incorrect direct comparisons.
+
+The report must follow this structure:
+```markdown
+# GCSFuse NPI Validation Report
+
+## Executive Summary
+[Brief description of whether the run meets performance criteria and if any regressions/failures were detected.]
+
+## Run Details
+- **Timestamp**: [ISO 8601 Timestamp]
+- **Target Platforms**: [List of all target names, e.g. GCE VM target-1, GKE Cluster target-2, etc.]
+
+## System Specifications (Hardware Profile)
+Query the `host_info` table for each target to populate this hardware profile:
+| Target Name | Platform Type | OS & Kernel | CPU (Model & Cores) | Total RAM (GB) | Disk Buffer / Cache (Type & Size) | TPU Accelerator (Topology) |
+|---|---|---|---|---|---|---|
+| `kislayk-npi2` | GCE VM | Linux 6.1.0 | Intel Xeon (96 cores) | 360 GB | RAID0 SSD (/mnt/lssd, 2.9TB) | N/A |
+| `gke-orbax-benchmark-cluster` | GKE Cluster | Linux 6.1.0 | AMD EPYC (64 cores) | 600 GB | Memory Volume (tmpfs, 500GB) | TPU v6e (2x2 topology, 4 chips) |
+
+## Target Performance Results
+
+### [TARGET_NAME_1] (Platform Type, e.g., GCE VM)
+- **GCSFuse Version**: [e.g. v3.9.0]
+- **Target Bucket**: [RAPID / Regional]
+- **Performance Metrics Comparison**:
+
+#### Baseline Performance Comparison (If Baseline is Available)
+| Benchmark / Protocol | Baseline (Version) | Current Run (Version) | Delta (%) | Status |
+|---|---|---|---|---|
+| HTTP1 Read | 1250 MB/s | 1240 MB/s | -0.8% | PASS |
+| gRPC Read | 3500 MB/s | 2800 MB/s | -20.0% | FAIL (Regression) |
+
+#### Intra-Run Performance Analysis (If Applicable)
+Provide these comparisons if the corresponding protocols or NUMA configurations were executed in the run:
+
+##### gRPC vs HTTP/1.1 Protocol Comparison
+| Metric | HTTP/1.1 | gRPC | Delta (%) | Observation |
+|---|---|---|---|---|
+| Read Throughput | 1240 MB/s | 2800 MB/s | +125.8% | gRPC significantly outperforms HTTP/1.1 |
+| Read Latency (mean) | 0.012 ms | 0.005 ms | -58.3% | gRPC shows lower latency |
+
+##### NUMA Binding vs Non-NUMA Binding Analysis
+| Protocol / Workload | Non-NUMA Bound | NUMA Bound | Delta (%) | Observation |
+|---|---|---|---|---|
+| gRPC Read Throughput | 2400 MB/s | 2800 MB/s | +16.7% | NUMA binding improves throughput |
+| gRPC Read Latency | 0.006 ms | 0.005 ms | -16.7% | NUMA binding reduces latency |
+
+### Cross-Target Platform Comparison (Only If Explicitly Requested by User)
+If the user explicitly requested a comparison between different targets (e.g., GCE VM vs GKE Cluster), compile their metrics into a side-by-side comparison table here:
+
+| Metric / Workload | [TARGET_NAME_1] (e.g., GCE VM) | [TARGET_NAME_2] (e.g., GKE Cluster) | Delta (%) | Status / Observation |
+|---|---|---|---|---|
+| gRPC Read Throughput | 2800 MB/s | 3100 MB/s | +10.7% | GKE cluster shows higher peak throughput |
+| gRPC Read Latency | 0.005 ms | 0.004 ms | -20.0% | GKE cluster shows lower latency |
+
+## High-Performance Machine Type Classification
+- **Machine Type Used**: `c4-standard-96`
+- **Configured in `params.yaml`?**: [Yes/No]
+- **Action Required**: [None / Create PR in GCSFuse repo to add the machine type]
+
+## Observations & Issues
+- [Detail any errors observed, e.g., TLS Handshake Errors, GKE OOMs, Direct Path fallback issues.]
+```