GoogleCloudPlatform · kislaykishore · Apr 10, 2026
diff --git a/npi/gce_npi.md b/npi/gce_npi.md
@@ -0,0 +1,93 @@
+# Running NPI Benchmarks on GCE
+
+This guide explains how to build the required Docker images and run the NPI (Network Performance Improvement) benchmarks on a Google Compute Engine (GCE) VM.
+
+## Prerequisites
+
+1.  **Google Cloud Project**: You need a GCP project to host your Artifact Registry repository, Cloud Storage bucket, and BigQuery dataset.
+2.  **VM Setup**: A GCE VM with Docker installed. The VM's service account needs IAM roles on your bucket and dataset (for example `roles/storage.objectAdmin` and `roles/bigquery.dataEditor`), and the VM must have access scopes that allow BigQuery and Cloud Storage read/write, either:
+    *   `https://www.googleapis.com/auth/cloud-platform` (covers all Google Cloud APIs), or
+    *   both `https://www.googleapis.com/auth/bigquery` and `https://www.googleapis.com/auth/devstorage.read_write`.
+3.  **Authentication**: If you are not using the VM's service account, authenticate with `gcloud auth login` and `gcloud auth application-default login`.
+4.  **BigQuery Dataset**: Create a BigQuery dataset to store the benchmark results.
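+5.  **Artifact Registry**: You need an Artifact Registry Docker repository to host the images. The build pushes to a repository named `gcsfuse-benchmarks` in the `us` location, which you can create with: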
+    ```bash
+    gcloud services enable artifactregistry.googleapis.com --project=YOUR_PROJECT_ID
+    gcloud artifacts repositories create gcsfuse-benchmarks \
+        --repository-format=docker \
+        --location=us \
+        --project=YOUR_PROJECT_ID
+    ```
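+
+The scope requirement above can be checked from inside the VM. The sketch below assumes the metadata server endpoint `computeMetadata/v1/instance/service-accounts/default/scopes`, which returns one scope per line; the helper name `has_required_scopes` is hypothetical.
+
+```bash
+# Hypothetical helper: reads OAuth scopes (one per line) on stdin and returns 0
+# if they allow BigQuery and Cloud Storage read/write access.
+has_required_scopes() {
+  scopes="$(cat)"
+  base="https://www.googleapis.com/auth"
+  # The broad cloud-platform scope is sufficient on its own.
+  if printf '%s\n' "$scopes" | grep -qxF "${base}/cloud-platform"; then
+    return 0
+  fi
+  # Otherwise both granular scopes must be present.
+  printf '%s\n' "$scopes" | grep -qxF "${base}/bigquery" &&
+    printf '%s\n' "$scopes" | grep -qxF "${base}/devstorage.read_write"
+}
+
+# Example (run on the VM):
+#   curl -s -H "Metadata-Flavor: Google" \
+#     "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/scopes" \
+#     | has_required_scopes && echo "scopes OK"
+```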
+
+## Step 1: Build the Benchmark Images
+
+The NPI benchmarks use Docker containers to provide an isolated environment for running tests. You can build and push these images to your Artifact Registry using the provided `Makefile`.
+
+By default, the `Makefile` targets the project `gcs-fuse-test`, so either pass your own project ID to `make` or edit the `Makefile`:
+
+```bash
+cd gcsfuse-tools/npi
+
+# Option 1: Override project and version in the make command
+make build PROJECT=YOUR_PROJECT_ID GCSFUSE_VERSION=v3.5.6
+
+# Option 2: Edit Makefile directly to change the PROJECT variable, then run:
+make build
+```
+
+This will trigger a Cloud Build job (`cloudbuild.yaml`) that builds the `fio-read-benchmark`, `fio-write-benchmark`, and `orbax-emulated-benchmark` images and pushes them to `us-docker.pkg.dev/YOUR_PROJECT_ID/gcsfuse-benchmarks/`.
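+
+After the build finishes, you can confirm that the images exist. The sketch below prints the image URIs you should expect, assuming each image is tagged `<image>-<GCSFUSE_VERSION>:latest` as in the example URI in the GKE guide; the helper name `benchmark_image_uris` is hypothetical.
+
+```bash
+# Hypothetical helper: prints the image URIs that `make build` is expected to
+# push, assuming the "<image>-<version>:latest" naming convention.
+benchmark_image_uris() {
+  project="$1"
+  version="$2"
+  for image in fio-read-benchmark fio-write-benchmark orbax-emulated-benchmark; do
+    echo "us-docker.pkg.dev/${project}/gcsfuse-benchmarks/${image}-${version}:latest"
+  done
+}
+
+# Compare against what the repository actually contains:
+#   benchmark_image_uris YOUR_PROJECT_ID v3.5.6
+#   gcloud artifacts docker images list us-docker.pkg.dev/YOUR_PROJECT_ID/gcsfuse-benchmarks
+```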
+
+## Step 2: Run the Benchmarks
+
+Once the images are built, you can use the `npi.py` script to orchestrate the benchmark runs. The script automatically generates the correct `docker run` commands based on the desired benchmarks and configurations.
+
+### Basic Usage
+
+Run all available benchmarks:
+
+```bash
+python3 npi.py \
+    --benchmarks 'all' \
+    --bucket-name YOUR_GCS_BUCKET \
+    --project-id YOUR_PROJECT_ID \
+    --bq-dataset-id YOUR_BQ_DATASET_ID \
+    --gcsfuse-version v3.5.6
+```
+
+### Specifying Benchmarks
+
+You can specify a space-separated list of specific benchmarks to run. For example, to run only the `read_http1` and `write_grpc` benchmarks:
+
+```bash
+python3 npi.py \
+    --benchmarks read_http1 write_grpc \
+    --bucket-name YOUR_GCS_BUCKET \
+    --project-id YOUR_PROJECT_ID \
+    --bq-dataset-id YOUR_BQ_DATASET_ID \
+    --gcsfuse-version v3.5.6
+```
+
+### Understanding the Parameters
+
+*   `--benchmarks`: `all`, or a space-separated list of specific benchmarks (e.g., `read_http1`, `orbax_read_grpc_numa0_fio_bound`).
+*   `--bucket-name`: The GCS bucket where FIO will read/write data.
+*   `--project-id`: The GCP Project ID containing your BigQuery dataset.
+*   `--bq-dataset-id`: The BigQuery dataset where results will be inserted.
+*   `--gcsfuse-version`: The version tag used when building the images (e.g., `v3.5.6`); it must match the `GCSFUSE_VERSION` passed to `make build`.
+*   `--iterations`: (Optional) Number of FIO test iterations. Default is 5.
+*   `--temp-dir`: (Optional) FUSE temp directory type (`boot-disk` or `memory`).
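+
+When the same flags are reused across runs, a small wrapper can keep the required values in shell variables and add the optional flags only when they are set. This is a sketch: the function `run_npi` and the variables `BUCKET`, `PROJECT_ID`, `BQ_DATASET_ID`, `GCSFUSE_VERSION`, `BENCHMARKS`, `ITERATIONS`, `TEMP_DIR`, `DRY_RUN`, and `NPI_RUNNER` are hypothetical, and it assumes you run it from the `npi` directory.
+
+```bash
+# Hypothetical wrapper around npi.py: required flags come from shell variables,
+# and optional flags are appended only when their variable is set.
+run_npi() {
+  : "${BUCKET:?BUCKET is required}" "${PROJECT_ID:?PROJECT_ID is required}"
+  : "${BQ_DATASET_ID:?BQ_DATASET_ID is required}" "${GCSFUSE_VERSION:?GCSFUSE_VERSION is required}"
+  # BENCHMARKS is left unquoted on purpose so "read_http1 write_grpc"
+  # expands to one argument per benchmark.
+  set -- --benchmarks ${BENCHMARKS:-all} \
+    --bucket-name "$BUCKET" \
+    --project-id "$PROJECT_ID" \
+    --bq-dataset-id "$BQ_DATASET_ID" \
+    --gcsfuse-version "$GCSFUSE_VERSION"
+  if [ -n "${ITERATIONS:-}" ]; then
+    set -- "$@" --iterations "$ITERATIONS"
+  fi
+  if [ -n "${TEMP_DIR:-}" ]; then
+    case "$TEMP_DIR" in
+      boot-disk | memory) set -- "$@" --temp-dir "$TEMP_DIR" ;;
+      *)
+        echo "TEMP_DIR must be boot-disk or memory, got: $TEMP_DIR" >&2
+        return 1
+        ;;
+    esac
+  fi
+  if [ "${DRY_RUN:-0}" = 1 ]; then
+    set -- "$@" --dry-run
+  fi
+  # NPI_RUNNER defaults to python3; NPI_RUNNER=echo only prints the command.
+  ${NPI_RUNNER:-python3} npi.py "$@"
+}
+```
+
+With `BUCKET`, `PROJECT_ID`, `BQ_DATASET_ID`, and `GCSFUSE_VERSION` set, `DRY_RUN=1 BENCHMARKS="read_http1 write_grpc" run_npi` prints the `docker run` commands for two benchmarks, and `NPI_RUNNER=echo run_npi` shows the exact `npi.py` invocation without starting Python.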
+
+### Dry Run
+
+To see exactly what `docker run` commands the script will execute without actually running them, add the `--dry-run` flag:
+
+```bash
+python3 npi.py \
+    --benchmarks read_http1 \
+    --bucket-name YOUR_GCS_BUCKET \
+    --project-id YOUR_PROJECT_ID \
+    --bq-dataset-id YOUR_BQ_DATASET_ID \
+    --gcsfuse-version v3.5.6 \
+    --dry-run
+```
diff --git a/npi/gke_npi.md b/npi/gke_npi.md
@@ -0,0 +1,157 @@
+# Running NPI Benchmarks on GKE
+
+This guide explains how to build and run the NPI (Network Performance Improvement) benchmarks on Google Kubernetes Engine (GKE). Unlike on GCE, where `npi.py` orchestrates the Docker runs automatically, on GKE we define and deploy the benchmark Pods manually and rely on the GKE GCS Fuse CSI driver to mount the bucket.
+
+## Step 1: Build the Benchmark Images
+
+The process for building the Docker images is identical to GCE. We use Cloud Build via the provided `Makefile`.
+
+1. Ensure your Artifact Registry repository is created:
+    ```bash
+    gcloud services enable artifactregistry.googleapis.com --project=YOUR_PROJECT_ID
+    gcloud artifacts repositories create gcsfuse-benchmarks \
+        --repository-format=docker \
+        --location=us \
+        --project=YOUR_PROJECT_ID
+    ```
+
+2. Build and push the images:
+    ```bash
+    cd gcsfuse-tools/npi
+    make build PROJECT=YOUR_PROJECT_ID GCSFUSE_VERSION=v3.5.6
+    ```
+This creates images such as `us-docker.pkg.dev/YOUR_PROJECT_ID/gcsfuse-benchmarks/fio-read-benchmark-v3.5.6:latest`.
+
+## Step 2: Configure Workload Identity (Permissions)
+
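+To allow your GKE Pods to access the GCS bucket and write metrics to BigQuery, use **Workload Identity**, which links a Kubernetes Service Account (KSA) to a Google Cloud Service Account (GSA). Workload Identity must be enabled on the cluster and, on Standard clusters, on the node pool that runs the benchmarks.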
+
+1.  **Create a Google Cloud Service Account (GSA):**
+    ```bash
+    gcloud iam service-accounts create benchmark-gsa \
+        --project=YOUR_PROJECT_ID
+    ```
+
+2.  **Grant the necessary roles to the GSA:**
+    The GSA needs permission to read from and write to the GCS bucket and to insert rows into BigQuery.
+    ```bash
+    # Grant BigQuery Data Editor
+    gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
+        --member "serviceAccount:benchmark-gsa@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
+        --role "roles/bigquery.dataEditor"
+
+    # Grant Storage Object Admin on your bucket (or a more restricted role such as roles/storage.objectUser)
+    gcloud storage buckets add-iam-policy-binding gs://YOUR_BUCKET_NAME \
+        --member "serviceAccount:benchmark-gsa@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
+        --role "roles/storage.objectAdmin"
+    ```
+
+3.  **Create a Kubernetes Service Account (KSA):**
+    ```bash
+    kubectl create serviceaccount benchmark-ksa \
+        --namespace default
+    ```
+
+4.  **Bind the KSA to the GSA:**
+    ```bash
+    gcloud iam service-accounts add-iam-policy-binding benchmark-gsa@YOUR_PROJECT_ID.iam.gserviceaccount.com \
+        --role roles/iam.workloadIdentityUser \
+        --member "serviceAccount:YOUR_PROJECT_ID.svc.id.goog[default/benchmark-ksa]"
+    ```
+
+5.  **Annotate the KSA:**
+    ```bash
+    kubectl annotate serviceaccount benchmark-ksa \
+        --namespace default \
+        iam.gke.io/gcp-service-account=benchmark-gsa@YOUR_PROJECT_ID.iam.gserviceaccount.com
+    ```
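+
+The identities in steps 1 to 5 follow fixed formats, so if you use a different namespace or different account names, it can help to derive them rather than edit each command by hand. A minimal sketch; the helper names `gsa_email` and `wi_member` and the namespace `benchmarks` in the example are hypothetical:
+
+```bash
+# Hypothetical helper: email address of a Google Cloud Service Account.
+# Usage: gsa_email GSA_NAME PROJECT_ID
+gsa_email() {
+  echo "$1@$2.iam.gserviceaccount.com"
+}
+
+# Hypothetical helper: Workload Identity member string for a KSA.
+# Usage: wi_member PROJECT_ID NAMESPACE KSA_NAME
+wi_member() {
+  echo "serviceAccount:$1.svc.id.goog[$2/$3]"
+}
+
+# Example: step 4 for a KSA in a namespace other than "default":
+#   gcloud iam service-accounts add-iam-policy-binding "$(gsa_email benchmark-gsa YOUR_PROJECT_ID)" \
+#       --role roles/iam.workloadIdentityUser \
+#       --member "$(wi_member YOUR_PROJECT_ID benchmarks benchmark-ksa)"
+```
+
+If you change the namespace, create the KSA and the benchmark Pods in that same namespace.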
+
+## Step 3: Run the Benchmarks as Pods
+
+In GKE, we don't use `npi.py`. Instead, we deploy Kubernetes Pods. The GCSFuse mount is handled by the **GKE GCS Fuse CSI driver**, which must be enabled on the cluster, and the mount path is passed to the benchmark container.
+
+### Important Considerations for GKE
+
+*   **NUMA Binding**: NUMA binding does not currently apply on GKE, so skip any NUMA-bound benchmarks (anything with `numa0` or `numa1` in its name).
+*   **gRPC Benchmarks**: When running gRPC benchmarks, you **must** change the `mountOptions` in the Pod's volume definition to include `"client-protocol=grpc"`.
+*   **BigQuery Parameters**: Because `npi.py` is not involved, the container needs `args` that tell it where to publish metrics (project, dataset, and table).
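+
+When starting from a list of benchmark names, the NUMA-bound variants can be dropped before creating Pods. A minimal sketch; the function name `gke_benchmarks` is hypothetical, and it relies only on the `numa0`/`numa1` naming described above:
+
+```bash
+# Hypothetical helper: prints each benchmark name passed as an argument,
+# skipping NUMA-bound variants, which do not apply on GKE.
+gke_benchmarks() {
+  for benchmark in "$@"; do
+    case "$benchmark" in
+      *numa0* | *numa1*) echo "skipping NUMA-bound benchmark: $benchmark" >&2 ;;
+      *) echo "$benchmark" ;;
+    esac
+  done
+}
+
+# Example:
+#   gke_benchmarks read_http1 read_grpc orbax_read_grpc_numa0_fio_bound
+#   # prints read_http1 and read_grpc, one per line
+```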
+
+### Sample Pod Specification
+
+Here is a sample `pod.yaml` for running the `read_http1` benchmark on a TPU node:
+
+```yaml
+apiVersion: v1
+kind: Pod
+metadata:
+  name: fio-bench-read-http1
+  namespace: default
+  annotations:
+    gke-gcsfuse/volumes: "true"
+spec:
+  nodeSelector:
+    cloud.google.com/gke-tpu-topology: 2x2
+    cloud.google.com/gke-tpu-accelerator: tpu-v6e-slice
+  restartPolicy: Never
+  serviceAccountName: benchmark-ksa  # KSA configured with Workload Identity
+  containers:
+  - name: fio
+    image: us-docker.pkg.dev/YOUR_PROJECT_ID/gcsfuse-benchmarks/fio-read-benchmark-v3.5.6:latest
+    args: 
+    - "--mount-path=/data"
+    - "--iterations=5"
+    - "--project-id=YOUR_PROJECT_ID"
+    - "--bq-dataset-id=YOUR_BQ_DATASET_ID"
+    - "--bq-table-id=fio_read_http1"
+    securityContext:
+      privileged: true
+    volumeMounts:
+    - name: data-vol
+      mountPath: /data
+  volumes:
+  - name: data-vol
+    csi:
+      driver: gcsfuse.csi.storage.gke.io
+      volumeAttributes:
+        bucketName: "YOUR_BUCKET_NAME"
+        mountOptions: "client-protocol=http1"
+```
+
+### Running a gRPC Benchmark
+
+To run a gRPC benchmark (e.g., `read_grpc`), update the `args` so the results go to the correct table and, critically, update the `mountOptions`:
+
+```yaml
+# ... metadata and spec ...
+  containers:
+  - name: fio
+    image: us-docker.pkg.dev/YOUR_PROJECT_ID/gcsfuse-benchmarks/fio-read-benchmark-v3.5.6:latest
+    args: 
+    - "--mount-path=/data"
+    - "--iterations=5"
+    - "--project-id=YOUR_PROJECT_ID"
+    - "--bq-dataset-id=YOUR_BQ_DATASET_ID"
+    - "--bq-table-id=fio_read_grpc"
+# ... volumeMounts ...
+  volumes:
+  - name: data-vol
+    csi:
+      driver: gcsfuse.csi.storage.gke.io
+      volumeAttributes:
+        bucketName: "YOUR_BUCKET_NAME"
+        mountOptions: "client-protocol=grpc"
+```
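+
+Because the http1 and gRPC variants differ only in the Pod name, the BigQuery table, and `mountOptions`, the manifest can be generated from one template. This sketch assumes results go to a table named `fio_<benchmark>` (matching `fio_read_http1` and `fio_read_grpc` above) and copies the rest of the sample spec, including the TPU `nodeSelector` and `securityContext`; `render_pod` and the `PROJECT_ID`, `BQ_DATASET_ID`, `BUCKET`, and `ITERATIONS` variables are hypothetical.
+
+```bash
+# Hypothetical generator for Pod manifests like the samples above.
+# Usage: render_pod BENCHMARK IMAGE
+render_pod() {
+  benchmark="$1"
+  image="$2"
+  # Pod names must be valid DNS labels, so underscores become hyphens.
+  pod_name="fio-bench-$(echo "$benchmark" | tr '_' '-')"
+  # gRPC benchmarks need client-protocol=grpc; everything else uses http1.
+  case "$benchmark" in
+    *grpc*) protocol=grpc ;;
+    *) protocol=http1 ;;
+  esac
+  cat <<EOF
+apiVersion: v1
+kind: Pod
+metadata:
+  name: ${pod_name}
+  namespace: default
+  annotations:
+    gke-gcsfuse/volumes: "true"
+spec:
+  nodeSelector:
+    cloud.google.com/gke-tpu-topology: 2x2
+    cloud.google.com/gke-tpu-accelerator: tpu-v6e-slice
+  restartPolicy: Never
+  serviceAccountName: benchmark-ksa
+  containers:
+  - name: fio
+    image: ${image}
+    args:
+    - "--mount-path=/data"
+    - "--iterations=${ITERATIONS:-5}"
+    - "--project-id=${PROJECT_ID:?PROJECT_ID is required}"
+    - "--bq-dataset-id=${BQ_DATASET_ID:?BQ_DATASET_ID is required}"
+    - "--bq-table-id=fio_${benchmark}"
+    securityContext:
+      privileged: true
+    volumeMounts:
+    - name: data-vol
+      mountPath: /data
+  volumes:
+  - name: data-vol
+    csi:
+      driver: gcsfuse.csi.storage.gke.io
+      volumeAttributes:
+        bucketName: "${BUCKET:?BUCKET is required}"
+        mountOptions: "client-protocol=${protocol}"
+EOF
+}
+```
+
+For example, with `PROJECT_ID`, `BQ_DATASET_ID`, and `BUCKET` set, `render_pod read_grpc us-docker.pkg.dev/YOUR_PROJECT_ID/gcsfuse-benchmarks/fio-read-benchmark-v3.5.6:latest | kubectl apply -f -` creates the gRPC Pod without a hand-edited YAML file.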
+
+### Executing the Benchmark
+
+Apply the YAML to your cluster to start the benchmark:
+
+```bash
+kubectl apply -f pod.yaml
+```
+
+Monitor the logs to ensure the benchmark finishes and metrics are published to BigQuery:
+
+```bash
+kubectl logs -f fio-bench-read-http1
+```
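+
+To script the wait instead of following the logs, you can poll the Pod phase until it reaches `Succeeded` or `Failed`. A minimal sketch; the function `wait_for_pod` and the `POLL_SECONDS` variable are hypothetical, and it assumes the Pod runs in the `default` namespace:
+
+```bash
+# Hypothetical helper: polls a benchmark Pod until it reaches a terminal phase.
+# Returns 0 if the Pod Succeeded, 1 if it Failed, and 2 if kubectl errors out.
+wait_for_pod() {
+  pod="$1"
+  interval="${POLL_SECONDS:-30}"
+  while :; do
+    phase="$(kubectl get pod "$pod" --namespace default -o jsonpath='{.status.phase}')" || return 2
+    case "$phase" in
+      Succeeded) return 0 ;;
+      Failed)
+        echo "Pod $pod failed; check: kubectl logs $pod" >&2
+        return 1
+        ;;
+    esac
+    sleep "$interval"
+  done
+}
+
+# Example: wait, then remove the finished Pod so its name can be reused.
+#   wait_for_pod fio-bench-read-http1 && kubectl delete pod fio-bench-read-http1
+```
+
+Polling is used here rather than `kubectl wait --for=jsonpath=...` so that a failed run (with `restartPolicy: Never`) is reported as soon as it happens instead of only after the wait times out.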