GoogleCloudPlatform · vipnydav · Jul 28, 2025 · May 22, 2025 · May 29, 2025 · Jun 11, 2025
diff --git a/perf-benchmarking-for-releases/README.md b/perf-benchmarking-for-releases/README.md
@@ -0,0 +1,127 @@
+# GCSFuse Benchmarking Framework
+
+## Overview
+
+This directory provides a framework to:
+
+- Set up and tear down a Google Compute Engine (GCE) VM for benchmarking.
+- Install necessary tools like FIO, GCSFuse, and Google Cloud SDK on the VM.
+- Run FIO benchmarks using predefined job files.
+- Monitor GCSFuse CPU and memory usage during benchmarks.
+- Upload FIO output and monitoring metrics to a Google Cloud Storage (GCS) bucket and to BigQuery.
+
+---
+
+## Setup
+
+Before running the benchmarks, ensure you have:
+
+- **Google Cloud SDK (`gcloud`)**:
+  - Authenticated with `gcloud auth login`.
+  - Configured for the correct project:
+
+    ```bash
+    gcloud config set project <PROJECT_ID>
+    ```
+
+- **Permissions**:
+  - Read access to `gs://gcsfuse-release-benchmark-fio-data`.
+  - Read/Write access to a GCS bucket for results (e.g., `gs://gcsfuse-release-benchmarks-results`).
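+  - Permissions to create/delete GCE VMs, GCS buckets, and BigQuery datasets/tables within your specified project.
+  - Permissions to create Storage Transfer Service jobs and to add IAM policy bindings on the test data bucket (the script grants `roles/storage.admin` on it to the project's Storage Transfer Service account).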
+
+- **Python Dependencies**:\
+  Install the required Python packages:
+
+    ```bash
+    pip install -r perf-benchmarking-for-releases/requirements.txt
+    ```
+
+  The key Python packages include:
+
+  - `google-cloud-bigquery`
+  - `google-cloud-monitoring`
+  - `requests`
+
+---
+
+## Usage
+
+The main script to run the benchmarks is `run-benchmarks.sh`.  
+It should be executed from the `perf-benchmarking-for-releases` directory.
+
+### Syntax
+
+```bash
+bash run-benchmarks.sh <GCSFUSE_VERSION> <PROJECT_ID> <REGION> <MACHINE_TYPE> <IMAGE_FAMILY> <IMAGE_PROJECT>
+```
+
+### Arguments:
+
+- `<GCSFUSE_VERSION>`: A Git tag (e.g., `v1.0.0`), branch name (e.g., `main`), or a commit ID on the GCSFuse master branch.
+- `<PROJECT_ID>`: The Google Cloud project ID in which the VM and the test data bucket are created.
+- `<REGION>`: The GCP region where the VM and GCS buckets will be created (e.g., `us-south1`).
+- `<MACHINE_TYPE>`: The GCE machine type for the benchmark VM (e.g., `n2-standard-96`). For machine types listed as LSSD-supported, the script attaches 16 local NVMe SSDs (375GB each).
+   - **Note:** If your machine type supports local SSDs but is not listed in the `LSSD_SUPPORTED_MACHINES` array in `run-benchmarks.sh`, add it to that array manually; otherwise no local SSDs are attached.
+- `<IMAGE_FAMILY>`: The image family for the VM (e.g., `ubuntu-2204-lts`).
+- `<IMAGE_PROJECT>`: The image project for the VM (e.g., `ubuntu-os-cloud`).
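+
+The LSSD decision described above can be pictured with the minimal sketch below. It assumes the check is a plain membership test against `LSSD_SUPPORTED_MACHINES` (array contents copied from `run-benchmarks.sh`) and that each disk is requested with one `--local-ssd=interface=NVME` flag. The helper names `is_lssd_supported` and `build_local_ssd_args` are hypothetical, and the script's actual flag string may differ.
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical sketch of the LSSD check; not the exact code in run-benchmarks.sh.
+LSSD_SUPPORTED_MACHINES=("n2-standard-96" "c2-standard-60" "c2d-standard-112" "c3-standard-88" "c3d-standard-180")
+
+# Return 0 if the machine type is listed in LSSD_SUPPORTED_MACHINES, else 1.
+is_lssd_supported() {
+  local machine_type="$1" m
+  for m in "${LSSD_SUPPORTED_MACHINES[@]}"; do
+    if [[ "$m" == "$machine_type" ]]; then
+      return 0
+    fi
+  done
+  return 1
+}
+
+# Print one --local-ssd=interface=NVME flag per disk (16 disks, per this README).
+# The flag form is an assumption about how the script requests NVMe disks.
+build_local_ssd_args() {
+  local args=() i
+  for ((i = 0; i < 16; i++)); do
+    args+=("--local-ssd=interface=NVME")
+  done
+  echo "${args[*]}"
+}
+
+MACHINE_TYPE="n2-standard-96"  # example value
+if is_lssd_supported "$MACHINE_TYPE"; then
+  LSSD_ENABLED=true
+  VM_LOCAL_SSD_ARGS="$(build_local_ssd_args)"
+else
+  LSSD_ENABLED=false
+  VM_LOCAL_SSD_ARGS=""
+fi
+echo "LSSD_ENABLED=${LSSD_ENABLED}"
+```
+
+Because the script expands `${VM_LOCAL_SSD_ARGS}` unquoted in `gcloud compute instances create`, a space-separated flag string like this one splits into 16 separate `--local-ssd` arguments.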
+
+### Example:
+
+```bash
+bash run-benchmarks.sh v2.12.0 gcs-fuse-test us-south1 n2-standard-96 ubuntu-2204-lts ubuntu-os-cloud
+```
+
+---
+
+## Workflow
+
+1. **Unique ID Generation**:  
+   A unique ID is generated based on the timestamp and a random suffix to name the VM and related GCS buckets.
+
+2. **GCS Bucket Creation**:  
+   A GCS bucket `gcsfuse-release-benchmark-data-<UNIQUE_ID>` is created in the specified region to store FIO test data.
+
+3. **FIO Job File Upload**:  
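+   All `.fio` job files from the local `fio-job-files/` directory are uploaded to `gs://gcsfuse-release-benchmarks-results/<GCSFUSE_VERSION>-<UNIQUE_ID>/fio-job-files/`, where the VM later downloads them.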
+
+4. **Data Transfer**:  
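+   A Storage Transfer Service job copies the read-workload test data (objects whose names start with `read`) from `gs://gcsfuse-release-benchmark-fio-data` to the newly created test data bucket. This is much faster than generating the files with FIO on the VM.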
+
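+5. **VM Creation**:
+   - A GCE VM is created with the specified machine type, image family, and image project.
+   - Boot disk size: 100GB.
+   - If the machine type is listed in `LSSD_SUPPORTED_MACHINES`, 16 local NVMe SSDs are attached.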
+
+6. **`starter-script.sh` Execution**:  
+   This script is passed to the VM as its startup script (via `--metadata-from-file`) and runs after the VM boots. It:
+   - Installs common dependencies (e.g., git, fio, python3-pip).
+   - Builds GCSFuse from the specified version.
+   - Sets up local SSDs if enabled.
+   - Downloads FIO job files.
+   - Mounts the GCS bucket using the built GCSFuse binary.
+   - Monitors GCSFuse CPU and memory usage during FIO runs.
+   - Executes each FIO job and saves the JSON output.
+   - Uploads the FIO results and monitoring logs to GCS.
+   - Calls `upload_fio_output_to_bigquery.py` to push results to BigQuery.
+
+7. **Cleanup**:  
+   A cleanup function is registered with `trap cleanup EXIT`, so the VM and the created GCS test data bucket are deleted whenever the script exits, whether the benchmarks succeed, fail, or time out.
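+
+The naming in steps 1 and 2, and the result path described in the Output section, can be reproduced with the sketch below. The variable definitions are copied from `run-benchmarks.sh`; the hard-coded example `GCSFUSE_VERSION` and the `LC_ALL=C` prefix on `tr` (added for portability to non-GNU systems) are assumptions.
+
+```bash
+#!/usr/bin/env bash
+# Sketch of how run-benchmarks.sh names its per-run resources.
+GCSFUSE_VERSION="v2.12.0"  # example value
+RESULTS_BUCKET_NAME="gcsfuse-release-benchmarks-results"
+
+TIMESTAMP=$(date +%Y%m%d-%H%M%S)
+# 8 random lowercase alphanumeric characters (LC_ALL=C avoids multibyte errors in tr).
+RAND_SUFFIX=$(head /dev/urandom | LC_ALL=C tr -dc a-z0-9 | head -c 8)
+UNIQUE_ID="${TIMESTAMP}-${RAND_SUFFIX}"
+
+VM_NAME="gcsfuse-perf-benchmark-${UNIQUE_ID}"
+GCS_BUCKET_WITH_FIO_TEST_DATA="gcsfuse-release-benchmark-data-${UNIQUE_ID}"
+RESULT_PATH="gs://${RESULTS_BUCKET_NAME}/${GCSFUSE_VERSION}-${UNIQUE_ID}"
+
+echo "VM name:          ${VM_NAME}"
+echo "Test data bucket: gs://${GCS_BUCKET_WITH_FIO_TEST_DATA}"
+echo "Result path:      ${RESULT_PATH}"
+```
+
+Every name embeds `<UNIQUE_ID>`, so repeated runs of the same version are unlikely to collide, and the generated names stay within the 63-character limit for GCE instance and GCS bucket names.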
+
+---
+
+## Output
+
+### BigQuery
+
+FIO benchmark results, including I/O statistics, latencies, and system resource usage (CPU/Memory), are uploaded to a BigQuery table with:
+
+- **Project ID**: `gcs-fuse-test-ml`
+- **Dataset ID**: `gke_test_tool_outputs`
+- **Table ID**: `fio_outputs`
+
+---
+
+### Google Cloud Storage
+
+- **FIO Test Data:** The FIO read test data is copied from `gs://gcsfuse-release-benchmark-fio-data` into a newly created bucket named `gcsfuse-release-benchmark-data-<UNIQUE_ID>`. This bucket is deleted during cleanup.
+- **Benchmark Results and FIO Job Files:** FIO JSON output files, benchmark logs, and FIO job files are uploaded to the `gs://gcsfuse-release-benchmarks-results` bucket under the path `gs://gcsfuse-release-benchmarks-results/<GCSFUSE_VERSION>-<UNIQUE_ID>/`.
+- **Completion Marker:** A `success.txt` file is uploaded to the same path when all benchmarks complete successfully. `run-benchmarks.sh` polls for this file to determine whether the run succeeded.
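+
+`run-benchmarks.sh` decides whether a run succeeded by polling the result path. The sketch below restates that loop as a function; `object_exists` and `wait_for_success` are hypothetical names, and the printed status strings are illustrative. As in the script, finding `details.txt` or `benchmark_run.log` without `success.txt` is treated as a likely failure. The script itself uses `MAX_RETRIES=18` and `SLEEP_TIME=300`, i.e. about 90 minutes of polling after an initial 5-minute wait.
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical restatement of the polling loop in run-benchmarks.sh.
+
+# Return 0 if the GCS object exists (requires an authenticated gcloud).
+object_exists() {
+  gcloud storage objects describe "$1" &> /dev/null
+}
+
+# Usage: wait_for_success RESULT_PATH MAX_RETRIES SLEEP_TIME
+# Prints "success", "failed" or "timeout"; returns 0, 1 or 2 respectively.
+wait_for_success() {
+  local result_path="$1" max_retries="$2" sleep_time="$3" i
+  for ((i = 1; i <= max_retries; i++)); do
+    if object_exists "${result_path}/success.txt"; then
+      echo "success"
+      return 0
+    fi
+    # A details or log file without success.txt suggests the run failed.
+    if object_exists "${result_path}/details.txt" ||
+       object_exists "${result_path}/benchmark_run.log"; then
+      echo "failed"
+      return 1
+    fi
+    sleep "$sleep_time"
+  done
+  echo "timeout"
+  return 2
+}
+
+# Example (not executed here):
+# wait_for_success "gs://gcsfuse-release-benchmarks-results/<GCSFUSE_VERSION>-<UNIQUE_ID>" 18 300
+```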
diff --git a/...es/fio-job-files/random-read-workload.fio → ...es/fio-job-files/random_read_workload.fio b/...es/fio-job-files/random-read-workload.fio → ...es/fio-job-files/random_read_workload.fio
diff --git a/...io-job-files/sequential-read-workload.fio → ...io-job-files/sequential_read_workload.fio b/...io-job-files/sequential-read-workload.fio → ...io-job-files/sequential_read_workload.fio
diff --git a/...releases/fio-job-files/write-workload.fio → ...releases/fio-job-files/write_workload.fio b/...releases/fio-job-files/write-workload.fio → ...releases/fio-job-files/write_workload.fio
diff --git a/perf-benchmarking-for-releases/requirements.txt b/perf-benchmarking-for-releases/requirements.txt
@@ -0,0 +1,17 @@
+# Copyright 2025 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#       http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+google-cloud-bigquery
+google-cloud-monitoring
+requests
diff --git a/perf-benchmarking-for-releases/run-benchmarks.sh b/perf-benchmarking-for-releases/run-benchmarks.sh
@@ -1,5 +1,4 @@
 #!/bin/bash
-
 # Copyright 2025 Google LLC
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
@@ -23,7 +22,7 @@ if [ "$#" -ne 6 ]; then
     echo "This script should be run from the 'perf-benchmarking-for-releases' directory."
     echo ""
     echo "Example:"
-    echo "  bash run-benchmarks.sh v2.12.0 gcs-fuse-test us-south1 n2-standard-96 ubuntu-2004-lts ubuntu-os-cloud"
+    echo "  bash run-benchmarks.sh v2.12.0 gcs-fuse-test us-south1 n2-standard-96 ubuntu-2204-lts ubuntu-os-cloud"
     exit 1
 fi
 
@@ -48,9 +47,11 @@ TIMESTAMP=$(date +%Y%m%d-%H%M%S)
 RAND_SUFFIX=$(head /dev/urandom | tr -dc a-z0-9 | head -c 8)
 UNIQUE_ID="${TIMESTAMP}-${RAND_SUFFIX}"
 
-VM_NAME="gcsfuse-perf-benchmark-${IMAGE_FAMILY}-${UNIQUE_ID}"
+VM_NAME="gcsfuse-perf-benchmark-${UNIQUE_ID}"
 GCS_BUCKET_WITH_FIO_TEST_DATA="gcsfuse-release-benchmark-data-${UNIQUE_ID}"
 RESULTS_BUCKET_NAME="gcsfuse-release-benchmarks-results"
+RESULT_PATH="gs://${RESULTS_BUCKET_NAME}/${GCSFUSE_VERSION}-${UNIQUE_ID}"
 
 # For VM creation, we need a zone within the specified region.
 # We will pick the first available zone, typically ending with '-a'.
@@ -60,13 +61,13 @@ echo "Starting GCSFuse performance benchmarking for version: ${GCSFUSE_VERSION}"
 echo "VM Name: ${VM_NAME}"
 echo "Test Data Bucket: gs://${GCS_BUCKET_WITH_FIO_TEST_DATA}"
 echo "Results Bucket: gs://${RESULTS_BUCKET_NAME}"
+echo "Result Path: ${RESULT_PATH}"
 echo "Project ID: ${PROJECT_ID}"
 echo "Region: ${REGION}"
-echo "VM Zone: ${VM_ZONE}" 
+echo "VM Zone: ${VM_ZONE}"
 echo "Machine Type: ${MACHINE_TYPE}"
 
-# Array for LSSD supported machines
-# Add machine types that support local SSDs (NVMe) here
+# Array for LSSD supported machines. If a machine type supports LSSD but is not listed here, please add it manually.
 LSSD_SUPPORTED_MACHINES=("n2-standard-96" "c2-standard-60" "c2d-standard-112" "c3-standard-88" "c3d-standard-180")
 
 # Check if the chosen machine type is directly present in the LSSD_SUPPORTED_MACHINES array
@@ -88,42 +89,27 @@ fi
 
 # Cleanup function to be called on exit
 cleanup() {
-    echo "Initiating cleanup..."
-
     # Delete VM if it exists
     if gcloud compute instances describe "${VM_NAME}" --zone="${VM_ZONE}" --project="${PROJECT_ID}" >/dev/null 2>&1; then
-        echo "Deleting VM: ${VM_NAME}"
-        gcloud compute instances delete "${VM_NAME}" --zone="${VM_ZONE}" --project="${PROJECT_ID}" --delete-disks=all -q >/dev/null
-    else
-        echo "VM '${VM_NAME}' not found; skipping deletion."
+        gcloud compute instances delete "${VM_NAME}" --zone="${VM_ZONE}" --project="${PROJECT_ID}" --delete-disks=all -q >/dev/null 2>&1
     fi
 
     # Delete GCS bucket with test data if it exists
     if gcloud storage buckets list --project="${PROJECT_ID}" --filter="name:(${GCS_BUCKET_WITH_FIO_TEST_DATA})" --format="value(name)" | grep -q "^${GCS_BUCKET_WITH_FIO_TEST_DATA}$"; then
-        echo "Deleting GCS bucket: ${GCS_BUCKET_WITH_FIO_TEST_DATA}"
-        gcloud storage rm -r "gs://${GCS_BUCKET_WITH_FIO_TEST_DATA}" -q >/dev/null
-    else
-        echo "Bucket '${GCS_BUCKET_WITH_FIO_TEST_DATA}' not found; skipping deletion."
+        gcloud storage rm -r "gs://${GCS_BUCKET_WITH_FIO_TEST_DATA}" -q >/dev/null 2>&1
     fi
-
-    echo "Cleanup complete."
 }
 
-
 # Register the cleanup function to run on EXIT signal
 trap cleanup EXIT
 
 # Create the GCS bucket for FIO test data in the specified REGION
 echo "Creating GCS test data bucket: gs://${GCS_BUCKET_WITH_FIO_TEST_DATA} in region: ${REGION}"
 gcloud storage buckets create "gs://${GCS_BUCKET_WITH_FIO_TEST_DATA}" --project="${PROJECT_ID}" --location="${REGION}"
 
-# Clear the existing GCSFUSE_VERSION directory in the results bucket
-echo "Clearing previous data in gs://${RESULTS_BUCKET_NAME}/${GCSFUSE_VERSION}/..."
-gcloud storage rm -r "gs://${RESULTS_BUCKET_NAME}/${GCSFUSE_VERSION}/**" --quiet || true
-
 # Upload FIO job files to the results bucket for the VM to download
-echo "Uploading all .fio job files from local 'fio-job-files/' directory to gs://${RESULTS_BUCKET_NAME}/${GCSFUSE_VERSION}/fio-job-files/..."
-gcloud storage cp fio-job-files/*.fio "gs://${RESULTS_BUCKET_NAME}/${GCSFUSE_VERSION}/fio-job-files/"
+echo "Uploading all .fio job files from local 'fio-job-files/' directory to ${RESULT_PATH}/fio-job-files/..."
+gcloud storage cp fio-job-files/*.fio "${RESULT_PATH}/fio-job-files/"
 echo "FIO job files uploaded."
 
 # Get the project number
@@ -133,24 +119,19 @@ PROJECT_NUMBER=$(gcloud projects describe "$PROJECT_ID" --format="value(projectN
 STS_ACCOUNT="project-${PROJECT_NUMBER}@storage-transfer-service.iam.gserviceaccount.com"
 
 # Grant the service account 'roles/storage.admin' permissions on the newly created bucket
-# This allows the service account to manage the bucket and perform transfers
 gcloud storage buckets add-iam-policy-binding "gs://${GCS_BUCKET_WITH_FIO_TEST_DATA}" \
   --member="serviceAccount:${STS_ACCOUNT}" \
   --role="roles/storage.admin"
 
-# Since file generation with fio is painfully slow, we will use storage transfer
-# job to transfer test data from a fixed GCS bucket to the newly created bucket.
-# Note : We need to copy only read data.
+# Use a Storage Transfer Service job to copy the read test data from a fixed GCS bucket,
+# since generating the files with fio on the VM is much slower.
 echo "Creating storage transfer job to copy read data to gs://${GCS_BUCKET_WITH_FIO_TEST_DATA}..."
-
-TRANSFER_JOB_NAME=$(gcloud transfer jobs create \
+gcloud transfer jobs create \
   gs://gcsfuse-release-benchmark-fio-data \
   gs://${GCS_BUCKET_WITH_FIO_TEST_DATA} \
    --include-prefixes=read \
   --project="${PROJECT_ID}" \
   --format="value(name)" \
-  --no-async) 
-
+  --no-async
 echo "Transfer completed."
 
 
@@ -162,34 +143,32 @@ gcloud compute instances create "${VM_NAME}" \
     --machine-type="${MACHINE_TYPE}" \
     --image-project="${IMAGE_PROJECT}" \
     --zone="${VM_ZONE}" \
-    --boot-disk-size=1000GB \
+    --boot-disk-size=100GB \
     --network-interface=network-tier=PREMIUM,nic-type=GVNIC \
     --scopes=https://www.googleapis.com/auth/cloud-platform,https://www.googleapis.com/auth/devstorage.read_write \
     --network-performance-configs=total-egress-bandwidth-tier=TIER_1 \
-    --metadata GCSFUSE_VERSION="${GCSFUSE_VERSION}",GCS_BUCKET_WITH_FIO_TEST_DATA="${GCS_BUCKET_WITH_FIO_TEST_DATA}",RESULTS_BUCKET_NAME="${RESULTS_BUCKET_NAME}",LSSD_ENABLED="${LSSD_ENABLED}" \
+    --metadata GCSFUSE_VERSION="${GCSFUSE_VERSION}",GCS_BUCKET_WITH_FIO_TEST_DATA="${GCS_BUCKET_WITH_FIO_TEST_DATA}",RESULT_PATH="${RESULT_PATH}",LSSD_ENABLED="${LSSD_ENABLED}",MACHINE_TYPE="${MACHINE_TYPE}",PROJECT_ID="${PROJECT_ID}",UNIQUE_ID="${UNIQUE_ID}" \
     --metadata-from-file=startup-script=starter-script.sh \
     ${VM_LOCAL_SSD_ARGS}
 echo "VM created. Benchmarks will run on the VM."
 
-echo "Waiting for benchmarks to complete on VM (polling for success.txt)..."
+echo "Waiting for benchmarks to complete on VM..."
 
-SUCCESS_FILE_PATH="gs://${RESULTS_BUCKET_NAME}/${GCSFUSE_VERSION}/success.txt"
-LOG_FILE_PATH="gs://${RESULTS_BUCKET_NAME}/${GCSFUSE_VERSION}/benchmark_run.log"
+SUCCESS_FILE_PATH="${RESULT_PATH}/success.txt"
+LOG_FILE_PATH="${RESULT_PATH}/benchmark_run.log"
 SLEEP_TIME=300  # 5 minutes
 sleep "$SLEEP_TIME"
-#max 18 retries amounting to ~1hr30mins time
 MAX_RETRIES=18
 
 for ((i=1; i<=MAX_RETRIES; i++)); do
     if gcloud storage objects describe "${SUCCESS_FILE_PATH}" &> /dev/null; then
         echo "Benchmarks completed. success.txt found."
-        echo "Results are available in gs://${RESULTS_BUCKET_NAME}/${GCSFUSE_VERSION}/"
+        echo "Results are available in BigQuery: gcs-fuse-test-ml.gke_test_tool_outputs.fio_outputs"
         echo "Benchmark log file: $LOG_FILE_PATH"
         exit 0
     fi
 
-    # Check for early failure indicators
-    if gcloud storage objects describe "gs://${RESULTS_BUCKET_NAME}/${GCSFUSE_VERSION}/details.txt" &> /dev/null || \
+    if gcloud storage objects describe "${RESULT_PATH}/details.txt" &> /dev/null || \
        gcloud storage objects describe "$LOG_FILE_PATH" &> /dev/null; then
         echo "Benchmark log or details.txt found, but success.txt is missing. Possible error in benchmark execution."
         echo "Check logs at: $LOG_FILE_PATH"
@@ -201,9 +180,6 @@ for ((i=1; i<=MAX_RETRIES; i++)); do
 done
 
 
-# Failure case: success.txt was not found after retries
 echo "Timed out waiting for success.txt after $((MAX_RETRIES * SLEEP_TIME / 60)) minutes. Perhaps there is some error."
 echo "Benchmark log file (for troubleshooting): $LOG_FILE_PATH"
 exit 1
-
-# The trap command will handle the cleanup on script exit.