127 changes: 127 additions & 0 deletions perf-benchmarking-for-releases/README.md
@@ -0,0 +1,127 @@
# GCSFuse Benchmarking Framework

## Overview

This directory provides a framework to:

- Set up and tear down a Google Compute Engine (GCE) VM for benchmarking.
- Install necessary tools like FIO, GCSFuse, and Google Cloud SDK on the VM.
- Run FIO benchmarks using predefined job files.
- Monitor GCSFuse CPU and memory usage during benchmarks.
- Upload FIO output and monitoring metrics to a Google Cloud Storage (GCS) bucket and BigQuery.

---

## Setup

Before running the benchmarks, ensure you have:

- **Google Cloud SDK (`gcloud`)**:
  - Authenticated with `gcloud auth login`.
  - Configured for the correct project:

```bash
gcloud config set project <PROJECT_ID>
```

- **Permissions**:
  - Read access to `gs://gcsfuse-release-benchmark-fio-data`.
  - Read/Write access to a GCS bucket for results (e.g., `gs://gcsfuse-release-benchmarks-results`).
  - Permissions to create/delete GCE VMs, GCS buckets, and BigQuery datasets/tables within your specified project.

- **Python Dependencies**:\
Install the required Python packages:

```bash
pip install -r perf-benchmarking-for-releases/requirements.txt
```

The key Python packages include:

- `google-cloud-bigquery`
- `google-cloud-monitoring`
- `requests`
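
Before launching a run, it can help to confirm that the required command-line tools are available. The following is a minimal sketch and not part of the framework: `missing_tools` is a hypothetical helper, and the tool list (`gcloud`, `python3`) is an assumption based on the prerequisites above.

```bash
# Hypothetical preflight helper (not part of run-benchmarks.sh).
# Prints the name of every tool in its arguments that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
    fi
  done
}

# Assumed tool list, based on the prerequisites above.
missing=$(missing_tools gcloud python3)
if [ -n "$missing" ]; then
  # Left unquoted on purpose so the names print on one line.
  echo "Missing tools:" $missing
else
  echo "All required tools found."
fi
```

This only checks that the tools exist. It does not verify authentication or the permissions listed above.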

---

## Usage

The main script to run the benchmarks is `run-benchmarks.sh`.
It should be executed from the `perf-benchmarking-for-releases` directory.

### Syntax

```bash
bash run-benchmarks.sh <GCSFUSE_VERSION> <PROJECT_ID> <REGION> <MACHINE_TYPE> <IMAGE_FAMILY> <IMAGE_PROJECT>
```

### Arguments:

- `<GCSFUSE_VERSION>`: A Git tag (e.g., `v1.0.0`), branch name (e.g., `main`), or a commit ID on the GCSFuse master branch.
- `<PROJECT_ID>`: The ID of the Google Cloud project in which the VM and bucket will be created.
- `<REGION>`: The GCP region where the VM and GCS buckets will be created (e.g., `us-south1`).
- `<MACHINE_TYPE>`: The GCE machine type for the benchmark VM (e.g., `n2-standard-96`). This script supports attaching 16 local NVMe SSDs (375GB each) for LSSD-supported machine types.
  - **Note:** If your machine type supports LSSD but is not included in the `LSSD_SUPPORTED_MACHINES` array in `run-benchmarks.sh`, you may need to add it manually to ensure that LSSDs are attached.
- `<IMAGE_FAMILY>`: The image family for the VM (e.g., `ubuntu-2204-lts`).
- `<IMAGE_PROJECT>`: The image project for the VM (e.g., `ubuntu-os-cloud`).

### Example:

```bash
bash run-benchmarks.sh v2.12.0 gcs-fuse-test us-south1 n2-standard-96 ubuntu-2204-lts ubuntu-os-cloud
```
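
The LSSD check described for `<MACHINE_TYPE>` can be sketched as a simple membership test. This is an illustration under assumptions: `is_lssd_supported` is a hypothetical helper, the list is stored as a space-separated string rather than the bash array used in `run-benchmarks.sh`, and the `true`/`false` values for `LSSD_ENABLED` are assumed. The machine types are the ones listed in the script's `LSSD_SUPPORTED_MACHINES`.

```bash
# Machine types taken from LSSD_SUPPORTED_MACHINES in run-benchmarks.sh.
LSSD_SUPPORTED="n2-standard-96 c2-standard-60 c2d-standard-112 c3-standard-88 c3d-standard-180"

# Hypothetical helper: succeeds if the given machine type is in the list.
is_lssd_supported() {
  for m in $LSSD_SUPPORTED; do
    if [ "$m" = "$1" ]; then
      return 0
    fi
  done
  return 1
}

# Example value from the usage example above.
MACHINE_TYPE="n2-standard-96"
if is_lssd_supported "$MACHINE_TYPE"; then
  LSSD_ENABLED="true"    # assumed value format
else
  LSSD_ENABLED="false"   # assumed value format
fi
echo "Machine type ${MACHINE_TYPE}: LSSD_ENABLED=${LSSD_ENABLED}"
```

The match is exact, so a machine type that supports LSSD but is missing from the list is treated as unsupported until it is added.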

---

## Workflow

1. **Unique ID Generation**:
A unique ID is generated based on the timestamp and a random suffix to name the VM and related GCS buckets.

2. **GCS Bucket Creation**:
A GCS bucket `gcsfuse-release-benchmark-data-<UNIQUE_ID>` is created in the specified region to store FIO test data.

3. **FIO Job File Upload**:
All `.fio` job files from the local `fio-job-files/` directory are uploaded to the results bucket.

4. **Data Transfer**:
A Storage Transfer Service job copies read data from `gs://gcsfuse-release-benchmark-fio-data` to the newly created test data bucket.

5. **VM Creation**:
   - A GCE VM is created with the specified machine type.
   - Boot disk size: 1000GB.

6. **`starter-script.sh` Execution**:
   This script runs on the VM after creation. It:
   - Installs common dependencies (e.g., git, fio, python3-pip).
   - Builds GCSFuse from the specified version.
   - Sets up local SSDs if enabled.
   - Downloads FIO job files.
   - Mounts the GCS bucket using the built GCSFuse binary.
   - Monitors GCSFuse CPU and memory usage during FIO runs.
   - Executes each FIO job and saves the JSON output.
   - Uploads the FIO results and monitoring logs to GCS.
   - Calls `upload_fio_output_to_bigquery.py` to push results to BigQuery.

7. **Cleanup**:
A cleanup function is trapped to run on exit, ensuring the VM and the created GCS test data bucket are deleted.
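
Steps 1 and 7 can be sketched together: build the unique ID, derive the resource names from it, and register a cleanup function on exit. This is a simplified sketch under assumptions. The `cleanup` function here only prints what would be deleted instead of calling `gcloud`, the version value is an example, and the work runs in a subshell so the trap fires when it ends.

```bash
# Build a unique ID from a timestamp and a random suffix.
TIMESTAMP=$(date +%Y%m%d-%H%M%S)
RAND_SUFFIX=$(head /dev/urandom | LC_ALL=C tr -dc a-z0-9 | head -c 8)
UNIQUE_ID="${TIMESTAMP}-${RAND_SUFFIX}"

# Derive resource names from the ID.
GCSFUSE_VERSION="v2.12.0"   # example value
VM_NAME="gcsfuse-perf-benchmark-${UNIQUE_ID}"
GCS_BUCKET_WITH_FIO_TEST_DATA="gcsfuse-release-benchmark-data-${UNIQUE_ID}"
RESULT_PATH="gs://gcsfuse-release-benchmarks-results/${GCSFUSE_VERSION}-${UNIQUE_ID}"

# Stand-in cleanup: prints what would be deleted instead of calling gcloud.
cleanup() {
  echo "Would delete VM: ${VM_NAME}"
  echo "Would delete bucket: gs://${GCS_BUCKET_WITH_FIO_TEST_DATA}"
}

# Run the work in a subshell so the EXIT trap fires when the subshell ends.
(
  trap cleanup EXIT
  echo "Results will be written to ${RESULT_PATH}"
)
```

Because the trap is tied to `EXIT`, the cleanup runs whether the work finishes normally or stops early on an error.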

---

## Output

### BigQuery

FIO benchmark results, including I/O statistics, latencies, and system resource usage (CPU/Memory), are uploaded to a BigQuery table with:

- **Project ID**: `gcs-fuse-test-ml`
- **Dataset ID**: `gke_test_tool_outputs`
- **Table ID**: `fio_outputs`
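
To inspect uploaded rows, a query against this table could be built as follows. This is a hedged sketch: `build_preview_query` is a hypothetical helper, and `SELECT *` is used because the table schema is not described here. Running the query requires the `bq` CLI and read access to the table, so the sketch only prints it.

```bash
# Table coordinates from the list above.
BQ_PROJECT="gcs-fuse-test-ml"
BQ_DATASET="gke_test_tool_outputs"
BQ_TABLE="fio_outputs"

# Hypothetical helper: prints a standard-SQL query returning up to N rows.
build_preview_query() {
  echo "SELECT * FROM \`${BQ_PROJECT}.${BQ_DATASET}.${BQ_TABLE}\` LIMIT $1"
}

QUERY=$(build_preview_query 5)
echo "$QUERY"

# To run it (requires the bq CLI and read access to the table):
#   bq query --use_legacy_sql=false "$QUERY"
```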

---

### Google Cloud Storage

- **FIO Test Data:** The FIO test data (copied from `gs://gcsfuse-release-benchmark-fio-data`) is stored in a newly created bucket named `gcsfuse-release-benchmark-data-<UNIQUE_ID>`.
- **Benchmark Results and FIO Job Files:** FIO JSON output files, benchmark logs, and FIO job files are uploaded to the `gs://gcsfuse-release-benchmarks-results` bucket, under the path `gs://gcsfuse-release-benchmarks-results/<GCSFUSE_VERSION>-<UNIQUE_ID>/`.
- A `success.txt` file is uploaded to GCS upon successful completion of all benchmarks.
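
Since `success.txt` marks completion, a caller can wait for it with a bounded polling loop. The sketch below assumes a hypothetical `wait_for` helper and uses a local temporary file as a stand-in for the GCS object, so it runs without `gcloud`. The retry count and sleep interval in the closing comment mirror the values in `run-benchmarks.sh`.

```bash
# Hypothetical helper: runs a check command up to max_retries times,
# sleeping between attempts, and succeeds as soon as the check succeeds.
# Usage: wait_for <max_retries> <sleep_seconds> <command> [args...]
wait_for() {
  max_retries=$1
  sleep_seconds=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_retries" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$sleep_seconds"
    attempt=$((attempt + 1))
  done
  return 1
}

# Local stand-in for the GCS marker object.
MARKER=$(mktemp)
if wait_for 3 0 test -f "$MARKER"; then
  echo "success.txt found."
fi
rm -f "$MARKER"

# Against GCS, the check could instead be:
#   wait_for 18 300 gcloud storage objects describe "${RESULT_PATH}/success.txt"
```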
17 changes: 17 additions & 0 deletions perf-benchmarking-for-releases/requirements.txt
@@ -0,0 +1,17 @@
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

google-cloud-bigquery
google-cloud-monitoring
requests
66 changes: 21 additions & 45 deletions perf-benchmarking-for-releases/run-benchmarks.sh
@@ -1,5 +1,4 @@
#!/bin/bash

# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
@@ -23,7 +22,7 @@ if [ "$#" -ne 6 ]; then
echo "This script should be run from the 'perf-benchmarking-for-releases' directory."
echo ""
echo "Example:"
echo " bash run-benchmarks.sh v2.12.0 gcs-fuse-test us-south1 n2-standard-96 ubuntu-2004-lts ubuntu-os-cloud"
echo " bash run-benchmarks.sh v2.12.0 gcs-fuse-test us-south1 n2-standard-96 ubuntu-2204-lts ubuntu-os-cloud"
exit 1
fi

@@ -48,9 +47,11 @@ TIMESTAMP=$(date +%Y%m%d-%H%M%S)
RAND_SUFFIX=$(head /dev/urandom | tr -dc a-z0-9 | head -c 8)
UNIQUE_ID="${TIMESTAMP}-${RAND_SUFFIX}"

VM_NAME="gcsfuse-perf-benchmark-${IMAGE_FAMILY}-${UNIQUE_ID}"
VM_NAME="gcsfuse-perf-benchmark-${UNIQUE_ID}"
GCS_BUCKET_WITH_FIO_TEST_DATA="gcsfuse-release-benchmark-data-${UNIQUE_ID}"
RESULTS_BUCKET_NAME="gcsfuse-release-benchmarks-results"
RESULT_PATH="gs://${RESULTS_BUCKET_NAME}/${GCSFUSE_VERSION}-${UNIQUE_ID}"


# For VM creation, we need a zone within the specified region.
# We will pick the first available zone, typically ending with '-a'.
@@ -60,13 +61,13 @@ echo "Starting GCSFuse performance benchmarking for version: ${GCSFUSE_VERSION}"
echo "VM Name: ${VM_NAME}"
echo "Test Data Bucket: gs://${GCS_BUCKET_WITH_FIO_TEST_DATA}"
echo "Results Bucket: gs://${RESULTS_BUCKET_NAME}"
echo "Result Path: ${RESULT_PATH}"
echo "Project ID: ${PROJECT_ID}"
echo "Region: ${REGION}"
echo "VM Zone: ${VM_ZONE}"
echo "VM Zone: ${VM_ZONE}"
echo "Machine Type: ${MACHINE_TYPE}"

# Array for LSSD supported machines
# Add machine types that support local SSDs (NVMe) here
# Array for LSSD supported machines. If a machine type supports LSSD but is not listed here, please add it manually.
LSSD_SUPPORTED_MACHINES=("n2-standard-96" "c2-standard-60" "c2d-standard-112" "c3-standard-88" "c3d-standard-180")

# Check if the chosen machine type is directly present in the LSSD_SUPPORTED_MACHINES array
@@ -88,42 +89,27 @@ fi

# Cleanup function to be called on exit
cleanup() {
echo "Initiating cleanup..."

# Delete VM if it exists
if gcloud compute instances describe "${VM_NAME}" --zone="${VM_ZONE}" --project="${PROJECT_ID}" >/dev/null 2>&1; then
echo "Deleting VM: ${VM_NAME}"
gcloud compute instances delete "${VM_NAME}" --zone="${VM_ZONE}" --project="${PROJECT_ID}" --delete-disks=all -q >/dev/null
else
echo "VM '${VM_NAME}' not found; skipping deletion."
gcloud compute instances delete "${VM_NAME}" --zone="${VM_ZONE}" --project="${PROJECT_ID}" --delete-disks=all -q >/dev/null 2>&1
fi

# Delete GCS bucket with test data if it exists
if gcloud storage buckets list --project="${PROJECT_ID}" --filter="name:(${GCS_BUCKET_WITH_FIO_TEST_DATA})" --format="value(name)" | grep -q "^${GCS_BUCKET_WITH_FIO_TEST_DATA}$"; then
echo "Deleting GCS bucket: ${GCS_BUCKET_WITH_FIO_TEST_DATA}"
gcloud storage rm -r "gs://${GCS_BUCKET_WITH_FIO_TEST_DATA}" -q >/dev/null
else
echo "Bucket '${GCS_BUCKET_WITH_FIO_TEST_DATA}' not found; skipping deletion."
gcloud storage rm -r "gs://${GCS_BUCKET_WITH_FIO_TEST_DATA}" -q >/dev/null 2>&1
fi

echo "Cleanup complete."
}


# Register the cleanup function to run on EXIT signal
trap cleanup EXIT

# Create the GCS bucket for FIO test data in the specified REGION
echo "Creating GCS test data bucket: gs://${GCS_BUCKET_WITH_FIO_TEST_DATA} in region: ${REGION}"
gcloud storage buckets create "gs://${GCS_BUCKET_WITH_FIO_TEST_DATA}" --project="${PROJECT_ID}" --location="${REGION}"

# Clear the existing GCSFUSE_VERSION directory in the results bucket
echo "Clearing previous data in gs://${RESULTS_BUCKET_NAME}/${GCSFUSE_VERSION}/..."
gcloud storage rm -r "gs://${RESULTS_BUCKET_NAME}/${GCSFUSE_VERSION}/**" --quiet || true

# Upload FIO job files to the results bucket for the VM to download
echo "Uploading all .fio job files from local 'fio-job-files/' directory to gs://${RESULTS_BUCKET_NAME}/${GCSFUSE_VERSION}/fio-job-files/..."
gcloud storage cp fio-job-files/*.fio "gs://${RESULTS_BUCKET_NAME}/${GCSFUSE_VERSION}/fio-job-files/"
echo "Uploading all .fio job files from local 'fio-job-files/' directory to ${RESULT_PATH}/fio-job-files/..."
gcloud storage cp fio-job-files/*.fio "${RESULT_PATH}/fio-job-files/"
echo "FIO job files uploaded."

# Get the project number
@@ -133,24 +119,19 @@ PROJECT_NUMBER=$(gcloud projects describe "$PROJECT_ID" --format="value(projectN
STS_ACCOUNT="project-${PROJECT_NUMBER}@storage-transfer-service.iam.gserviceaccount.com"

# Grant the service account 'roles/storage.admin' permissions on the newly created bucket
# This allows the service account to manage the bucket and perform transfers
gcloud storage buckets add-iam-policy-binding "gs://${GCS_BUCKET_WITH_FIO_TEST_DATA}" \
--member="serviceAccount:${STS_ACCOUNT}" \
--role="roles/storage.admin"

# Since file generation with fio is painfully slow, we will use storage transfer
# job to transfer test data from a fixed GCS bucket to the newly created bucket.
# Note : We need to copy only read data.
# Use storage transfer job to copy test data from a fixed GCS bucket.
echo "Creating storage transfer job to copy read data to gs://${GCS_BUCKET_WITH_FIO_TEST_DATA}..."

TRANSFER_JOB_NAME=$(gcloud transfer jobs create \
gcloud transfer jobs create \
gs://gcsfuse-release-benchmark-fio-data \
gs://${GCS_BUCKET_WITH_FIO_TEST_DATA} \
--include-prefixes=read \
--project="${PROJECT_ID}" \
--format="value(name)" \
--no-async)

--no-async
echo "Transfer completed."


@@ -162,34 +143,32 @@ gcloud compute instances create "${VM_NAME}" \
--machine-type="${MACHINE_TYPE}" \
--image-project="${IMAGE_PROJECT}" \
--zone="${VM_ZONE}" \
--boot-disk-size=1000GB \
--boot-disk-size=100GB \
--network-interface=network-tier=PREMIUM,nic-type=GVNIC \
--scopes=https://www.googleapis.com/auth/cloud-platform,https://www.googleapis.com/auth/devstorage.read_write \
--network-performance-configs=total-egress-bandwidth-tier=TIER_1 \
--metadata GCSFUSE_VERSION="${GCSFUSE_VERSION}",GCS_BUCKET_WITH_FIO_TEST_DATA="${GCS_BUCKET_WITH_FIO_TEST_DATA}",RESULTS_BUCKET_NAME="${RESULTS_BUCKET_NAME}",LSSD_ENABLED="${LSSD_ENABLED}" \
--metadata GCSFUSE_VERSION="${GCSFUSE_VERSION}",GCS_BUCKET_WITH_FIO_TEST_DATA="${GCS_BUCKET_WITH_FIO_TEST_DATA}",RESULT_PATH="${RESULT_PATH}",LSSD_ENABLED="${LSSD_ENABLED}",MACHINE_TYPE="${MACHINE_TYPE}",PROJECT_ID="${PROJECT_ID}",UNIQUE_ID="${UNIQUE_ID}" \
--metadata-from-file=startup-script=starter-script.sh \
${VM_LOCAL_SSD_ARGS}
echo "VM created. Benchmarks will run on the VM."

echo "Waiting for benchmarks to complete on VM (polling for success.txt)..."
echo "Waiting for benchmarks to complete on VM..."

SUCCESS_FILE_PATH="gs://${RESULTS_BUCKET_NAME}/${GCSFUSE_VERSION}/success.txt"
LOG_FILE_PATH="gs://${RESULTS_BUCKET_NAME}/${GCSFUSE_VERSION}/benchmark_run.log"
SUCCESS_FILE_PATH="${RESULT_PATH}/success.txt"
LOG_FILE_PATH="${RESULT_PATH}/benchmark_run.log"
SLEEP_TIME=300 # 5 minutes
sleep "$SLEEP_TIME"
#max 18 retries amounting to ~1hr30mins time
MAX_RETRIES=18

for ((i=1; i<=MAX_RETRIES; i++)); do
if gcloud storage objects describe "${SUCCESS_FILE_PATH}" &> /dev/null; then
echo "Benchmarks completed. success.txt found."
echo "Results are available in gs://${RESULTS_BUCKET_NAME}/${GCSFUSE_VERSION}/"
echo "Results are available in BigQuery: gcs-fuse-test-ml.gke_test_tool_outputs.fio_outputs"
echo "Benchmark log file: $LOG_FILE_PATH"
exit 0
fi

# Check for early failure indicators
if gcloud storage objects describe "gs://${RESULTS_BUCKET_NAME}/${GCSFUSE_VERSION}/details.txt" &> /dev/null || \
if gcloud storage objects describe "${RESULT_PATH}/details.txt" &> /dev/null || \
gcloud storage objects describe "$LOG_FILE_PATH" &> /dev/null; then
echo "Benchmark log or details.txt found, but success.txt is missing. Possible error in benchmark execution."
echo "Check logs at: $LOG_FILE_PATH"
@@ -201,9 +180,6 @@ for ((i=1; i<=MAX_RETRIES; i++)); do
done


# Failure case: success.txt was not found after retries
echo "Timed out waiting for success.txt after $((MAX_RETRIES * SLEEP_TIME / 60)) minutes. Perhaps there is some error."
echo "Benchmark log file (for troubleshooting): $LOG_FILE_PATH"
exit 1

# The trap command will handle the cleanup on script exit.