GoogleCloudPlatform · kislaykishore · Aug 13, 2025
diff --git a/npi/fio/README.md b/npi/fio/README.md
@@ -0,0 +1,90 @@
+# GCSFuse FIO Benchmark Runner
+
+## Overview
+
+This Python script automates GCSFuse performance benchmarking with the Flexible I/O Tester (FIO). It handles the entire workflow: setting up dependencies, running the benchmarks, and cleaning up resources.
+
+The script performs the following actions:
+1.  **Dependency Installation**: Installs `git`, `fio`, and `fuse` if they are not present (supports Debian/Ubuntu and RHEL/CentOS based systems).
+2.  **GCSFuse Setup**: Clones the GCSFuse GitHub repository, checks out a specific version (branch, tag, or commit), and builds the binary.
+3.  **GCS Bucket Management**: Creates a temporary GCS bucket for the test and deletes it upon completion.
+4.  **Mounting**: Mounts the GCS bucket using the built GCSFuse binary and specified flags.
+5.  **Benchmarking**: Runs FIO tests against the mounted directory based on a provided FIO configuration file for a specified number of iterations.
+6.  **Cleanup**: Unmounts the GCSFuse directory and deletes the GCS bucket, ensuring a clean state.
+
+## Prerequisites
+
+Before running the script, ensure you have the following installed and configured:
+
+-   **Python 3.8+**
+-   **Go (1.21 or newer)**: The script requires Go to build GCSFuse from source.
+-   **Google Cloud SDK (`gcloud`)**:
+    -   Authenticated: `gcloud auth login`
+    -   Project configured: `gcloud config set project <YOUR_PROJECT_ID>`
+-   **Sudo privileges**: The script requires `sudo` to install packages and clear system caches.
+
+## Usage
+
+Invoke the script from the command line; the arguments below control the benchmark run.
+
+```bash
+python3 run_fio_benchmark.py [OPTIONS]
+```
+
+### Arguments
+
+-   `--gcsfuse-version`: (Required) The GCSFuse version to test (e.g., `v1.2.0`, `master`, or a commit hash).
+-   `--project-id`: (Required) Your Google Cloud Project ID.
+-   `--location`: (Required) The GCP location (region or zone) for the GCS bucket (e.g., `us-central1`).
+-   `--fio-config`: (Required) Path to the FIO configuration file.
+-   `--gcsfuse-flags`: (Optional) Flags for GCSFuse, enclosed in quotes (e.g., `"--implicit-dirs --max-conns-per-host 100"`). Default is empty.
+-   `--iterations`: (Optional) Number of FIO test iterations. Default is `1`.
+-   `--work-dir`: (Optional) A temporary directory for builds and mounts. Default is `/tmp/gcsfuse_benchmark`.
+-   `--output-dir`: (Optional) Directory to save FIO JSON output files. Default is `./fio_results`.
+-   `--skip-deps-install`: (Optional) Skip the automatic dependency installation check.
+
+### Example
+
+1.  **Create an FIO config file (`sample.fio`):**
+
+    ```ini
+    [global]
+    ioengine=libaio
+    direct=1
+    runtime=30
+    time_based
+    group_reporting
+    filename=testfile
+
+    [random-read-4k]
+    bs=4k
+    rw=randread
+    size=1G
+
+    [random-write-1m]
+    bs=1m
+    rw=randwrite
+    size=1G
+    ```
+
+2.  **Run the benchmark script:**
+
+    ```bash
+    python3 run_fio_benchmark.py \
+        --gcsfuse-version master \
+        --project-id your-gcp-project-id \
+        --location us-central1 \
+        --fio-config ./sample.fio \
+        --gcsfuse-flags "--implicit-dirs" \
+        --iterations 3
+    ```
+
+## Output
+
+The script creates an output directory (e.g., `./fio_results/`) containing the FIO results in JSON format, one file per iteration.
+
+-   `fio_results_iter_1.json`
+-   `fio_results_iter_2.json`
+-   `fio_results_iter_3.json`
+
+These files can be parsed for detailed performance analysis.
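As one way to do that parsing, the sketch below averages bandwidth per job and operation across the per-iteration files. It assumes the FIO JSON layout that `fio_benchmark_runner.py` in this PR also relies on (`jobs[].jobname`, and `bw` in KiB/s under `read`/`write`). The helper name `mean_bandwidth_mibps` is hypothetical and not part of the script.

```python
import glob
import json
import os
from collections import defaultdict


def mean_bandwidth_mibps(output_dir):
    """Average bandwidth (MiB/s) per (job, operation) over all iterations.

    Hypothetical helper: assumes files named fio_results_iter_<N>.json whose
    jobs report "bw" in KiB/s under "read" and "write".
    """
    samples = defaultdict(list)
    pattern = os.path.join(output_dir, "fio_results_iter_*.json")
    for path in sorted(glob.glob(pattern)):
        with open(path, "r") as f:
            data = json.load(f)
        for job in data.get("jobs", []):
            name = job.get("jobname", "unnamed_job")
            for op in ("read", "write"):
                bw_kibps = job.get(op, {}).get("bw", 0)
                if bw_kibps:  # skip directions the job did not exercise
                    samples[(name, op)].append(bw_kibps / 1024.0)
    return {key: sum(vals) / len(vals) for key, vals in samples.items()}


if __name__ == "__main__":
    results = mean_bandwidth_mibps("./fio_results")
    for (job, op), bw in sorted(results.items()):
        print(f"{job:<20} {op:<6} {bw:.2f} MiB/s")
```

Averaging across iterations is only one possible aggregation; medians or per-iteration comparisons may suit noisy runs better.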
diff --git a/npi/fio/fio.dockerfile b/npi/fio/fio.dockerfile
@@ -0,0 +1,9 @@
+ARG UBUNTU_VERSION=24.04
+ARG GO_VERSION=1.24.5
+ARG ARCH=amd64
+ARG GCSFUSE_VERSION=master
+
+FROM gcr.io/gcs-fuse-test/gcsfuse-${GCSFUSE_VERSION}-perf-base-${ARCH}:latest
+COPY run_fio_benchmark.py /run_fio_benchmark.py
+COPY fio_benchmark_runner.py /fio_benchmark_runner.py
+ENTRYPOINT ["/run_fio_benchmark.py"]
diff --git a/npi/fio/fio_benchmark_runner.py b/npi/fio/fio_benchmark_runner.py
@@ -0,0 +1,191 @@
+#!/usr/bin/env python3
+# Copyright 2024 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Core logic for GCSFuse performance benchmarking with FIO."""
+
+import json
+import logging
+import os
+import shlex
+import subprocess
+import sys
+import time
+
+def run_command(command, check=True, cwd=None, extra_env=None):
+    """Runs a command and logs its output."""
+    logging.info(f"Running command: {' '.join(command)}")
+
+    env = os.environ.copy()
+    if extra_env:
+        env.update(extra_env)
+
+    try:
+        result = subprocess.run(
+            command, check=check, capture_output=True, text=True, cwd=cwd, env=env
+        )
+        if result.stdout:
+            logging.info(f"STDOUT: {result.stdout.strip()}")
+        if result.stderr:
+            # Use warning for stderr as some tools write info there
+            logging.warning(f"STDERR: {result.stderr.strip()}")
+        return result
+    except subprocess.CalledProcessError as e:
+        logging.error(f"Command failed with exit code {e.returncode}")
+        logging.error(f"STDOUT: {e.stdout.strip() if e.stdout else 'N/A'}")
+        logging.error(f"STDERR: {e.stderr.strip() if e.stderr else 'N/A'}")
+        raise
+
+def mount_gcsfuse(gcsfuse_bin, flags, bucket_name, mount_point):
+    """Mounts the GCS bucket using GCSFuse."""
+    os.makedirs(mount_point, exist_ok=True)
+    logging.info(f"Mounting gs://{bucket_name} to {mount_point}")
+    cmd = [gcsfuse_bin] + shlex.split(flags) + [bucket_name, mount_point]
+    run_command(cmd)
+    # Give up to 10s for the mount to register, polling every second.
+    for _ in range(10):
+        if os.path.ismount(mount_point):
+            break
+        time.sleep(1)
+    else:
+        logging.error("Mounting failed. Check GCSFuse logs (e.g., in /var/log/syslog).")
+        sys.exit(1)
+    logging.info("Mount successful.")
+
+
+def unmount_gcsfuse(mount_point):
+    """Unmounts the GCSFuse file system."""
+    logging.info(f"Unmounting {mount_point}")
+    try:
+        run_command(["fusermount", "-u", mount_point])
+    except (FileNotFoundError, subprocess.CalledProcessError):
+        logging.warning("`fusermount -u` failed. Retrying with `umount`.")
+        time.sleep(2)
+        run_command(["umount", "-l", mount_point], check=False)
+
+
+def run_fio_test(fio_config, mount_point, iteration, output_dir, fio_env=None):
+    """Runs a single FIO test iteration."""
+    logging.info(f"Starting FIO test iteration {iteration}...")
+    output_filename = os.path.join(output_dir, f"fio_results_iter_{iteration}.json")
+    cmd = [
+        "fio", fio_config, "--output-format=json", f"--output={output_filename}",
+        f"--directory={mount_point}"
+    ]
+    run_command(cmd, extra_env=fio_env)
+    logging.info(f"FIO test iteration {iteration} complete. Results: {output_filename}")
+
+
+def parse_fio_output(filename):
+    """Parses FIO JSON output to extract key metrics."""
+    try:
+        with open(filename, "r") as f:
+            data = json.load(f)
+    except (json.JSONDecodeError, FileNotFoundError) as e:
+        logging.error(f"Could not read or parse FIO output {filename}: {e}")
+        return []
+
+    results = []
+    for job in data.get("jobs", []):
+        job_name = job.get("jobname", "unnamed_job")
+        for op in ["read", "write"]:
+            if op in job:
+                stats = job[op]
+                # Bandwidth is in KiB/s, convert to MiB/s
+                bw_mibps = stats.get("bw", 0) / 1024.0
+                if bw_mibps == 0:
+                    continue
+                iops = stats.get("iops", 0)
+
+                # Latency can be under 'lat_ns', 'clat_ns', etc.
+                lat_stats = stats.get("lat_ns") or {}
+
+                # Convert from ns to ms
+                mean_lat_ms = lat_stats.get("mean", 0) / 1_000_000.0
+
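+                # Percentiles are in a sub-dict with string keys (e.g. "99.000000").
+                # FIO 3.x JSON names that sub-dict "percentile" and, by default,
+                # reports it under 'clat_ns' rather than 'lat_ns'.
+                percentiles = (
+                    lat_stats.get("percentile")
+                    or (stats.get("clat_ns") or {}).get("percentile")
+                    or {}
+                )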
+
+                p99_key = next((k for k in percentiles if k.startswith("99.00")), None)
+                p99_lat_ms = (
+                    percentiles.get(p99_key, 0) / 1_000_000.0 if p99_key else 0
+                )
+
+                results.append({
+                    "job_name": job_name,
+                    "operation": op,
+                    "bw_mibps": bw_mibps,
+                    "iops": iops,
+                    "mean_lat_ms": mean_lat_ms,
+                    "p99_lat_ms": p99_lat_ms,
+                })
+    return results
+
+
+def print_summary(all_results):
+    """Prints a summary of all FIO iterations."""
+    if not all_results:
+        logging.warning("No results to summarize.")
+        return
+
+    logging.info("\n--- FIO Benchmark Summary ---")
+    header = (f"{'Iter':<5} {'Job Name':<20} {'Op':<6} {'Bandwidth (MiB/s)':<20} "
+              f"{'IOPS':<12} {'Mean Latency (ms)':<20} {'P99 Latency (ms)':<20}")
+    print(header)
+    print("-" * len(header))
+    for i, iteration_results in enumerate(all_results, 1):
+        if not iteration_results:
+            print(f"{i:<5} No results for this iteration.")
+            continue
+        for result in iteration_results:
+            print(f"{i:<5} {result['job_name']:<20} {result['operation']:<6} "
+                  f"{result['bw_mibps']:<20.2f} {result['iops']:<12.2f} "
+                  f"{result['mean_lat_ms']:<20.4f} {result['p99_lat_ms']:<20.4f}")
+    print("-" * len(header))
+
+
+def run_benchmark(
+    gcsfuse_flags, bucket_name, iterations, fio_config, work_dir, output_dir, fio_env=None
+):
+    """Runs the full FIO benchmark suite."""
+    os.makedirs(work_dir, exist_ok=True)
+    os.makedirs(output_dir, exist_ok=True)
+
+    gcsfuse_bin = "/gcsfuse/gcsfuse"
+    mount_point = os.path.join(work_dir, "mount_point")
+
+    # Prepare environment for FIO
+    fio_run_env = {"DIR": mount_point}
+    if fio_env:
+        fio_run_env.update(fio_env)
+
+    all_results = []
+
+    for i in range(1, iterations + 1):
+        logging.info(f"--- Starting Iteration {i}/{iterations} ---")
+        output_filename = os.path.join(output_dir,
+                                       f"fio_results_iter_{i}.json")
+        if os.path.exists(output_filename):
+            os.remove(output_filename)
+        try:
+            logging.info("Clearing page cache...")
+            run_command(["sh", "-c", "echo 3 > /proc/sys/vm/drop_caches"])
+
+            mount_gcsfuse(gcsfuse_bin, gcsfuse_flags, bucket_name, mount_point)
+            run_fio_test(fio_config, mount_point, i, output_dir, fio_env=fio_run_env)
+
+            iteration_results = parse_fio_output(output_filename)
+            all_results.append(iteration_results)
+        finally:
+            if os.path.ismount(mount_point):
+                unmount_gcsfuse(mount_point)
+        logging.info(f"--- Finished Iteration {i}/{iterations} ---")
+
+    print_summary(all_results)
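The mount check in `mount_gcsfuse` polls `os.path.ismount` until the mount registers. A minimal sketch of the same idea as a reusable helper follows. `wait_until` and its parameters are hypothetical names, not part of this PR. It uses a monotonic deadline instead of a fixed iteration count, so a slow predicate call does not stretch the overall timeout.

```python
import time


def wait_until(predicate, timeout_s=10.0, interval_s=1.0):
    """Poll predicate() until it returns True or timeout_s elapses.

    Returns True if the predicate succeeded, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

With such a helper, the mount check could be written as `if not wait_until(lambda: os.path.ismount(mount_point)): ...`, keeping the error handling at the call site.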
diff --git a/npi/fio/read.dockerfile b/npi/fio/read.dockerfile
@@ -0,0 +1,12 @@
+ARG UBUNTU_VERSION=24.04
+ARG GO_VERSION=1.24.5
+ARG ARCH=amd64
+ARG GCSFUSE_VERSION=master
+
+FROM gcr.io/gcs-fuse-test/gcsfuse-${GCSFUSE_VERSION}-perf-base-${ARCH}:latest
+COPY run_fio_benchmark.py /run_fio_benchmark.py
+COPY fio_benchmark_runner.py /fio_benchmark_runner.py
+COPY run_fio_matrix.py /run_fio_matrix.py
+COPY read_matrix.csv /read_matrix.csv
+COPY read.fio /read.fio
+ENTRYPOINT ["/run_fio_matrix.py", "--matrix-config", "/read_matrix.csv", "--fio-template", "/read.fio"]
diff --git a/npi/fio/read.fio b/npi/fio/read.fio
@@ -0,0 +1,26 @@
+[global]
+allrandrepeat=0
+create_serialize=0
+direct=1
+fadvise_hint=0
+file_service_type=random
+group_reporting=1
+iodepth=64
+ioengine=libaio
+invalidate=1
+numjobs=128
+openfiles=1
+# READ_TYPE selects the access pattern: "read" for sequential reads, "randread" for random reads.
+rw=${READ_TYPE}
+thread=1
+filename_format=read/${FILE_SIZE}/${NR_FILES}/$jobname.$jobnum/$filenum
+
+[experiment]
+stonewall
+directory=${DIR}
+# Block size for the experiment, taken from the BLOCK_SIZE environment variable.
+bs=${BLOCK_SIZE}
+# File size for the experiment, taken from the FILE_SIZE environment variable.
+filesize=${FILE_SIZE}
+# Set nrfiles per thread in such a way that the test runs for 1-2 min.
+nrfiles=${NR_FILES}
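The job file above mixes two kinds of placeholders: `${...}` names that fio fills in from the process environment, and `$jobname`, `$jobnum`, `$filenum`, which are fio's own `filename_format` keywords. The sketch below previews how a set of values would render, using Python's `string.Template` as a stand-in. It is only an approximation for inspection, since fio performs the real substitution at run time. `preview_job` and `JOB_EXCERPT` are hypothetical names.

```python
from string import Template

# Excerpt of the job file above. ${...} are environment variables, while
# $jobname/$jobnum/$filenum are fio keywords that must stay untouched.
JOB_EXCERPT = """\
rw=${READ_TYPE}
filename_format=read/${FILE_SIZE}/${NR_FILES}/$jobname.$jobnum/$filenum
bs=${BLOCK_SIZE}
filesize=${FILE_SIZE}
nrfiles=${NR_FILES}
"""


def preview_job(template_text, env):
    """Render placeholders found in env and leave unknown $names as they are."""
    return Template(template_text).safe_substitute(env)


if __name__ == "__main__":
    row = {"READ_TYPE": "read", "FILE_SIZE": "1M",
           "BLOCK_SIZE": "1M", "NR_FILES": "30"}
    print(preview_job(JOB_EXCERPT, row))
```

`safe_substitute` is used rather than `substitute` so that the fio keywords, which are absent from the mapping, do not raise `KeyError`.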
diff --git a/npi/fio/read_matrix.csv b/npi/fio/read_matrix.csv
@@ -0,0 +1,17 @@
+READ_TYPE,FILE_SIZE,BLOCK_SIZE,NR_FILES
+read,128K,128K,30
+read,256K,128K,30
+read,1M,1M,30
+read,5M,1M,20
+read,10M,1M,20
+read,50M,1M,20
+read,100M,1M,10
+read,200M,1M,10
+read,1G,1M,10
+randread,256K,128K,30
+randread,5M,1M,20
+randread,10M,1M,20
+randread,50M,1M,20
+randread,100M,1M,10
+randread,200M,1M,10
+randread,1G,1M,10
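Each row of this matrix supplies the values for the `${READ_TYPE}`, `${FILE_SIZE}`, `${BLOCK_SIZE}` and `${NR_FILES}` placeholders in `read.fio`. `run_fio_matrix.py` is not shown in this diff, so the following is only a sketch, under that assumption, of how rows could be turned into per-run environment dictionaries. `matrix_to_envs` and `MATRIX_EXCERPT` are hypothetical names.

```python
import csv
import io

# Two rows copied from the matrix above, for illustration.
MATRIX_EXCERPT = """\
READ_TYPE,FILE_SIZE,BLOCK_SIZE,NR_FILES
read,128K,128K,30
randread,1G,1M,10
"""


def matrix_to_envs(csv_text):
    """Return one {column: value} dict per matrix row.

    Values stay strings, which is what environment variables require.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


if __name__ == "__main__":
    for env in matrix_to_envs(MATRIX_EXCERPT):
        print(env)
```

Each resulting dictionary has the shape that the `fio_env` parameter of `run_benchmark` in `fio_benchmark_runner.py` accepts, so one benchmark run per row is a natural fit.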