90 changes: 90 additions & 0 deletions npi/fio/README.md
@@ -0,0 +1,90 @@
# GCSFuse FIO Benchmark Runner

## Overview

This Python script automates the process of benchmarking GCSFuse performance using the Flexible I/O Tester (FIO). It handles the entire workflow from setting up dependencies to running the benchmarks and cleaning up resources.

The script performs the following actions:
1. **Dependency Installation**: Installs `git`, `fio`, and `fuse` if they are not present (supports Debian/Ubuntu and RHEL/CentOS based systems).
2. **GCSFuse Setup**: Clones the GCSFuse GitHub repository, checks out a specific version (branch, tag, or commit), and builds the binary.
3. **GCS Bucket Management**: Creates a temporary GCS bucket for the test and deletes it upon completion.
4. **Mounting**: Mounts the GCS bucket using the built GCSFuse binary and specified flags.
5. **Benchmarking**: Runs FIO tests against the mounted directory based on a provided FIO configuration file for a specified number of iterations.
6. **Cleanup**: Unmounts the GCSFuse directory and deletes the GCS bucket, ensuring a clean state.

## Prerequisites

Before running the script, ensure you have the following installed and configured:

- **Python 3.8+**
- **Go (1.21 or newer)**: The script requires Go to build GCSFuse from source.
- **Google Cloud SDK (`gcloud`)**:
  - Authenticated: `gcloud auth login`
  - Project configured: `gcloud config set project <YOUR_PROJECT_ID>`
- **Sudo privileges**: The script requires `sudo` to install packages and clear system caches.

## Usage

The script is invoked from the command line with several arguments to control the benchmark run.

```bash
python3 run_fio_benchmark.py [OPTIONS]
```

### Arguments

- `--gcsfuse-version`: (Required) The GCSFuse version to test (e.g., `v1.2.0`, `master`, or a commit hash).
- `--project-id`: (Required) Your Google Cloud Project ID.
- `--location`: (Required) The GCP location (region or zone) for the GCS bucket (e.g., `us-central1`).
- `--fio-config`: (Required) Path to the FIO configuration file.
- `--gcsfuse-flags`: (Optional) Flags for GCSFuse, enclosed in quotes (e.g., `"--implicit-dirs --max-conns-per-host 100"`). Default is empty.
- `--iterations`: (Optional) Number of FIO test iterations. Default is `1`.
- `--work-dir`: (Optional) A temporary directory for builds and mounts. Default is `/tmp/gcsfuse_benchmark`.
- `--output-dir`: (Optional) Directory to save FIO JSON output files. Default is `./fio_results`.
- `--skip-deps-install`: (Optional) Skip the automatic dependency installation check.

### Example

1. **Create an FIO config file (`sample.fio`):**

```ini
[global]
ioengine=libaio
direct=1
runtime=30
time_based
group_reporting
filename=testfile

[random-read-4k]
bs=4k
rw=randread
size=1G

[random-write-1m]
bs=1m
rw=randwrite
size=1G
```

2. **Run the benchmark script:**

```bash
python3 run_fio_benchmark.py \
  --gcsfuse-version master \
  --project-id your-gcp-project-id \
  --location us-central1 \
  --fio-config ./sample.fio \
  --gcsfuse-flags "--implicit-dirs" \
  --iterations 3
```

## Output

The script will create an output directory (e.g., `./fio_results/`) containing the FIO results in JSON format, with one file per iteration.

- `fio_results_iter_1.json`
- `fio_results_iter_2.json`
- `fio_results_iter_3.json`

These files can be parsed for detailed performance analysis.
Comment on lines +1 to +90

Contributor

high

The documentation in this README is significantly outdated and does not align with the new containerized approach introduced in this pull request. For example:

  • It describes a standalone script that handles dependency installation and GCSFuse compilation, which is now handled by the Dockerfiles.
  • The command-line arguments listed (e.g., --gcsfuse-version, --project-id, --skip-deps-install) do not match the arguments in the new scripts (run_fio_benchmark.py, run_fio_matrix.py).
  • The setup and execution flow described is for a non-containerized environment.

This discrepancy will cause confusion for users. Please update this README to accurately describe the new container-based workflow, including how to build the Docker images and run the benchmarks using docker run, and document the correct command-line arguments for the new scripts.

9 changes: 9 additions & 0 deletions npi/fio/fio.dockerfile
@@ -0,0 +1,9 @@
ARG UBUNTU_VERSION=24.04
ARG GO_VERSION=1.24.5
ARG ARCH=amd64
ARG GCSFUSE_VERSION=master

FROM gcr.io/gcs-fuse-test/gcsfuse-${GCSFUSE_VERSION}-perf-base-${ARCH}:latest
COPY run_fio_benchmark.py /run_fio_benchmark.py
COPY fio_benchmark_runner.py /fio_benchmark_runner.py
ENTRYPOINT ["/run_fio_benchmark.py"]
191 changes: 191 additions & 0 deletions npi/fio/fio_benchmark_runner.py
@@ -0,0 +1,191 @@
#!/usr/bin/env python3
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Core logic for GCSFuse performance benchmarking with FIO."""

import json
import logging
import os
import shlex
import subprocess
import sys
import time

def run_command(command, check=True, cwd=None, extra_env=None):
    """Runs a command and logs its output."""
    logging.info(f"Running command: {' '.join(command)}")

    env = os.environ.copy()
    if extra_env:
        env.update(extra_env)

    try:
        result = subprocess.run(
            command, check=check, capture_output=True, text=True, cwd=cwd, env=env
        )
        if result.stdout:
            logging.info(f"STDOUT: {result.stdout.strip()}")
        if result.stderr:
            # Use warning for stderr as some tools write info there
            logging.info(f"STDERR: {result.stderr.strip()}")

Contributor

medium

The comment on the preceding line suggests using logging.warning for stderr, as some tools may write non-error information to it. However, the implementation uses logging.info. To align with the intention described in the comment and to better classify these messages, please change this to logging.warning.

Suggested change
-            logging.info(f"STDERR: {result.stderr.strip()}")
+            logging.warning(f"STDERR: {result.stderr.strip()}")

        return result
    except subprocess.CalledProcessError as e:
        logging.error(f"Command failed with exit code {e.returncode}")
        logging.error(f"STDOUT: {e.stdout.strip() if e.stdout else 'N/A'}")
        logging.error(f"STDERR: {e.stderr.strip() if e.stderr else 'N/A'}")
        raise

def mount_gcsfuse(gcsfuse_bin, flags, bucket_name, mount_point):
    """Mounts the GCS bucket using GCSFuse."""
    os.makedirs(mount_point, exist_ok=True)
    logging.info(f"Mounting gs://{bucket_name} to {mount_point}")
    cmd = [gcsfuse_bin] + shlex.split(flags) + [bucket_name, mount_point]
    run_command(cmd)
    time.sleep(2)  # Give a moment for the mount to register
    if not os.path.ismount(mount_point):
        logging.error("Mounting failed. Check GCSFuse logs (e.g., in /var/log/syslog).")
        sys.exit(1)
Comment on lines +56 to +59

Contributor

medium

Using a fixed time.sleep(2) to wait for the mount to register is not fully reliable. On a heavily loaded system, mounting could take longer, leading to a race condition where the script incorrectly reports a failure. A more robust approach is to poll for the mount status with a timeout.

Suggested change
-    time.sleep(2)  # Give a moment for the mount to register
-    if not os.path.ismount(mount_point):
-        logging.error("Mounting failed. Check GCSFuse logs (e.g., in /var/log/syslog).")
-        sys.exit(1)
+    # Give up to 10s for the mount to register, polling every second.
+    for _ in range(10):
+        if os.path.ismount(mount_point):
+            break
+        time.sleep(1)
+    else:
+        logging.error("Mounting failed. Check GCSFuse logs (e.g., in /var/log/syslog).")
+        sys.exit(1)
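
The polling idea in this suggestion can also be factored into a small reusable helper. The following is only a sketch: `wait_until`, `timeout_s`, and `interval_s` are hypothetical names that do not appear in this PR, and the helper uses `time.monotonic()` for the deadline rather than counting loop iterations.

```python
import time


def wait_until(predicate, timeout_s=10.0, interval_s=1.0):
    """Polls predicate() until it returns True or timeout_s elapses.

    Hypothetical helper (not part of the PR). Returns True if the
    predicate succeeded within the timeout, False otherwise.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Example with a predicate that succeeds on its third call.
calls = []


def ready_on_third_call():
    calls.append(1)
    return len(calls) >= 3


became_ready = wait_until(ready_on_third_call, timeout_s=1.0, interval_s=0.01)
```

In `mount_gcsfuse`, the call would presumably look like `wait_until(lambda: os.path.ismount(mount_point))`, with the error path taken when it returns `False`.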

    logging.info("Mount successful.")


def unmount_gcsfuse(mount_point):
    """Unmounts the GCSFuse file system."""
    logging.info(f"Unmounting {mount_point}")
    try:
        run_command(["fusermount", "-u", mount_point])
    except (FileNotFoundError, subprocess.CalledProcessError):
        logging.warning("`fusermount -u` failed. Retrying with `sudo umount`.")
Contributor

medium

The comment "Retrying with sudo umount" is misleading, as the subsequent command does not use sudo. Since these scripts are intended to run as root within a Docker container, sudo is not necessary. Please update the comment to accurately reflect the command being executed to avoid confusion.

Suggested change
-        logging.warning("`fusermount -u` failed. Retrying with `sudo umount`.")
+        logging.warning("`fusermount -u` failed. Retrying with `umount`.")
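
The try-then-fall-back pattern used by `unmount_gcsfuse` can be written generically so that the log message and the command actually executed cannot drift apart. This is a minimal sketch under that assumption: `run_with_fallback` is a hypothetical helper, not something defined in this PR, and it treats a missing binary the same way as a non-zero exit code.

```python
import subprocess
import sys


def run_with_fallback(commands):
    """Runs commands in order and returns the index of the first that succeeds.

    Hypothetical helper (not part of the PR). Returns -1 if every
    command is missing or exits with a non-zero status.
    """
    for index, command in enumerate(commands):
        try:
            subprocess.run(command, check=True, capture_output=True)
            return index
        except (FileNotFoundError, subprocess.CalledProcessError):
            continue
    return -1


# Stand-in commands so the sketch runs anywhere Python is installed.
failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
passing = [sys.executable, "-c", "pass"]
chosen = run_with_fallback([failing, passing])
```

For the unmount case, the list would presumably be `[["fusermount", "-u", mount_point], ["umount", "-l", mount_point]]`, the two commands already used in this file.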

        time.sleep(2)
        run_command(["umount", "-l", mount_point], check=False)


def run_fio_test(fio_config, mount_point, iteration, output_dir, fio_env=None):
    """Runs a single FIO test iteration."""
    logging.info(f"Starting FIO test iteration {iteration}...")
    output_filename = os.path.join(output_dir, f"fio_results_iter_{iteration}.json")
    cmd = [
        "fio", fio_config, "--output-format=json", f"--output={output_filename}",
        f"--directory={mount_point}"
    ]
    run_command(cmd, extra_env=fio_env)
    logging.info(f"FIO test iteration {iteration} complete. Results: {output_filename}")


def parse_fio_output(filename):
    """Parses FIO JSON output to extract key metrics."""
    try:
        with open(filename, "r") as f:
            data = json.load(f)
    except (json.JSONDecodeError, FileNotFoundError) as e:
        logging.error(f"Could not read or parse FIO output {filename}: {e}")
        return []

    results = []
    for job in data.get("jobs", []):
        job_name = job.get("jobname", "unnamed_job")
        for op in ["read", "write"]:
            if op in job:
                stats = job[op]
                # Bandwidth is in KiB/s, convert to MiB/s
                bw_mibps = stats.get("bw", 0) / 1024.0
                if bw == 0:

Contributor

critical

There is a NameError here because the variable bw is not defined. You likely intended to use bw_mibps, which was defined on the preceding line. This bug will cause the script to crash when parsing FIO results.

Suggested change
-                if bw == 0:
+                if bw_mibps == 0:
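
To illustrate the corrected check, here is a reduced, self-contained sketch of the per-operation loop with the fix applied. `extract_op_stats` and `sample_job` are hypothetical: the function is a trimmed-down stand-in for the loop in `parse_fio_output`, and the sample dictionary is hand-written test data, not real FIO output.

```python
def extract_op_stats(job):
    """Returns bandwidth and IOPS per operation for one FIO job dict.

    Hypothetical, reduced version of the loop in parse_fio_output.
    Operations with zero bandwidth are skipped.
    """
    rows = []
    for op in ("read", "write"):
        stats = job.get(op)
        if not stats:
            continue
        # Bandwidth is reported in KiB/s; convert to MiB/s.
        bw_mibps = stats.get("bw", 0) / 1024.0
        if bw_mibps == 0:  # the fixed condition: bw_mibps, not bw
            continue
        rows.append({
            "operation": op,
            "bw_mibps": bw_mibps,
            "iops": stats.get("iops", 0),
        })
    return rows


# Hand-written sample data for illustration only.
sample_job = {
    "jobname": "experiment",
    "read": {"bw": 2048, "iops": 16.0},
    "write": {"bw": 0, "iops": 0.0},
}
rows = extract_op_stats(sample_job)
```

With the original `if bw == 0:` line, the same input would raise `NameError` on the first operation instead of returning a row for `read`.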

                    continue
                iops = stats.get("iops", 0)

                # Latency can be under 'lat_ns', 'clat_ns', etc.
                lat_stats = stats.get("lat_ns") or {}

                # Convert from ns to ms
                mean_lat_ms = lat_stats.get("mean", 0) / 1_000_000.0

                # Percentiles are in a sub-dict with string keys
                percentiles = lat_stats.get("percentiles", {})  # FIO 3.x

                p99_key = next((k for k in percentiles if k.startswith("99.00")), None)
                p99_lat_ms = (
                    percentiles.get(p99_key, 0) / 1_000_000.0 if p99_key else 0
                )

                results.append({
                    "job_name": job_name,
                    "operation": op,
                    "bw_mibps": bw_mibps,
                    "iops": iops,
                    "mean_lat_ms": mean_lat_ms,
                    "p99_lat_ms": p99_lat_ms,
                })
    return results


def print_summary(all_results):
    """Prints a summary of all FIO iterations."""
    if not all_results:
        logging.warning("No results to summarize.")
        return

    logging.info("\n--- FIO Benchmark Summary ---")
    header = (f"{'Iter':<5} {'Job Name':<20} {'Op':<6} {'Bandwidth (MiB/s)':<20} "
              f"{'IOPS':<12} {'Mean Latency (ms)':<20} {'P99 Latency (ms)':<20}")
    print(header)
    print("-" * len(header))
    for i, iteration_results in enumerate(all_results, 1):
        if not iteration_results:
            print(f"{i:<5} No results for this iteration.")
            continue
        for result in iteration_results:
            print(f"{i:<5} {result['job_name']:<20} {result['operation']:<6} "
                  f"{result['bw_mibps']:<20.2f} {result['iops']:<12.2f} "
                  f"{result['mean_lat_ms']:<20.4f} {result['p99_lat_ms']:<20.4f}")
    print("-" * len(header))


def run_benchmark(
    gcsfuse_flags, bucket_name, iterations, fio_config, work_dir, output_dir, fio_env=None
):
    """Runs the full FIO benchmark suite."""
    os.makedirs(work_dir, exist_ok=True)
    os.makedirs(output_dir, exist_ok=True)

    gcsfuse_bin = "/gcsfuse/gcsfuse"
    mount_point = os.path.join(work_dir, "mount_point")

    # Prepare environment for FIO
    fio_run_env = {"DIR": mount_point}
    if fio_env:
        fio_run_env.update(fio_env)

    all_results = []

    for i in range(1, iterations + 1):
        logging.info(f"--- Starting Iteration {i}/{iterations} ---")
        output_filename = os.path.join(output_dir,
                                       f"fio_results_iter_{i}.json")
        if os.path.exists(output_filename):
            os.remove(output_filename)
        try:
            logging.info("Clearing page cache...")
            run_command(["sh", "-c", "echo 3 > /proc/sys/vm/drop_caches"])

            mount_gcsfuse(gcsfuse_bin, gcsfuse_flags, bucket_name, mount_point)
            run_fio_test(fio_config, mount_point, i, output_dir, fio_env=fio_run_env)

            iteration_results = parse_fio_output(output_filename)
            all_results.append(iteration_results)
        finally:
            if os.path.ismount(mount_point):
                unmount_gcsfuse(mount_point)
        logging.info(f"--- Finished Iteration {i}/{iterations} ---")

    print_summary(all_results)
12 changes: 12 additions & 0 deletions npi/fio/read.dockerfile
@@ -0,0 +1,12 @@
ARG UBUNTU_VERSION=24.04
ARG GO_VERSION=1.24.5
ARG ARCH=amd64
ARG GCSFUSE_VERSION=master

FROM gcr.io/gcs-fuse-test/gcsfuse-${GCSFUSE_VERSION}-perf-base-${ARCH}:latest
COPY run_fio_benchmark.py /run_fio_benchmark.py
COPY fio_benchmark_runner.py /fio_benchmark_runner.py
COPY run_fio_matrix.py /run_fio_matrix.py
COPY read_matrix.csv /read_matrix.csv
COPY read.fio /read.fio
ENTRYPOINT ["/run_fio_matrix.py", "--matrix-config", "/read_matrix.csv", "--fio-template", "/read.fio"]
26 changes: 26 additions & 0 deletions npi/fio/read.fio
@@ -0,0 +1,26 @@
[global]
allrandrepeat=0
create_serialize=0
direct=1
fadvise_hint=0
file_service_type=random
group_reporting=1
iodepth=64
ioengine=libaio
invalidate=1
numjobs=128
openfiles=1
# Change "read" to "randread" to test random reads.
rw=${READ_TYPE}
thread=1
filename_format=read/${FILE_SIZE}/${NR_FILES}/$jobname.$jobnum/$filenum

[experiment]
stonewall
directory=${DIR}
# Update the block size value from the table for different experiments.
bs=${BLOCK_SIZE}
# Update the file size value from table(file size) for different experiments.
filesize=${FILE_SIZE}
# Set nrfiles per thread in such a way that the test runs for 1-2 min.
nrfiles=${NR_FILES}
17 changes: 17 additions & 0 deletions npi/fio/read_matrix.csv
@@ -0,0 +1,17 @@
READ_TYPE,FILE_SIZE,BLOCK_SIZE,NR_FILES
read,128K,128K,30
read,256K,128K,30
read,1M,1M,30
read,5M,1M,20
read,10M,1M,20
read,50M,1M,20
read,100M,1M,10
read,200M,1M,10
read,1G,1M,10
randread,256K,128K,30
randread,5M,1M,20
randread,10M,1M,20
randread,50M,1M,20
randread,100M,1M,10
randread,200M,1M,10
randread,1G,1M,10
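
The column names in `read_matrix.csv` match the `${...}` placeholders in `read.fio`, which suggests that each row becomes a set of environment variables for one FIO run. `run_fio_matrix.py` is not shown in this diff, so the following is only a guess at that step: `matrix_rows_to_envs` is a hypothetical function, and the sample text reuses two rows from the CSV above.

```python
import csv
import io


def matrix_rows_to_envs(csv_text):
    """Turns matrix CSV text into one environment dict per row.

    Hypothetical helper (run_fio_matrix.py is not shown in the diff).
    All values stay strings, as environment variables require.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


# Header and two rows copied from read_matrix.csv.
sample_matrix = (
    "READ_TYPE,FILE_SIZE,BLOCK_SIZE,NR_FILES\n"
    "read,128K,128K,30\n"
    "randread,1G,1M,10\n"
)
envs = matrix_rows_to_envs(sample_matrix)
```

Each resulting dictionary has the shape that `run_benchmark` accepts as `fio_env`, where it is merged with the `DIR` variable before FIO is invoked.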