Kedro Pipeline Template — GEG ETH Zurich

A standardized, production-ready Kedro pipeline template for the Geothermal Energy and Geofluids (GEG) group at ETH Zurich. It runs locally out of the box and switches to Google Cloud Platform with a single flag (`--env gcp`).

Why use this template?

Standardized structure: Uniform project layout and Data Catalog across all GEG pipelines.
Version-controlled logic: Python transformations tracked in Git for full reproducibility.
Multi-environment: Switch between local and GCP execution with one flag — no code changes needed.
Data governance: Centralized catalog manages GCS and BigQuery connections. No hardcoded paths or secrets.
Quality built-in: Pre-commit hooks run linting (ruff), type checking (mypy), and secret scanning (detect-secrets) automatically.

Prerequisites

Tool	Purpose	Install
Python ≥ 3.10	Runtime	python.org
uv	Package & environment management	`curl -LsSf https://astral.sh/uv/install.sh | sh`
Google Cloud CLI	GCP authentication (GCP mode only)	cloud.google.com

Setup

1. Clone and set up the environment

git clone <this-repo-url>
cd kedro-pipeline
make setup

This creates a virtual environment, installs all dependencies from uv.lock, and sets up the code quality hooks.

Hooks run automatically on every git commit. To run them manually:

make check

Running the pipeline

Local mode (default — no cloud needed)

Data is read from and written to the local data/ directory.

make run

Before running, place your raw CSV files at:

data/01_raw/experiment_material_fluid_flowrate_01.csv
data/01_raw/experiment_material_fluid_pressure_01.csv

Results are written to data/08_reporting/.
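
A quick pre-flight check can confirm the raw inputs are in place before `make run`. The sketch below is not part of the template: `missing_raw_files` is a hypothetical helper, and it assumes the two file names listed above are the literal names the catalog expects.

```python
from pathlib import Path

# File names copied from the README; treating them as literal names is an assumption.
EXPECTED_RAW_FILES = (
    "experiment_material_fluid_flowrate_01.csv",
    "experiment_material_fluid_pressure_01.csv",
)


def missing_raw_files(project_root: Path) -> list[str]:
    """Return the expected raw CSV names that are absent from data/01_raw/."""
    raw_dir = project_root / "data" / "01_raw"
    return [name for name in EXPECTED_RAW_FILES if not (raw_dir / name).is_file()]


if __name__ == "__main__":
    # Run from the project root (the directory that contains data/).
    missing = missing_raw_files(Path.cwd())
    if missing:
        print("Missing raw inputs:", ", ".join(missing))
    else:
        print("All raw inputs found; ready for `make run`.")
```

Run it from the project root; an empty result means both files were found.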

GCP mode

Data is read from Google Cloud Storage and results are written to both GCS and BigQuery.

Step 1 — Authenticate with Google Cloud

gcloud auth application-default login

Step 2 — Configure environment variables

Copy the example file and fill in your GCP details:

cp .env.example .env

Edit .env:

GCS_BUCKET=your-gcs-bucket-name   # GCS bucket (without gs://)
GCP_PROJECT=your-gcp-project-id   # GCP project ID
GBQ_DATASET=your_bigquery_dataset  # BigQuery dataset name
# GBQ_LOCATION=europe-west6        # Optional: BigQuery region (default: europe-west6)
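
To catch a half-filled `.env` before running on GCP, the values can be validated with a few lines of Python. This is a minimal sketch, not how the Makefile loads the file: `parse_env` and `validate_env` are hypothetical helpers, and the parser assumes simple `KEY=value` lines whose values contain no `#` character.

```python
REQUIRED_KEYS = ("GCS_BUCKET", "GCP_PROJECT", "GBQ_DATASET")
DEFAULTS = {"GBQ_LOCATION": "europe-west6"}  # default region stated in the README


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, ignoring blank lines and # comments."""
    values = dict(DEFAULTS)
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop full-line and inline comments
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip()
    return values


def validate_env(values: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the config looks usable."""
    problems = [f"{key} is not set" for key in REQUIRED_KEYS if not values.get(key)]
    if values.get("GCS_BUCKET", "").startswith("gs://"):
        problems.append("GCS_BUCKET should not include the gs:// prefix")
    return problems
```

For example, `validate_env(parse_env(open(".env").read()))` returns an empty list when all three required variables are set and the bucket name has no `gs://` prefix.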

Step 3 — Run pipeline

make run-gcp

The .env variables are automatically loaded by the Makefile.

GCP data paths

Layer	Location
Raw inputs	`gs://<GCS_BUCKET>/kedro-pipeline/data/01_raw/`
Intermediate	`gs://<GCS_BUCKET>/kedro-pipeline/data/02_intermediate/`
Primary	`gs://<GCS_BUCKET>/kedro-pipeline/data/03_primary/`
Reporting (CSV)	`gs://<GCS_BUCKET>/kedro-pipeline/data/08_reporting/permeability.csv`
Reporting (BQ)	`<GCP_PROJECT>.<GBQ_DATASET>.permeability`
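
The locations in the table follow directly from the three `.env` values. The sketch below only illustrates that naming scheme; `gcs_layer_uri` and `bigquery_table_id` are hypothetical helpers and are not part of the template.

```python
# Layer directory names as listed in the table above.
LAYERS = ("01_raw", "02_intermediate", "03_primary", "08_reporting")


def gcs_layer_uri(bucket: str, layer: str) -> str:
    """Build the GCS URI for one data layer, following the table's path scheme."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer {layer!r}; expected one of {LAYERS}")
    return f"gs://{bucket}/kedro-pipeline/data/{layer}/"


def bigquery_table_id(project: str, dataset: str, table: str = "permeability") -> str:
    """Build the fully qualified BigQuery table ID: <project>.<dataset>.<table>."""
    return f"{project}.{dataset}.{table}"


if __name__ == "__main__":
    # Placeholder values standing in for GCS_BUCKET, GCP_PROJECT and GBQ_DATASET.
    print(gcs_layer_uri("your-gcs-bucket-name", "01_raw"))
    print(bigquery_table_id("your-gcp-project-id", "your_bigquery_dataset"))
```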

Project structure

kedro-pipeline/
├── conf/
│   ├── base/           # Shared config — local filesystem defaults
│   │   ├── catalog.yml
│   │   └── parameters.yml
│   ├── gcp/            # GCP overrides — activated with --env gcp
│   │   └── catalog.yml
│   └── local/          # Your personal overrides (gitignored)
├── data/               # Local data (gitignored — only .gitkeep files committed)
│   ├── 01_raw/
│   ├── 02_intermediate/
│   ├── 03_primary/
│   └── 08_reporting/
├── src/
│   └── kedro_pipeline/
│       └── pipelines/
│           └── data_processing/
│               ├── nodes.py      # Pure transformation functions
│               └── pipeline.py   # Pipeline assembly
├── tests/
├── .env.example        # GCP environment variable template
└── pyproject.toml
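
The relationship between `conf/base/` and `conf/gcp/` can be pictured as a dictionary merge: entries in the active environment's catalog take precedence over same-named entries in the base catalog. The sketch below mimics that with plain dictionaries. It assumes Kedro's default behavior, where a same-named top-level entry replaces the base entry whole, and the `permeability` entry with its local file name is a hypothetical example rather than a copy of this project's catalog.

```python
def merge_catalogs(base: dict, override: dict) -> dict:
    """Return base entries with same-named override entries replacing them whole."""
    merged = dict(base)
    merged.update(override)
    return merged


# Hypothetical entry as it might appear in conf/base/catalog.yml (local filesystem).
base_catalog = {
    "permeability": {
        "type": "pandas.CSVDataset",
        "filepath": "data/08_reporting/permeability.csv",
    },
}

# Hypothetical override as it might appear in conf/gcp/catalog.yml.
gcp_catalog = {
    "permeability": {
        "type": "pandas.CSVDataset",
        "filepath": "gs://<GCS_BUCKET>/kedro-pipeline/data/08_reporting/permeability.csv",
    },
}

# With no override, the base definition is used unchanged.
local_view = merge_catalogs(base_catalog, {})
# With the gcp environment active, the GCS definition wins.
gcp_view = merge_catalogs(base_catalog, gcp_catalog)
```

Because the node code refers to datasets only by name, switching the catalog entry is enough to redirect I/O without touching `nodes.py` or `pipeline.py`.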

Common commands

Task	Command
Project setup	`make setup`
Run pipeline locally	`make run`
Run pipeline on GCP	`make run-gcp`
Run a single node	`uv run kedro run --nodes preprocess_flow_node`
Visualize pipeline	`make run-viz`
Run tests	`make test`
Jupyter notebook	`uv run kedro jupyter notebook`
Lint & format	`make check`
