A standardized, production-ready Kedro pipeline template for the Geothermal Energy and Geofluids (GEG) group at ETH Zurich. Runs locally out of the box or on Google Cloud Platform with a single flag.
- Standardized structure: Uniform project layout and Data Catalog across all GEG pipelines.
- Version-controlled logic: Python transformations tracked in Git for full reproducibility.
- Multi-environment: Switch between local and GCP execution with a single flag (`--env gcp`); no code changes needed.
- Data governance: Centralized catalog manages GCS and BigQuery connections. No hardcoded paths or secrets.
- Quality built-in: Pre-commit hooks run linting (ruff), type checking (mypy), and secret scanning (detect-secrets) automatically.
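Kedro loads `conf/base/` first and then the environment selected with `--env` (here `gcp`), and entries defined in the selected environment take precedence over the base ones. The sketch below models that precedence for catalog entries as a plain top-level dictionary merge, a simplified picture of Kedro's default behaviour in which a redefined entry replaces the base entry as a whole. The dataset name `flowrate_raw`, the bucket `my-bucket`, and the helper `resolve_catalog` are hypothetical and not taken from this repository; check the Kedro configuration docs for the exact merge rules.

```python
from typing import Any

Catalog = dict[str, dict[str, Any]]


def resolve_catalog(base: Catalog, override: Catalog) -> Catalog:
    """Layer an environment catalog on top of the base catalog.

    Simplified model: an entry defined in `override` replaces the base entry
    with the same name entirely (no recursive merge of its keys).
    """
    return {**base, **override}


# Hypothetical entries, for illustration only (conf/base vs. conf/gcp).
base_catalog: Catalog = {
    "flowrate_raw": {
        "type": "pandas.CSVDataset",
        "filepath": "data/01_raw/experiment_material_fluid_flowrate_01.csv",
    },
}
gcp_catalog: Catalog = {
    "flowrate_raw": {
        "type": "pandas.CSVDataset",
        "filepath": "gs://my-bucket/kedro-pipeline/data/01_raw/experiment_material_fluid_flowrate_01.csv",
    },
}

local_run = resolve_catalog(base_catalog, {})          # no environment overrides
gcp_run = resolve_catalog(base_catalog, gcp_catalog)   # --env gcp
print(local_run["flowrate_raw"]["filepath"])
print(gcp_run["flowrate_raw"]["filepath"])
```

The node code never sees these paths, which is why switching environments needs no code changes.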
| Tool | Purpose | Install |
|---|---|---|
| Python ≥ 3.10 | Runtime | python.org |
| uv | Package & environment management | `curl -LsSf https://astral.sh/uv/install.sh \| sh` |
| Google Cloud CLI | GCP authentication (GCP mode only) | cloud.google.com |
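To confirm the prerequisites before running `make setup`, a short script can look for the tools in the table. This is a hypothetical helper, not part of the template: it checks the version of the interpreter that runs it (uv may select a different interpreter for the project) and uses `shutil.which` to look up `uv`, plus `gcloud` when `gcp=True`.

```python
import shutil
import sys


def check_prerequisites(gcp: bool = False) -> list[str]:
    """Return a list of problems; an empty list means everything was found."""
    problems: list[str] = []
    if sys.version_info < (3, 10):
        problems.append(f"Python >= 3.10 required, found {sys.version.split()[0]}")
    tools = ["uv"] + (["gcloud"] if gcp else [])  # gcloud is only needed in GCP mode
    for tool in tools:
        if shutil.which(tool) is None:
            problems.append(f"'{tool}' not found on PATH")
    return problems
```

For example, `check_prerequisites(gcp=True)` returns an empty list on a machine that is ready for both local and GCP runs.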
```bash
git clone <this-repo-url>
cd kedro-pipeline
make setup
```

This creates a virtual environment, installs all dependencies from `uv.lock`, and sets up the pre-commit code quality hooks.
Hooks run automatically on every `git commit`. To run them manually:

```bash
make check
```

In local mode, data is read from and written to the `data/` directory. Place your raw CSV files in:

```
data/01_raw/experiment_material_fluid_flowrate_01.csv
data/01_raw/experiment_material_fluid_pressure_01.csv
```

Then run the pipeline:

```bash
make run
```

Results are written to `data/08_reporting/`.
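Because the local run reads the two raw files named above, a small pre-flight check can catch a missing input before Kedro starts. The helper `missing_raw_inputs` is hypothetical (not part of the template) and only checks that those files exist under `data/01_raw/`; it does not validate their contents.

```python
from pathlib import Path

# Raw input file names as listed in this README.
REQUIRED_RAW_FILES = (
    "experiment_material_fluid_flowrate_01.csv",
    "experiment_material_fluid_pressure_01.csv",
)


def missing_raw_inputs(project_root: Path) -> list[Path]:
    """Return the expected raw CSV paths that do not exist as files yet."""
    raw_dir = project_root / "data" / "01_raw"
    return [raw_dir / name for name in REQUIRED_RAW_FILES if not (raw_dir / name).is_file()]
```

For example, call `missing_raw_inputs(Path.cwd())` from the project root and stop if the returned list is non-empty.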
In GCP mode, data is read from Google Cloud Storage (GCS) and results are written to both GCS and BigQuery. First, authenticate with Google Cloud:

```bash
gcloud auth application-default login
```

Copy the example file and fill in your GCP details:

```bash
cp .env.example .env
```

Edit `.env`:

```
GCS_BUCKET=your-gcs-bucket-name # GCS bucket (without gs://)
GCP_PROJECT=your-gcp-project-id # GCP project ID
GBQ_DATASET=your_bigquery_dataset # BigQuery dataset name
# GBQ_LOCATION=europe-west6 # Optional: BigQuery region (default: europe-west6)
```

Then run:

```bash
make run-gcp
```

The `.env` variables are loaded automatically by the Makefile.
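The Makefile already loads `.env`, so nothing else is needed for `make run-gcp`. If you want to validate the file from Python (for example in a notebook or a test), the sketch below parses the simple `KEY=value # comment` format shown above, checks the three required variables, rejects a `gs://` prefix on the bucket name, and applies the documented `europe-west6` default for `GBQ_LOCATION`. The function names are hypothetical, and the parser deliberately handles only this simple format (no quoting, escaping, or `#` inside values); a library such as python-dotenv covers the general case.

```python
from pathlib import Path

REQUIRED_VARS = ("GCS_BUCKET", "GCP_PROJECT", "GBQ_DATASET")
DEFAULT_GBQ_LOCATION = "europe-west6"


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=value lines; '#' starts a comment (full-line or trailing)."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comment and surrounding whitespace
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip()
    return values


def load_gcp_settings(path: Path = Path(".env")) -> dict[str, str]:
    """Read .env, fail on missing or malformed values, and fill in defaults."""
    settings = parse_env(path.read_text(encoding="utf-8"))
    missing = [name for name in REQUIRED_VARS if not settings.get(name)]
    if missing:
        raise ValueError(f"{path} is missing required variables: {', '.join(missing)}")
    if settings["GCS_BUCKET"].startswith("gs://"):
        raise ValueError("GCS_BUCKET must be the bucket name without the gs:// prefix")
    if not settings.get("GBQ_LOCATION"):
        settings["GBQ_LOCATION"] = DEFAULT_GBQ_LOCATION  # documented default
    return settings
```

For example, `load_gcp_settings()["GCP_PROJECT"]` returns the project ID from the `.env` file in the current directory.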
| Layer | Location |
|---|---|
| Raw inputs | gs://<GCS_BUCKET>/kedro-pipeline/data/01_raw/ |
| Intermediate | gs://<GCS_BUCKET>/kedro-pipeline/data/02_intermediate/ |
| Primary | gs://<GCS_BUCKET>/kedro-pipeline/data/03_primary/ |
| Reporting (CSV) | gs://<GCS_BUCKET>/kedro-pipeline/data/08_reporting/permeability.csv |
| Reporting (BQ) | <GCP_PROJECT>.<GBQ_DATASET>.permeability |
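The locations in the table follow a fixed pattern built from the `.env` values. The hypothetical helper below reproduces that pattern, which can be handy when inspecting outputs from a notebook; the `kedro-pipeline/data/` prefix, the layer folders, and the `permeability` names are copied from the table, while the class itself is not part of the template.

```python
from dataclasses import dataclass

# Layer folders as listed in the table above.
LAYERS = {
    "raw": "01_raw",
    "intermediate": "02_intermediate",
    "primary": "03_primary",
    "reporting": "08_reporting",
}


@dataclass(frozen=True)
class GcpLocations:
    bucket: str   # GCS_BUCKET, without the gs:// prefix
    project: str  # GCP_PROJECT
    dataset: str  # GBQ_DATASET

    def layer_uri(self, layer: str) -> str:
        """GCS folder URI for a data layer such as 'raw' or 'primary'."""
        return f"gs://{self.bucket}/kedro-pipeline/data/{LAYERS[layer]}/"

    @property
    def permeability_csv(self) -> str:
        """GCS URI of the reporting CSV."""
        return self.layer_uri("reporting") + "permeability.csv"

    @property
    def permeability_table(self) -> str:
        """Fully qualified BigQuery table ID (project.dataset.table)."""
        return f"{self.project}.{self.dataset}.permeability"
```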
```
kedro-pipeline/
├── conf/
│   ├── base/                  # Shared config: local filesystem defaults
│   │   ├── catalog.yml
│   │   └── parameters.yml
│   ├── gcp/                   # GCP overrides, activated with --env gcp
│   │   └── catalog.yml
│   └── local/                 # Your personal overrides (gitignored)
├── data/                      # Local data (gitignored; only .gitkeep files committed)
│   ├── 01_raw/
│   ├── 02_intermediate/
│   ├── 03_primary/
│   └── 08_reporting/
├── src/
│   └── kedro_pipeline/
│       └── pipelines/
│           └── data_processing/
│               ├── nodes.py       # Pure transformation functions
│               └── pipeline.py    # Pipeline assembly
├── tests/
├── .env.example               # GCP environment variable template
└── pyproject.toml
```
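`nodes.py` holds pure functions: each takes in-memory data and returns new data without any file or cloud I/O, so the same code runs unchanged locally and on GCP while the catalog decides where data lives. The sketch below illustrates that style with a hypothetical `preprocess_flow` function over plain Python records; the field names (`time_s`, `flowrate`) and the cleaning rules are invented for illustration, and the real nodes in this repository may use a different data type (for example pandas DataFrames).

```python
from typing import Any

Record = dict[str, Any]


def preprocess_flow(rows: list[Record]) -> list[Record]:
    """Hypothetical node: clean raw flow-rate records.

    Pure function: it reads nothing from disk, writes nothing, and does not
    mutate its input, so it behaves identically in local and GCP runs.
    """
    cleaned: list[Record] = []
    for row in rows:
        flowrate = row.get("flowrate")
        if flowrate in (None, ""):
            continue  # drop records without a measurement
        cleaned.append({**row, "flowrate": float(flowrate)})  # new dict; input untouched
    return sorted(cleaned, key=lambda r: float(r["time_s"]))  # order by time
```

In `pipeline.py`, a function like this would be registered with Kedro's `node` helper, for example `node(preprocess_flow, inputs="flowrate_raw", outputs="flowrate_clean", name="preprocess_flow_node")`, where the dataset names are hypothetical catalog entries and the node name matches the one used in the `kedro run --nodes` example.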
| Task | Command |
|---|---|
| Project setup | make setup |
| Run pipeline locally | make run |
| Run pipeline on GCP | make run-gcp |
| Run a single node | uv run kedro run --nodes preprocess_flow_node |
| Visualize pipeline | make run-viz |
| Run tests | make test |
| Jupyter notebook | uv run kedro jupyter notebook |
| Lint & format | make check |