This repository provides an AWS CDK application that deploys the infrastructure needed to run a pangeo-forge Bakery on AWS.
- 🧑💻 Development - Requirements
- 🧑💻 Development - Getting Started
- 🧑💻 Development - Makefile goodness
- 🚀 Deployment - Prerequisites
- 🚀 Deployment - Deploying
- 🚀 Deployment - Destroying
- 📊 Flows - Registering the test Recipe
To develop on this project, you should have the following installed:
- Node 14 (We recommend using NVM Node Version Manager)
- AWS CDK - There is a package.json in the repository; it's recommended to run npm install in the repository root and make use of npx <command> rather than globally installing AWS CDK
- Python 3.8.10 (We recommend using Pyenv to handle Python versions)
- Poetry
- AWS CLI
- Docker
If you're developing on macOS, all of the above (apart from AWS CDK) can be installed using Homebrew.
If you're developing on Windows, we recommend using either Git BASH or Windows Subsystem for Linux.
NOTE: All make commands should be run from the root of the repository
This project requires some Python and Node dependencies (including cdk, prefect, and python-dotenv) so that:
- We can deploy the Bakery AWS infrastructure
- We can register flows for testing
- We can use .env files to provide both Prefect Flows and CDK with environment variables
To install the dependencies, run:
$ make install # Runs `npm install` to install CDK and `poetry install` to install all the Python dependencies required
A file named .env is expected in the root of the repository to store variables used within deployment. The expected values are:
# SET BY YOU MANUALLY:
OWNER="<your-name>"
IDENTIFIER="<a-unique-value-to-tie-to-your-deployment>"
AWS_DEFAULT_REGION="<your-preferred-aws-region>"
AWS_DEFAULT_PROFILE="<your-preferred-named-aws-cli-profile>"
RUNNER_TOKEN_SECRET_ARN="<arn-of-your-runner-token-secret>" # See [Deployment - Prerequisites > Prerequisites > cloud.prefect.io Runner Token]
PREFECT__CLOUD__AUTH_TOKEN="<value-of-tenant-token>" # See https://docs.prefect.io/orchestration/concepts/tokens.html#tenant - This is used to support flow registration
PREFECT_PROJECT="<name-of-a-prefect-project>" # See https://docs.prefect.io/orchestration/concepts/projects.html#creating-a-project - This is where the bakery's test flows will be registered
PREFECT__CLOUD__AGENT__LABELS="<a-set-of-prefect-agent-labels>" # See https://docs.prefect.io/orchestration/agents/overview.html#labels - These will be registered with the deployed agent to limit which flows should be executed by the agent
BUCKET_USER_ARN="<arn-of-your-bucket-iam-user>" # See [Deployment > Prerequisites > Bucket IAM User]
BAKERY_IMAGE="<pangeo-forge-bakery-images-image-you-wish-to-use>" # See [Deployment > Prerequisites > Bakery Image]
An example called example.env is available for you to copy, rename, and fill out accordingly.
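Before running any make target, it can help to confirm that every expected key is present and no longer holds its `<placeholder>` value. The following is a minimal sketch, not part of this repository: `parse_env` and `missing_keys` are hypothetical helpers, and the parser is a simplified stand-in for python-dotenv that only handles the `KEY="value" # comment` shape shown above.

```python
from pathlib import Path

# Keys listed in this README; adjust if your .env differs.
EXPECTED_KEYS = [
    "OWNER", "IDENTIFIER", "AWS_DEFAULT_REGION", "AWS_DEFAULT_PROFILE",
    "RUNNER_TOKEN_SECRET_ARN", "PREFECT__CLOUD__AUTH_TOKEN", "PREFECT_PROJECT",
    "PREFECT__CLOUD__AGENT__LABELS", "BUCKET_USER_ARN", "BAKERY_IMAGE",
]


def parse_env(text):
    """Parse KEY="value" lines (hypothetical, simplified stand-in for python-dotenv)."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, rest = line.partition("=")
        rest = rest.strip()
        if rest[:1] in ("'", '"'):
            # Quoted value: keep what sits between the matching quotes.
            end = rest.find(rest[0], 1)
            value = rest[1:end] if end != -1 else rest[1:]
        else:
            # Unquoted value: drop any trailing " # comment".
            value = rest.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


def missing_keys(values, expected=EXPECTED_KEYS):
    """Return expected keys that are absent, empty, or still a <placeholder>."""
    return [
        key for key in expected
        if not values.get(key)
        or (values[key].startswith("<") and values[key].endswith(">"))
    ]


if __name__ == "__main__":
    env_path = Path(".env")
    if env_path.exists():
        problems = missing_keys(parse_env(env_path.read_text()))
        print("Missing or unset:", ", ".join(problems) if problems else "none")
    else:
        print("No .env found in the current directory")
```

Passing a shorter `expected` list lets you check only the subset a given command needs, for example the variables required before registering a Flow.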
A Makefile is available in the root of the repository to abstract away commonly used commands for development:
make install
This will run npm install and poetry install in the repo root, installing the dependencies needed for development of this project
make lint
This will perform a dry run of flake8, isort, and black and let you know what issues were found
make format
This will perform a run of isort and black; this will modify files if issues were found
make diff
This will run a cdk diff using the contents of your .env file
make deploy
This will run a cdk deploy using the contents of your .env file. The deployment is auto-approved, so make sure you know what you're changing with your deployment first! (Best to run make diff to check!)
make destroy
This will run a cdk destroy using the contents of your .env file. The destroy is auto-approved, so make sure you know what you're destroying first!
make register-flow
This uses the bakery image defined in BAKERY_IMAGE to register your Flow with Prefect. It takes a parameter flow, which is the Python file within flow_test/ you'd like to use. You would use it like:
$ make register-flow flow=oisst_recipe.py
Firstly, ensure you've installed all the project requirements as described here and here.
To successfully communicate with Prefect Cloud, the ECS Agent we deploy needs access to a RUNNER token outlined here.
You should create a Secret in AWS Secrets Manager (in your deployment region) in the form:
{
"RUNNER_TOKEN": "<The value of the token>"
}
Take a note of the ARN for the token and put it in your .env file under the key of RUNNER_TOKEN_SECRET_ARN.
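If you prefer to create the secret programmatically rather than through the console, the sketch below shows one possible approach. It assumes boto3 is installed and AWS credentials are configured; the function names and the secret name you pass in are hypothetical and not defined by this repository. Only the `{"RUNNER_TOKEN": ...}` shape comes from the text above.

```python
import json


def runner_token_secret_string(token):
    """Build the JSON document described above for the Secrets Manager secret."""
    if not token:
        raise ValueError("token must be a non-empty string")
    return json.dumps({"RUNNER_TOKEN": token})


def create_runner_token_secret(name, token, region):
    """Create the secret and return its ARN (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helper above works without boto3

    client = boto3.client("secretsmanager", region_name=region)
    response = client.create_secret(
        Name=name,  # hypothetical secret name chosen by you
        SecretString=runner_token_secret_string(token),
    )
    return response["ARN"]
```

The returned ARN is the value to store in .env under RUNNER_TOKEN_SECRET_ARN. Create the secret in the same region you set in AWS_DEFAULT_REGION.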
To be able to utilise S3 Flow Storage, an IAM User must be created in the AWS Account the Bakery is being deployed into.
This user needs no permissions applied to them; these are applied on Bakery deployment.
You can follow the instructions here to create the IAM User. Once this is done, place the value of the IAM User's ARN into .env under BUCKET_USER_ARN.
This value is provided to bakeries.yaml so that Flows may be registered to your Bakery.
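A mistyped ARN in BUCKET_USER_ARN is easy to miss until deployment, so a quick format check can be useful. The sketch below is an illustration only, with a hypothetical helper name; it assumes the standard IAM user ARN layout `arn:<partition>:iam::<account-id>:user/<path/><name>` and does not verify that the user actually exists.

```python
import re

# arn:<partition>:iam::<12-digit-account-id>:user/<optional/path/><user-name>
IAM_USER_ARN = re.compile(r"^arn:(aws|aws-cn|aws-us-gov):iam::(\d{12}):user/(.+)$")


def parse_iam_user_arn(arn):
    """Split an IAM user ARN into its parts, or raise ValueError if malformed."""
    match = IAM_USER_ARN.match(arn)
    if match is None:
        raise ValueError(f"Not an IAM user ARN: {arn!r}")
    partition, account_id, resource = match.groups()
    return {
        "partition": partition,
        "account_id": account_id,
        # The user name is the last segment after any optional path.
        "user_name": resource.rsplit("/", 1)[-1],
    }
```

For example, `parse_iam_user_arn("arn:aws:iam::123456789012:user/bakery-bucket-user")` yields the account ID and user name, while a role ARN or a truncated value raises `ValueError`.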
To be able to register and run Recipes as Prefect Flows, your Bakery must be running one of the pangeo-forge-bakery-images images in both your Prefect Agent and your Flow & Dask tasks.
You can find more information on the pangeo-forge-bakery-images here. Once you've selected which tag you wish to support, you need to add an entry into .env under the name BAKERY_IMAGE. See below for an example:
BAKERY_IMAGE="pangeo/pangeo-forge-bakery-images:pangeonotebook-2021.05.15_prefect-0.14.19_pangeoforgerecipes-0.3.4"
You can check what you'll be deploying by running:
$ make diff # Outputs the result of `cdk diff`
To deploy the AWS infrastructure required to host your Bakery, you can run:
$ make deploy # Deploys Bakery AWS infrastructure
To destroy the Bakery infrastructure within AWS, you can run:
$ make destroy # Destroys the Bakery infrastructure
For quick testing of your Bakery deployment, there is a Recipe set up as a Flow within flow_test/ that you can register and run. Before you register the example Flow, you must have the values of PREFECT__CLOUD__AUTH_TOKEN, PREFECT_PROJECT, PREFECT__CLOUD__AGENT__LABELS, BAKERY_IMAGE, IDENTIFIER, AWS_DEFAULT_PROFILE, and AWS_DEFAULT_REGION present and populated in .env. You must also have run make install.
When your .env is populated and you've installed the project dependencies, you can register the Flow by running:
$ make register-flow flow=<name-of-flow-file-in-flow_test/>.py
[2021-06-11 12:30:03+0100] INFO - prefect.S3 | Uploading test-noaa-flow/2021-06-11t11-30-03-443149-00-00 to <storage-bucket>
Flow URL: https://cloud.prefect.io/<your-account>/flow/1429ce74-1be7-412f-bc03-2553d79d7752
└── ID: c8de9a87-a534-4b86-a5cc-b02dc61e58bc
└── Project: <PREFECT_PROJECT>
└── Labels: <PREFECT__CLOUD__AGENT__LABELS>
You can then navigate to cloud.prefect.io, find your Flow, and run it.