Scripts and advice for running Pangeo with dask-jobqueue on NCI's Gadi, Pawsey's Zeus and CSIRO's Petrichor
These are the scripts that I (Dougie) use. They may be useful to you.
(Note that NCI now provides access to a JupyterLab environment on Gadi via the Australian Research Environment. This is the recommended approach for running Pangeo workflows on NCI. The scripts and instructions in this repo describe an alternative approach that can be easily deployed across different HPC systems.)
Users will need to be able to log in to their system of interest. To use Gadi and Pawsey, users will need to be able to request resources under a project.
New users to Gadi can sign up here, but they will need to either join an existing project or propose a new project to be able to access NCI resources. Existing users can check their projects here.
New users to Pawsey can apply here.
Ideally, users will have a GitHub account (it's free and easy to set up here), but this is not essential.
-
Log in to your system of choice:
Gadi:
ssh -Y <username>@gadi.nci.org.au
Zeus:
ssh -Y <username>@zeus.pawsey.org.au
Petrichor:
ssh -Y <username>@petrichor.hpc.csiro.au
-
If you don't have conda installed or access to conda (try which conda), install it:
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
chmod +x Miniconda3-latest-Linux-x86_64.sh
./Miniconda3-latest-Linux-x86_64.sh
You'll get prompted for where to install conda. The default is your home directory, which may be quite limited for space. It may therefore be a good idea to instead use a different persistent location, e.g. /g/data on Gadi, /group on Zeus or Bowen storage on Petrichor.
Note, to run the scripts in this repo conda will need to be initialised. When you first install conda you will be given the option to append some lines to your .bashrc that will initialise conda and activate a (base) environment every time you log in. I recommend doing this. Otherwise, you'll have to initialise conda manually before progressing.
-
If there's any possibility you might edit the scripts in this repo and want to keep track of your edits using git, create a fork of this repo under your own GitHub account by clicking on the Fork button on the top right of this page (strongly recommended). Doing this will create a replica of this repo under your username at https://github.com/<your_username>/pangeo_hpc.git. If you don't have a GitHub account and you don't want to create one, go to step 4.
-
Clone your fork of this repo to a location of your choice on Gadi, Zeus or Petrichor: go to the desired location and run
git clone https://github.com/<your_username>/pangeo_hpc.git
or, if you're using ssh keys:
git clone git@github.com:<your_username>/pangeo_hpc.git
If you didn't create a fork, clone this repo directly:
git clone https://github.com/csiro-dcfp/pangeo_hpc.git
-
If you don't already have a pangeo-like conda environment (containing jupyter, xarray, dask...), create one using the environment.yml file in this repo. This should only take a few minutes with a decent internet connection and a file system that supports lots of small files. If you have permission to install into your conda (base) environment (e.g. if you installed conda yourself) it's fastest to do this step with mamba, which works like conda but is written in C++. Otherwise you can use conda to create the environment.
If you can install into (base):
conda install mamba -y
mamba env create -f environment.yml
Otherwise:
conda env create -f environment.yml
This will create a new conda environment called pangeo. If you wish to use a different name, run e.g.:
conda env create --name <different_name> -f environment.yml
-
Activate your new pangeo environment and configure your Jupyter password (note, in a previous version of these instructions, you would have also installed and enabled a number of Jupyter labextensions at this point. This is no longer necessary with JupyterLab version 3):
conda activate pangeo
jupyter notebook --generate-config
jupyter notebook password
and follow the prompts.
-
At this point, you're ready to submit a job to run your JupyterLab and Python instances. Once this job is running and you've accessed JupyterLab via your web browser (see below) you'll be able to request additional resources as a dask cluster (using dask-jobqueue). We can submit a job to run our JupyterLab instance using the relevant start_jupyter_<system>.sh script but it may require a little editing first:
- Open the relevant start_jupyter_<system>.sh file and edit the PBS/SLURM header information (the #PBS/#SBATCH lines) to reflect your project (if relevant), required resources, etc. Remember these do not need to represent the total resources you require for the job you have planned because you will be able to request additional resources from within JupyterLab using dask-jobqueue. For interactive science work, I usually request few resources for a relatively long time, and then do compute-heavy reduction task(s) on shorter-term dask-jobqueue clusters. With this type of workflow, the resources you request in start_jupyter_<system>.sh need only reflect what is needed to handle the reduced data.
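As a rough illustration of the kind of header meant here, a PBS header on Gadi might look something like the sketch below. Every value is a placeholder or an assumption (it is not copied from the scripts in this repo), so check your own start_jupyter_<system>.sh and your centre's documentation for what is actually required. Comments are kept on their own lines because text after a directive is passed to the scheduler.

```shell
#!/bin/bash
# Sketch of a PBS header for a small, long-lived JupyterLab job.
# All values below are assumed placeholders, not this repo's settings.

# Project to charge
#PBS -P <project>
# Queue to submit to
#PBS -q normal
# Modest resources held for the length of the interactive session
#PBS -l walltime=02:00:00
#PBS -l ncpus=4
#PBS -l mem=16GB
# File systems the job needs to see (assumed example)
#PBS -l storage=gdata/<project>
```

On the SLURM systems the corresponding lines start with #SBATCH and use options such as --time, --cpus-per-task, --mem-per-cpu and --account.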
You could now go ahead and submit your start_jupyter_<system>.sh script to the queue. However, for convenience I've also written a simple function for handling the submission of start_jupyter_<system>.sh and parsing instructions from the output file. This function receives some of the key job specifications as optional inputs so you don't have to edit the header on start_jupyter_<system>.sh every time you want to change any of these. It also receives the name of your pangeo-like conda environment as an input. You can append this function to your .bashrc by running:
./instantiate_pangeo_function.sh
The pangeo function signature is:
Gadi:
pangeo walltime(02:00:00) ncpus(4) mem(16GB) project($PROJECT) pangeo_env_name(pangeo) notebook_directory(~)
Zeus:
pangeo time(02:00:00) cpus_per_task(4) mem-per-cpu(4GB) account($PAWSEY_PROJECT) pangeo_env_name(pangeo) notebook_directory(~)
Petrichor:
pangeo time(02:00:00) cpus_per_task(8) mem-per-cpu(64GB) pangeo_env_name(pangeo) notebook_directory(~)
where the defaults are given in brackets. For example, to run with the default settings, one would simply enter into their terminal:
pangeo
To specify a 2 hour job with 6 cpus, one would enter:
pangeo 02:00:00 6
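Because the inputs are positional, changing a later one (say, memory) means also supplying the ones before it. The sketch below shows how positional arguments with defaults behave in a shell function. The function name show_pangeo_args is hypothetical and exists only for illustration; the real pangeo function is whatever instantiate_pangeo_function.sh installs and may be written differently. The defaults are the Gadi ones listed above.

```shell
# Hypothetical illustration of positional arguments with defaults.
# This is NOT the repo's pangeo function.
show_pangeo_args() {
    walltime="${1:-02:00:00}"   # first argument, else the default
    ncpus="${2:-4}"             # second argument, else the default
    mem="${3:-16GB}"            # third argument, else the default
    echo "walltime=$walltime ncpus=$ncpus mem=$mem"
}

show_pangeo_args                    # walltime=02:00:00 ncpus=4 mem=16GB
show_pangeo_args 02:00:00 6         # walltime=02:00:00 ncpus=6 mem=16GB
show_pangeo_args 02:00:00 4 32GB    # walltime=02:00:00 ncpus=4 mem=32GB
```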
-
Run the pangeo function or submit start_jupyter_<system>.sh to the queue. For the former, instructions for setting up port forwarding to view your JupyterLab session and dask dashboard will be printed to your screen. For the latter, you'll have to parse them from the jupyter_instructions.txt file that will appear in the current directory. In both cases, the instructions will only appear once your job leaves the queue, which may take a minute or so.
-
Follow the instructions to access your JupyterLab session via a web browser.
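The printed instructions are the ones to follow. For orientation only, port forwarding of this kind is usually an ssh tunnel run on your local machine. The sketch below builds such a command from a compute node name, a port and a login address. The helper name forward_command, the node name node1234 and the port 8888 are all hypothetical stand-ins, so substitute the values given in your own instructions.

```shell
# Hypothetical helper: print an ssh tunnel command for the given
# compute node ($1), port ($2) and login address ($3).
forward_command() {
    echo "ssh -N -L $2:$1:$2 $3"
}

# Placeholder values; replace with those in your printed instructions.
forward_command node1234 8888 "<username>@gadi.nci.org.au"
```

With a tunnel like this open, a JupyterLab session listening on that port would be reachable in a local browser at localhost on the same port.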
-
Do your science. As mentioned above, my typical workflow is to use dask-jobqueue to request and access resources for the "heavy-lifting" in my notebooks (e.g. reducing a large dataset down to a 1D or 2D field to plot). Examples of setting up a dask-jobqueue cluster are given in the notebooks directory of this repo.
Note that getting dask-jobqueue running on Gadi requires the manipulation of the default jobscripts submitted by dask's PBSCluster into a format that Gadi expects. An example of this hack is given in notebooks/run_dask-jobqueue_Gadi.ipynb.
- Activate your pangeo environment and install it as an ipykernel (you can change the --name and --display-name if you like):
python -m ipykernel install --user --name pangeo --display-name "Python (pangeo)"
You can add other Python-based conda environments in the same way. This will give you access to your environments from within JupyterLab and means you don't have to restart JupyterLab for changes or updates to your environments to take effect (simply restarting the kernel will do).
-
Create a new conda environment and add some essential packages for working with geoscience data in R. Deactivate your pangeo environment and then:
conda create -n r_env -c r r-essentials r-vars
conda activate r_env
conda install -c conda-forge r-raster r-matlab r-ncdf4 r-lmtest r-cowplot
-
Register the R kernel with Jupyter:
Rscript -e 'IRkernel::installspec()'
Now when you spin up JupyterLab you should be able to select and use your R kernel.