Note: Using Dask and Dask Distributed, together with the goal of supporting DVC experiments, made this project very convoluted.
It will no longer be maintained; check out https://github.com/zincware/paraffin for a simpler alternative instead.
DVC provides tools for building and executing the computational graph
locally through various methods. The dask4dvc package combines
Dask Distributed with DVC to make DVC easier to
use with HPC workload managers such as Slurm.
The dask4dvc repro command runs the DVC graph in parallel where possible.
Currently, dask4dvc run will not run stages per experiment sequentially.
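Running "the DVC graph in parallel where possible" means starting each stage as soon as all of its dependencies have finished, so independent stages run side by side. The sketch below illustrates that scheduling idea with the standard library only: graphlib.TopologicalSorter hands out ready stages and a thread pool runs them. It swaps Dask for concurrent.futures to stay self-contained and is not dask4dvc's implementation; the stage names and the run_stage body are hypothetical.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter

# Hypothetical stage graph: each stage maps to the set of stages it depends on.
STAGES = {
    "prepare": set(),
    "featurize_a": {"prepare"},
    "featurize_b": {"prepare"},
    "train": {"featurize_a", "featurize_b"},
}


def run_stage(name: str) -> str:
    # Placeholder for the real work of a stage (hypothetical; a real runner
    # would execute the stage's command here).
    return name


def run_graph(graph: dict[str, set[str]], max_workers: int = 4) -> list[str]:
    """Run each stage once its dependencies are done; independent stages run concurrently."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    finished: list[str] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pending = {}
        while sorter.is_active():
            # Submit every stage whose dependencies have all completed.
            for stage in sorter.get_ready():
                pending[pool.submit(run_stage, stage)] = stage
            # Block until at least one running stage finishes.
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                stage = pending.pop(future)
                future.result()  # re-raise the stage's exception, if any
                finished.append(stage)
                sorter.done(stage)  # may unlock downstream stages
    return finished


if __name__ == "__main__":
    print(run_graph(STAGES))
```

Here featurize_a and featurize_b can run at the same time because both depend only on prepare, while train waits for both.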
⚠️ This is an experimental package not affiliated in any way with Iterative or DVC.
dask4dvc provides a CLI similar to DVC's:
- dvc repro becomes dask4dvc repro
- dvc queue start becomes dask4dvc run
You can follow the progress using dask4dvc <cmd> --dashboard.
You can easily use dask4dvc with a Slurm cluster. This requires a running Dask
scheduler, for example one started through dask_jobqueue:
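```python
from dask_jobqueue import SLURMCluster

cluster = SLURMCluster(
    cores=1,
    memory="128GB",
    queue="gpu",
    processes=1,
    walltime="8:00:00",
    job_cpu=1,
    # Extra #SBATCH directives; older dask-jobqueue releases call this argument `job_extra`.
    job_extra_directives=["-N 1", "--cpus-per-task=1", "--tasks-per-node=64", "--gres=gpu:1"],
    # Fixed scheduler port so dask4dvc can connect to it.
    scheduler_options={"port": 31415},
)
# Scale the number of Slurm jobs up and down with the workload.
cluster.adapt()
# The scheduler lives in this Python process, so keep it running while dask4dvc is in use.
```

With this setup, you can then run dask4dvc repro --address 127.0.0.1:31415 to
connect to the scheduler on the example port 31415.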
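Before launching dask4dvc repro --address, it can help to confirm that something is actually listening on the scheduler port. The helper below is a hypothetical convenience, not part of dask4dvc or Dask; it only checks that the TCP port accepts connections, not that the listener is a Dask scheduler.

```python
import socket


def scheduler_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hosts.
        return False


if __name__ == "__main__":
    # Example address and port from the SLURMCluster setup.
    print(scheduler_reachable("127.0.0.1", 31415))
```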
You can also pass a config file with dask4dvc repro --config myconfig.yaml. All
dask.distributed clusters should be supported. An example config:
```yaml
default:
  SGECluster:
    queue: regular
    cores: 10
    memory: 16 GB
```
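The example config reads as a profile name (default) mapped to one cluster class name and its keyword arguments. Assuming that layout, turning the parsed file into a cluster object can be sketched as below. The function cluster_from_config and the registry argument are hypothetical illustrations; dask4dvc's own config loader may work differently. In real use the registry would map "SGECluster" to dask_jobqueue.SGECluster and the config would come from yaml.safe_load; here a stand-in class keeps the sketch runnable without a cluster.

```python
from typing import Any, Callable, Mapping


def cluster_from_config(
    config: Mapping[str, Any],
    registry: Mapping[str, Callable[..., Any]],
    profile: str = "default",
) -> Any:
    """Build a cluster from an assumed {profile: {ClusterClassName: kwargs}} mapping."""
    entries = config[profile]
    if len(entries) != 1:
        raise ValueError(f"profile {profile!r} must name exactly one cluster class")
    # Unpack the single (class name, keyword arguments) pair.
    (class_name, kwargs), = entries.items()
    try:
        cluster_cls = registry[class_name]
    except KeyError:
        raise ValueError(f"unknown cluster class {class_name!r}") from None
    return cluster_cls(**(kwargs or {}))


if __name__ == "__main__":
    # The YAML example, as yaml.safe_load would return it.
    example = {"default": {"SGECluster": {"queue": "regular", "cores": 10, "memory": "16 GB"}}}
    # Stand-in for dask_jobqueue.SGECluster that just returns its keyword arguments.
    fake_registry = {"SGECluster": lambda **kwargs: kwargs}
    print(cluster_from_config(example, fake_registry))
```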