Dask

How to install and configure Dask on Kubernetes using Helm, including worker resources and Jupyter notebook integration.

Reference: Dask Kubernetes Helm setup

The following Helm values file configures Dask workers with specific CPU/memory limits and installs additional Python packages on both workers and the Jupyter notebook server.

cat extra-config.yaml
worker:
  replicas: 4
  resources:
    limits:
      cpu: 1
      memory: 0.5G
    requests:
      cpu: 1
      memory: 0.5G
  env:
    - name: EXTRA_CONDA_PACKAGES
      value: numba xarray -c conda-forge
    - name: EXTRA_PIP_PACKAGES
      value: scikit-learn matplotlib s3fs dask-ml --upgrade

# We want to keep the same packages on the worker and jupyter environments
jupyter:
  enabled: true
  serviceType: NodePort
  env:
    - name: EXTRA_CONDA_PACKAGES
      value: numba xarray matplotlib -c conda-forge
    - name: EXTRA_PIP_PACKAGES
      value: dask_kubernetes s3fs dask-ml --upgrade
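
The comment in the values file asks for the worker and Jupyter environments to carry the same packages, which is easy to let drift when the two lists are maintained by hand. A minimal sketch of a consistency check follows. It assumes the two `env` lists have already been loaded into Python (for example with a YAML parser). The helper names `package_names` and `component_packages` are hypothetical, and the sample values mirror the file above, with scikit-learn written under its PyPI name.

```python
from __future__ import annotations


def package_names(spec: str) -> set[str]:
    """Extract package names from an EXTRA_*_PACKAGES value, ignoring flags."""
    names: set[str] = set()
    skip_next = False
    for token in spec.split():
        if skip_next:            # argument of a flag such as "-c conda-forge"
            skip_next = False
            continue
        if token in ("-c", "--channel"):
            skip_next = True     # the next token is a channel, not a package
            continue
        if token.startswith("-"):
            continue             # standalone flag such as "--upgrade"
        # Treat "dask_kubernetes" and "dask-kubernetes" as the same package.
        names.add(token.lower().replace("_", "-"))
    return names


def component_packages(env: list[dict]) -> set[str]:
    """Union of conda and pip packages requested for one component."""
    wanted = {"EXTRA_CONDA_PACKAGES", "EXTRA_PIP_PACKAGES"}
    packages: set[str] = set()
    for var in env:
        if var["name"] in wanted:
            packages |= package_names(var["value"])
    return packages


# Sample data mirroring the values file (assumed already parsed from YAML).
worker_env = [
    {"name": "EXTRA_CONDA_PACKAGES", "value": "numba xarray -c conda-forge"},
    # scikit-learn is the PyPI name of the package imported as sklearn.
    {"name": "EXTRA_PIP_PACKAGES", "value": "scikit-learn matplotlib s3fs dask-ml --upgrade"},
]
jupyter_env = [
    {"name": "EXTRA_CONDA_PACKAGES", "value": "numba xarray matplotlib -c conda-forge"},
    {"name": "EXTRA_PIP_PACKAGES", "value": "dask_kubernetes s3fs dask-ml --upgrade"},
]

worker = component_packages(worker_env)
jupyter = component_packages(jupyter_env)
print("worker only: ", sorted(worker - jupyter))   # ['scikit-learn']
print("jupyter only:", sorted(jupyter - worker))   # ['dask-kubernetes']
```

With these values the check reports one package on each side that the other lacks. Whether such a difference is intentional (a client-side tool on the notebook server, say) or an oversight is a judgment call the script cannot make.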

Install or upgrade the Dask Helm release using the custom values file defined above.

helm install k3sdask dask/dask -f extra-config.yaml
helm upgrade k3sdask dask/dask -f extra-config.yaml
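
The two commands above cover the first deployment and later changes respectively. When scripting the deployment, Helm's `upgrade --install` form handles both cases with one command. The sketch below only builds and prints that command line. The function name `helm_deploy_command` is hypothetical, and the release, chart, and file names are taken from the commands above.

```python
import shlex


def helm_deploy_command(release: str, chart: str, values_file: str) -> list[str]:
    """Build a Helm command that installs the release if absent, else upgrades it."""
    return ["helm", "upgrade", "--install", release, chart, "-f", values_file]


cmd = helm_deploy_command("k3sdask", "dask/dask", "extra-config.yaml")
# Print the command in copy-pasteable shell form.
print(shlex.join(cmd))  # helm upgrade --install k3sdask dask/dask -f extra-config.yaml
```

To actually run it from Python, the list can be passed to `subprocess.run(cmd, check=True)`, assuming `helm` is on the `PATH` and the `dask` chart repository has already been added.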

Once Dask is deployed, you can connect to it from a Jupyter notebook and dynamically scale workers using the dask_kubernetes library.

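```python
from dask_kubernetes import KubeCluster

# Create a cluster whose worker pods follow the spec in pod.yaml
cluster = KubeCluster.from_yaml('pod.yaml')

# Scale the cluster to a single worker
cluster.scale(1)
```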

This post is licensed under CC BY 4.0 by the author.