Python has a great ecosystem for machine learning, especially on relatively small datasets processed on a single machine. We'll use Dask to scale libraries like NumPy, pandas, and scikit-learn to larger datasets and larger problems.
We'll see that problems can be compute-bound, memory-bound, or both, and we'll look at strategies for each case that use a cluster to parallelize the computation.
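As a rough illustration of the memory-bound case, here is a minimal sketch that assumes `dask` and `numpy` are installed. The array shape and chunk sizes are made-up values chosen so the example runs quickly; a real memory-bound dataset would be far larger. The array is split into blocks, so only a few blocks need to be in memory at any moment.

```python
import dask.array as da

# Hypothetical sizes: 10,000 rows x 100 columns, split into 10 row-blocks.
# Nothing is allocated yet; Dask only records how to build each block.
x = da.ones((10_000, 100), chunks=(1_000, 100))

# Operations are lazy: this builds a task graph rather than computing.
col_means = x.mean(axis=0)

# compute() runs the graph. By default it uses a local thread pool;
# the same code can run on a cluster if a distributed client is active.
result = col_means.compute()

print(x.numblocks)   # number of blocks along each axis
print(result.shape)  # one mean per column
```

The same pattern (build lazily, then call `compute()`) carries over to the compute-bound case, where the gain comes from running independent tasks in parallel rather than from limiting how much data is in memory.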
For more, see the Dask documentation (https://docs.dask.org) and the Dask examples (https://examples.dask.org).