The PyData ecosystem is growing rapidly, with existing tools maturing and new tools appearing on a regular basis. This talk will examine the crowded PyData ecosystem and bring some clarity to which Python data tool is the right one to reach for in any given analysis. It will focus on use cases for pure Python, toolz, NumPy, Pandas, Blaze, xray, bcolz, Castra, Dask, and Spark.
The PyData ecosystem can be a bit confusing for those new to Python, and even for experienced programmers moving to Python for its excellent data analysis capabilities.
This talk will take a detailed look, with examples, at some of the cool tools out there.
It will touch on pure Python, toolz, NumPy, Pandas, Blaze, xray, bcolz, Dask, and Spark, with a focus on the use cases for each one.
What do you do when your data doesn't fit in memory? When do you need a functional programming approach, and when do you need compression? Where does Dask fit into all of this? When do you need Spark?
The talk will also discuss the differences in how data is stored and where you would use each tool. Finally, Peadar will provide a map of the landscape, inspired by the famous machine learning flow chart from Andreas Mueller.