In my talk I'm going to present some case studies of simple algorithms that I had to implement in Python in the course of my data analysis work. I'll show my initial naive implementations and how I gradually optimized them, reducing both the amount of code needed and the execution time. I'll share some of the things I learned, the habits I had to break, and the assumptions I needed to discard, in the hope that these will be instructive and beneficial to others just starting out with Python and data analysis. Along the way I'll look at some of the functionality provided by the NumPy, SciPy, and pandas packages and how I use these to simplify and clarify my day-to-day work.
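As a flavor of the kind of before-and-after the talk describes, here is a minimal sketch of a naive pure-Python loop next to a NumPy-vectorized equivalent. The moving-average task, function names, and window size are illustrative assumptions, not the talk's actual examples.

```python
import numpy as np

def moving_average_naive(values, window):
    """Pure-Python loop: straightforward, but slow for large inputs."""
    out = []
    for i in range(len(values) - window + 1):
        out.append(sum(values[i:i + window]) / window)
    return out

def moving_average_numpy(values, window):
    """Vectorized with np.convolve: same result, less code, much faster."""
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")

data = [1.0, 2.0, 3.0, 4.0, 5.0]
print(moving_average_naive(data, 3))   # [2.0, 3.0, 4.0]
print(moving_average_numpy(data, 3))   # [2. 3. 4.]
```

The vectorized version pushes the inner loop into compiled NumPy code, which is the general pattern behind most of these optimizations.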
Machine learning should be everywhere. Applications today have the opportunity to leverage all the data being collected about users' interactions and behavior. Unfortunately, machine learning at scale is mostly absent from production systems. Training models with scikit-learn is useful, but taking that code to production is difficult. Why is it so painful to deploy models in a scalable way? What are the options, and what challenges exist today? After exploring the current options, I will present Dato Predictive Services, which we developed to address these challenges. Dato Predictive Services lets you deploy and manage scikit-learn models on an elastic, scalable, fault-tolerant, low-latency cluster of machines, on AWS and YARN. With Dato Predictive Services, you can take arbitrary Python code and deploy it as a REST service in a single command.
This will be a hands-on talk, walking through code and with multiple demonstrations. Bring your laptop to follow along!
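To ground the deployment discussion, here is a minimal sketch of the kind of artifact being deployed: a scikit-learn model trained and serialized, with a plain prediction function of the sort a predictive service would expose as a REST endpoint. The dataset and model choice are illustrative assumptions, not the talk's actual demo, and this does not use the Dato Predictive Services API.

```python
import pickle

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Train a small illustrative model (not the talk's actual example).
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200).fit(X, y)

# Serialize the trained model -- the artifact a predictive service hosts.
blob = pickle.dumps(model)

def predict(features):
    """A function like this is what gets wrapped as a REST endpoint."""
    restored = pickle.loads(blob)
    return restored.predict([features]).tolist()

print(predict([5.1, 3.5, 1.4, 0.2]))  # class index for one iris sample
```

The hard parts the talk addresses start after this point: serving such a function elastically, fault-tolerantly, and with low latency across a cluster.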