We all learned to program in a particular way: perhaps you started out using Basic, Pascal, C or Fortran; if you're younger, maybe Java was your first language, or maybe you came to programming through the web using JavaScript, PHP, Perl or Ruby. Python allows you to work with multiple programming paradigms (procedural, object-oriented, functional). When you first started using Python you were able to find features that looked familiar (loops, conditionals, classes, ...) and that worked how you expected them to. This is great for adoption, but there is a hidden cost: it doesn't force you to change anything. You can go on writing code as you always have and it will work. But there could be a better way of doing things, one that is more efficient and easier to understand and explain. This is especially true in the realm of data processing.
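To make that point concrete, here is a minimal sketch of my own (not an example taken from the talk): the same computation written first in the loop-and-accumulator style many of us carried over from other languages, then in Python's more idiomatic style.

```python
# Loop-and-accumulator style, carried over from languages like C or Pascal.
def sum_of_squares_loop(values):
    total = 0
    for v in values:
        total += v * v
    return total

# The same computation in idiomatic Python: a generator expression
# passed to the built-in sum(), shorter and arguably easier to explain.
def sum_of_squares_idiomatic(values):
    return sum(v * v for v in values)

assert sum_of_squares_loop([1, 2, 3]) == sum_of_squares_idiomatic([1, 2, 3]) == 14
```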
In my talk I'm going to present some case studies of simple algorithms that I had to implement in Python in the course of my data analysis work. I'll show my initial naive implementations and how I gradually optimized them, reducing both the amount of code necessary and the execution time. I'll share some of the things I learned, the habits I had to break and the assumptions I needed to discard, in the hope that these will be instructive and beneficial to others just starting out with Python and data analysis. I'll also look at some of the functionality provided by the NumPy, SciPy and pandas packages and how I use them to simplify and clarify my day-to-day work.
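The talk's actual case studies aren't reproduced in this abstract, but the general pattern looks something like the following sketch (the data and function names are illustrative): an explicit Python loop replaced by a NumPy vectorized operation that is both shorter and much faster.

```python
import numpy as np

# Naive implementation: pure-Python loop, one element at a time.
def demean_loop(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]

# Vectorized implementation: NumPy performs the subtraction over the
# whole array at once in compiled code, avoiding per-element overhead.
def demean_numpy(values):
    return values - values.mean()

data = np.random.rand(1_000_000)
result = demean_numpy(data)  # far faster than demean_loop(list(data))
```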
Machine learning should be everywhere. Applications today have the opportunity to leverage all the data being collected about users' interactions and behavior. Unfortunately, machine learning at scale is mostly absent from production systems. Training models with scikit-learn is useful, but it is difficult to take that code to production. Why is it so painful to deploy models in a scalable way? What are the options, and what challenges exist today? After exploring the current options, I will present Dato Predictive Services, which we developed to address these challenges. Dato Predictive Services enables deploying and managing scikit-learn models on an elastic, scalable, fault-tolerant, low-latency cluster of machines, on AWS and YARN. With Dato Predictive Services, in one command you can take arbitrary Python and deploy it as a REST service.
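To ground what actually gets deployed, here is a minimal sketch of the local-training side (the dataset, model, and classify function are illustrative assumptions; the Dato Predictive Services deployment commands themselves are covered in the talk and not shown here):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Train a simple scikit-learn model locally.
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200).fit(X, y)

# The "arbitrary Python" a predictive service would wrap as a REST
# endpoint: a plain function mapping a feature vector to a class label.
def classify(features):
    return int(model.predict([features])[0])

print(classify([5.1, 3.5, 1.4, 0.2]))  # -> 0 (iris setosa)
```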
This will be a hands-on talk, walking through code with multiple demonstrations. Bring your laptop to follow along!