Sunday 10:15–11:00 in Audimax

Industrial ML - Overview of the technologies available to build scalable machine learning

Alejandro Saucedo

Audience level:
Novice

Description

This talk offers a practical understanding of how to build industry-ready machine learning pipelines in Python using distributed, horizontally scalable architectures. It also covers the motivations for such architectures, the technologies required, best practices, caveats and practical industry use cases.

Abstract

Industrial ML - Overview

This talk offers a practical understanding of how to build industry-ready machine learning models in Python using distributed, horizontally scalable architectures. It also covers the motivations for such architectures, the technologies required, best practices, caveats and practical industry use cases. We will walk through a practical implementation of a distributed machine learning pipeline that computes predictions for the most popular cryptocurrencies, using Celery (with RabbitMQ) for distributed processing and Docker plus Kubernetes to manage the scalable infrastructure on AWS.

Why

Industry-ready machine learning systems have to be bullet-proof. Some of the biggest challenges in machine learning involve heavy RAM usage, a varied set of model libraries, heavy computation, security, DevOps complexity, scalability, deployment and many more. It is important to understand the key challenges that most large-scale projects, startups and companies run into when developing and expanding their machine learning capabilities, along with the best practices, reliable frameworks and tips and tricks that address them.

How

There are multiple ways to address the challenges that a fast-growing project, startup or company will face along the way. Luckily, several open source technologies help with these. Python has a massive ecosystem of machine learning libraries that let us benefit from the brilliant minds contributing to top-performing algorithms: SciPy, scikit-learn, TensorFlow, NumPy and pandas are but a few key tools in your machine learning toolbox. For distributed computing, Celery is a strong contender: it makes it easy to build a manager-worker architecture with RabbitMQ as the message broker. Docker lets us containerise our applications so that they are deployed in a consistent environment. Finally, Kubernetes lets us manage our DevOps infrastructure, with much of the hard work handled automatically.
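
To make the manager-worker idea concrete, here is a minimal sketch that uses only the Python standard library. It is not the implementation from the talk: an in-process queue stands in for the RabbitMQ broker, threads stand in for Celery workers, and the names predict_next_price, worker and run_manager, as well as the price data, are hypothetical and chosen purely for illustration.

```python
import queue
import threading


def predict_next_price(prices):
    """Hypothetical stand-in for a model: mean of the last three prices."""
    window = prices[-3:]
    return sum(window) / len(window)


def worker(tasks, results):
    """Pull tasks off the shared queue until a None sentinel arrives."""
    while True:
        task = tasks.get()
        if task is None:
            break
        symbol, prices = task
        results.put((symbol, predict_next_price(prices)))


def run_manager(price_history, num_workers=2):
    """Enqueue one task per symbol and collect the workers' results."""
    tasks = queue.Queue()      # plays the role of the message broker
    results = queue.Queue()    # plays the role of a result backend
    threads = [
        threading.Thread(target=worker, args=(tasks, results))
        for _ in range(num_workers)
    ]
    for thread in threads:
        thread.start()
    for symbol, prices in price_history.items():
        tasks.put((symbol, prices))
    for _ in threads:
        tasks.put(None)        # one shutdown sentinel per worker
    for thread in threads:
        thread.join()
    predictions = {}
    while not results.empty():
        symbol, value = results.get()
        predictions[symbol] = value
    return predictions


if __name__ == "__main__":
    # Made-up price series, for illustration only.
    history = {"BTC": [100.0, 110.0, 120.0], "ETH": [10.0, 20.0, 30.0, 40.0]}
    print(run_manager(history))
```

In a Celery-based setup the worker function would typically become a registered task and the in-process queue would be replaced by the broker, so that workers can run on separate machines and be scaled horizontally.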
