Thursday October 28 3:30 PM – Thursday October 28 4:00 PM in Talks II

Enterprise Machine Learning Pipelines with Unstructured Image Data

Jacqueline Nolis, Chase Ginther

Prior knowledge: Previous knowledge expected (machine learning with Python)


Learn to build a machine learning data pipeline that trains a convolutional neural network to classify images. This talk will describe new technologies to aid machine learning practitioners in managing and using unstructured data and discuss applications to real-world business scenarios.


While many machine learning tasks work well with tabular data, data scientists in many industries and sectors need to model other types of data, such as images and videos, to achieve business goals. For example, retail businesses frequently need to serve appropriate, relevant visuals to customers in the online purchasing funnel. Using machine learning to support this can reduce human toil and allow a business to scale at a click. At the same time, poorly managed data can cause compliance, reproducibility, and security problems.

In this talk, attendees will learn how to effectively manage and use unstructured data for machine learning in enterprise settings. The data pipeline will use new Snowflake storage technology connected with the Saturn Cloud platform to train a convolutional neural network to classify images. The machine learning demonstrations will use GPUs and Dask clusters to highlight performance possibilities while also relating to real-world business use cases.

Attendees will leave with practical skills for applying image classification to their business, including using unstructured data storage and cloud-based machine learning tools. This session is recommended for attendees with some experience using Python for machine learning, but no experience with the other tools is necessary to get value from the talk.