Sunday 15:45–16:30 in Intermediate

Data science for lazy people... genetics will work for you!

Diego Hueltes

Audience level:
Intermediate

Description

Data science is fun... right? Data cleaning, feature selection, feature preprocessing, feature construction, model selection, parameter optimization, model validation... oh wait... are you sure? What about automating 80% of the work using genetic algorithms that can make better choices than you? TPOT is a tool that automatically creates and optimizes machine learning pipelines.

Abstract

We love data science, but we often have to get through manual, repetitive work before reaching the interesting and fun parts. That is about to change. TPOT is an open source tool built on top of scikit-learn for creating and optimizing machine learning pipelines; think of it as a data science assistant. The library automates everything from feature selection to parameter optimization, and it can also preprocess data and construct new features from existing ones. TPOT uses genetic algorithms to evaluate a large number of candidate pipelines and return the best one it finds. It is easy to use, has a familiar syntax if you have used pandas or scikit-learn, and is very powerful. Let genetics work for you!
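
To give a feel for what "letting genetics work" means, here is a minimal sketch of a genetic search, using only the Python standard library. It is not TPOT's implementation: TPOT evolves whole scikit-learn pipelines, while this toy only evolves a feature mask (which columns to keep) for a hand-written nearest-neighbour classifier. The data is synthetic, and every name in it (make_data, fitness, evolve, and their parameters) is hypothetical and chosen for illustration.

```python
import random


def make_data(rng, n_samples=60, n_informative=2, n_noise=6):
    """Synthetic two-class data: only the first n_informative columns carry signal."""
    rows, labels = [], []
    for i in range(n_samples):
        label = i % 2
        informative = [rng.gauss(2.0 * label, 1.0) for _ in range(n_informative)]
        noise = [rng.gauss(0.0, 3.0) for _ in range(n_noise)]
        rows.append(informative + noise)
        labels.append(label)
    return rows, labels


def fitness(mask, rows, labels):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on the kept columns."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # a mask that drops every feature cannot classify anything
    correct = 0
    for i, row in enumerate(rows):
        best_dist, best_label = None, None
        for k, other in enumerate(rows):
            if k == i:
                continue
            dist = sum((row[j] - other[j]) ** 2 for j in cols)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, labels[k]
        correct += best_label == labels[i]
    return correct / len(rows)


def evolve(rows, labels, rng, population_size=12, generations=8, mutation_rate=0.1):
    """Evolve feature masks with selection, one-point crossover, and bit-flip mutation."""
    n_features = len(rows[0])
    # Start from the "keep every feature" baseline plus random masks.
    population = [[1] * n_features] + [
        [rng.randint(0, 1) for _ in range(n_features)]
        for _ in range(population_size - 1)
    ]
    for _ in range(generations):
        scored = sorted(population, key=lambda m: fitness(m, rows, labels), reverse=True)
        parents = scored[: population_size // 2]  # selection: the top half survives
        children = [scored[0]]  # elitism: the best mask is carried over unchanged
        while len(children) < population_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)
            child = a[:cut] + b[cut:]  # one-point crossover
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
    return max(population, key=lambda m: fitness(m, rows, labels))


rng = random.Random(0)
rows, labels = make_data(rng)
best_mask = evolve(rows, labels, rng)
baseline = fitness([1] * len(rows[0]), rows, labels)
best = fitness(best_mask, rows, labels)
print("accuracy with all features:", baseline)
print("best mask found:", best_mask, "accuracy:", best)
```

Because the all-features mask is in the starting population and the best mask always survives, the result can never score worse than the baseline on this fitness measure. With TPOT itself you do not write any of this: the search runs behind a scikit-learn style estimator such as TPOTClassifier, which you fit like any other model. Check the TPOT documentation for the parameters available in your installed version.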
