PyData New York City 2019 - Presentation: Hacking the Data Science Challenge

You’ve passed the first round of interviews and have now been given a data science take-home challenge. How can you analyze the data, demonstrate your data science abilities, and tick all the required boxes within the allotted time? This tutorial will walk you through a data science challenge, using pre-built functions to automate the boring stuff.

Building a good data science model at work is not the same as completing a good data science challenge for an interview. This tutorial will cover what you need to know to make your challenge stand out from the crowd, and will highlight common mistakes and pitfalls to avoid. We will emphasize best practices over specific machine learning techniques. This tutorial assumes you have a working knowledge of Pandas and at least one plotting library.

During the tutorial, you will be split into groups to work through different parts of the data science take-home challenge. The code for the tutorial can be found at https://github.com/MichoelSnow/pydata_nyc_2019

This tutorial will focus on the following topics:

Working with trick data
- Investigating a data set for natural and synthetic errors
- How to handle errors in the data
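
One way to start the error investigation is a small audit that counts problems before changing anything, followed by a cleaning step that makes bad values visible rather than deleting them silently. This is a minimal sketch, assuming pandas and a data set in which some columns (here the hypothetical `age` and `income`) should never be negative; `audit_errors` and `handle_errors` are illustrative helpers, not functions from the tutorial repository.

```python
import pandas as pd


def audit_errors(df: pd.DataFrame, non_negative: list[str]) -> dict:
    """Count common data problems without modifying the data."""
    return {
        # Empty cells (NaN/None) per column
        "missing": df.isna().sum().to_dict(),
        # Rows that exactly repeat an earlier row
        "duplicate_rows": int(df.duplicated().sum()),
        # Values below zero in columns that should never be negative
        "negative": {col: int((df[col] < 0).sum()) for col in non_negative},
    }


def handle_errors(df: pd.DataFrame, non_negative: list[str]) -> pd.DataFrame:
    """Return a cleaned copy: duplicates dropped, impossible values set to NaN."""
    clean = df.drop_duplicates().copy()
    for col in non_negative:
        # Mask (not drop) impossible values so the decision stays visible downstream
        clean[col] = clean[col].mask(clean[col] < 0)
    return clean
```

Turning impossible values into NaN, rather than dropping whole rows, keeps each cleaning decision explicit, which makes it easy to describe in the write-up.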

Outlining your Challenge
- What time requirements actually mean
- How much time you should spend on a challenge
- How to infer what you will be graded on
- How to properly allocate your time

Exploratory Data Analysis (EDA)
- What makes a good plot
- Choosing what to plot and how to plot it
- Knowing when you are done with EDA
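
Much of the first EDA pass can be automated. This sketch assumes pandas and builds a one-row-per-column overview (type, share missing, number of distinct values, most frequent values) with a rough plot suggestion attached. `summarize_columns` is a hypothetical helper, and the rule "numeric with many distinct values gets a histogram, everything else a bar chart" is a simplifying assumption, not guidance from the tutorial.

```python
import pandas as pd


def summarize_columns(df: pd.DataFrame, max_levels: int = 5) -> pd.DataFrame:
    """Return one row per column describing its type, gaps, and a likely plot."""
    rows = []
    for col in df.columns:
        series = df[col]
        n_unique = series.nunique(dropna=True)
        is_numeric = pd.api.types.is_numeric_dtype(series)
        rows.append({
            "column": col,
            "dtype": str(series.dtype),
            # Percentage of missing cells, rounded for readability
            "missing_pct": round(100 * series.isna().mean(), 1),
            "n_unique": n_unique,
            # Most frequent non-missing values (tied counts may appear in any order)
            "top_values": series.value_counts().head(max_levels).index.tolist(),
            # Simplifying assumption: many distinct numbers -> histogram
            "suggested_plot": (
                "histogram" if is_numeric and n_unique > max_levels else "bar chart"
            ),
        })
    return pd.DataFrame(rows).set_index("column")
```

Printing `summarize_columns(df)` gives a quick checklist: once every column has been plotted or deliberately set aside, that is one reasonable signal that the EDA is complete.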

Modeling data
- How to choose the appropriate model for the company
- How to explain model findings

If time permits, we will also discuss the following general best practices:

- How to make sure your code runs for your interviewer
- How to document and comment appropriately
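
Part of making sure your code runs for the interviewer is failing early with a clear message when their environment differs from yours, and fixing any random seed so reruns match. This sketch uses only the standard library; `check_environment`, `RANDOM_SEED`, and the value 42 are illustrative choices, and the docstring shows one way to document what a helper returns.

```python
import importlib.util
import random
import sys

# Hypothetical fixed seed so reruns (yours or the interviewer's) give the same results
RANDOM_SEED = 42
random.seed(RANDOM_SEED)


def check_environment(required_packages, min_python=(3, 8)):
    """Return a list of problems that would stop this code running elsewhere.

    An empty list means the interpreter is new enough and every required
    top-level package can be found.
    """
    problems = []
    if sys.version_info < min_python:
        version = ".".join(str(part) for part in min_python)
        problems.append(f"Python {version}+ required")
    for name in required_packages:
        # find_spec returns None when a top-level package is not installed
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing package: {name}")
    return problems
```

Calling `check_environment(["pandas", "matplotlib"])` at the top of the notebook and stopping with the returned messages if the list is non-empty gives the reviewer an immediate, readable error instead of a traceback halfway through.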

Wednesday 3:00 PM–4:30 PM in Winter Garden (5412)

Michoel Snow, Hillary Green-Lerman