Tuesday 1:10 p.m.–1:45 p.m.

Ryd.io - An exploration of K-Means clustering of NYC Taxis in Manhattan

Gregory Kamradt

Audience level:
Intermediate

Description

Every block in New York City has a story to tell.

In March 2014, Chris Whong submitted a FOIA (Freedom of Information Act) request to the NYC Taxi and Limousine Commission. What he got back was a record of more than 173,000,000 taxi rides taken in 2013.

Ryd.io aims to quantify this story.

Simply, what is happening in a city at a certain location and point in time?

Abstract

Every block in New York City has a story to tell.

In March 2014, Chris Whong submitted a FOIA (Freedom of Information Act) request to the NYC Taxi and Limousine Commission. What he got back was a record of more than 173,000,000 taxi rides taken in 2013.

Ryd.io aims to quantify this story.

Ryd.io started as a means to describe social activity by measuring a population's movement throughout the day. Simply, what is happening in a city at a certain location and point in time?

Using historical taxi data, we can predict how many rides will be dropped off at any given location. Ryd.io uses SARIMAX time-series analysis and clustering algorithms to present an alternative view of Manhattan.

Ryd.io

Gregory Kamradt, November 2015

Overview

View the app live at Ryd.io

The idea of this project is to let a user, such as a marketing lead, explore and learn about a city through the frequency and volume of taxi drop-off points in order to focus marketing efforts. My goal was to provide an alternative view of a city, clustered into subgroups with distinct weekly and daily ride distributions.
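One way to read "ride distributions" is as normalized profiles: each location's drop-off counts divided by that location's total, so that a busy block and a quiet block with the same rhythm look alike to a clustering algorithm. The sketch below is a minimal, dependency-free illustration under that assumption. The function name `to_distribution` and the sample counts are hypothetical and are not taken from the Ryd.io code.

```python
def to_distribution(counts):
    """Scale a list of ride counts so the values sum to 1.

    A location with no rides gets an all-zero profile rather than a
    division-by-zero error.
    """
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


# Hypothetical drop-off counts for four time-of-day buckets
# (morning, midday, evening, late night) at two locations.
busy_block = [400, 200, 300, 100]
quiet_block = [40, 20, 30, 10]

# After normalization both locations share the same shape, even though
# their raw volumes differ by a factor of ten.
busy_profile = to_distribution(busy_block)
quiet_profile = to_distribution(quiet_block)
```

Normalizing by the location total is only one possible choice. Other schemes, such as z-scoring each time bucket, would emphasize different aspects of the data.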

Example

The model is built to predict where, when, and how many rides will be dropped off, given any location in NYC. Every single block in NYC has a story to tell. Finding a resource that can tell this story in a meaningful way is difficult. A marketing team new to NYC will have little idea how to navigate this concrete jungle.

Presentation Overview

In this presentation we will go over the steps needed to complete the Ryd.io analysis from start to finish. Techniques will range from data wrangling to clustering and time-series analysis.

Steps to include:

  1. Data retrieval - Google BigQuery
  2. Data crunching - Normalization techniques
  3. Clustering - K-Means, and "But how did you pick your K?" (Gap statistic, Silhouette statistic)
  4. Visualization of Clusters - Matplotlib for MVP, move to CartoDB for final product
  5. Analysis of Clusters - How to define the statistical personalities of each cluster
  6. Predicting Rides - How to use the information that we have to predict for a given spot
  7. Final Product - Demo WebApp
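
Step 3 can be illustrated with a small, dependency-free sketch of K-Means plus the silhouette statistic for choosing K. Everything here is an assumption made for illustration: the synthetic 2-D points stand in for normalized ride profiles, the helper names are hypothetical, and the deterministic farthest-first initialization is swapped in for random initialization to keep the example reproducible. It is not the Ryd.io implementation, and the gap statistic is not shown. In practice, scikit-learn's `KMeans` and `silhouette_score` would be the usual tools.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def farthest_first(points, k):
    """Pick k starting centers: the first point, then repeatedly the
    point farthest from all centers chosen so far."""
    centers = [list(points[0])]
    while len(centers) < k:
        centers.append(list(max(points, key=lambda p: min(dist(p, c) for c in centers))))
    return centers


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: dist(p, centers[j])) for p in points]


def kmeans(points, k, iters=100):
    """Lloyd's algorithm; returns one cluster label per point."""
    centers = farthest_first(points, k)
    for _ in range(iters):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a cluster lost all its members.
            new_centers.append(
                [sum(col) / len(members) for col in zip(*members)] if members else centers[j]
            )
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return assign(points, centers)


def silhouette(points, labels):
    """Mean silhouette coefficient; higher means tighter, better-separated clusters."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        mean_dist = {}
        for c in clusters:
            d = [dist(p, q) for j, q in enumerate(points) if labels[j] == c and j != i]
            if d:
                mean_dist[c] = sum(d) / len(d)
        own = labels[i]
        if own not in mean_dist:  # singleton cluster scores 0 by convention
            scores.append(0.0)
            continue
        a = mean_dist[own]  # cohesion: mean distance within own cluster
        b = min(v for c, v in mean_dist.items() if c != own)  # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Synthetic stand-in data: three well-separated groups of 15 points each.
rng = random.Random(42)
blob_centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
points = [
    [cx + rng.uniform(-0.5, 0.5), cy + rng.uniform(-0.5, 0.5)]
    for cx, cy in blob_centers
    for _ in range(15)
]

# Score each candidate K and keep the one with the highest mean silhouette.
scores = {k: silhouette(points, kmeans(points, k)) for k in range(2, 7)}
best_k = max(scores, key=scores.get)
```

The silhouette statistic compares how close each point is to its own cluster against the nearest other cluster, so the K that maximizes the mean score is a reasonable first answer to "how did you pick your K?". On real ride data the peak is rarely as clean as on this toy set.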
