PyData Carolinas 2016 | Presentation: Just Bring Glue - Leveraging Multiple Libraries To Quickly Build Powerful New Tools

Friday 2:00 PM–2:40 PM in Room 2

Just Bring Glue - Leveraging Multiple Libraries To Quickly Build Powerful New Tools

Rob Agle

Audience level: Novice

Description

It has never been easier for developers to create simple-yet-powerful data-driven or data-informed tools. Through case studies, we'll explore a few projects that use a number of open source libraries or modules in concert. Next, we'll cover strategies for learning these new tools. Finally, we'll wrap up with pitfalls to keep in mind when gluing powerful things together quickly.

Abstract

Prerequisites

A basic understanding of programming in Python
Optional, but helpful if you'd like to follow along with the workflow example: IPython (shell) and pudb installed

Part 0 - Introduction and Overview

We'll open with an introduction and a quick overview of each section of the talk.

Part 1 - Case Studies: Two brief case studies covering tools or projects that were created by composing several open source Python libraries
Part 2 - Strategies for Becoming Familiar with New Tools: Covering the use of pip, the IPython shell and a few popular debuggers
Part 3 - Potential Pitfalls: Common conceptual issues that arise when creating powerful tools quickly with "glue code"
Part 4 - Closing, Q&A: Giving a list of resources and links, and hopefully learning about some new tools from the audience as well

Part 1 - Case Studies

For the sake of time we'll move fairly quickly through each case, focusing on how libraries used in each project come together at a high level with a few flow charts, diagrams and key "glue code" listings.

Case study 1 - An Intelligent Unit Test Discovery Tool

The first case study will cover a tool the author built that takes a natural language search term and suggests unit tests to run from a large suite of test cases.
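The abstract does not say how the author's tool matches search terms to tests, so the following is only a minimal sketch of the general idea, assuming a plain word-overlap score in place of whatever the real tool uses. The test names and the `suggest_tests` helper are hypothetical and invented for illustration.

```python
import re


def tokenize(text):
    """Split snake_case, CamelCase and free text into lowercase word tokens."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return set(re.findall(r"[a-z0-9]+", spaced.lower()))


def suggest_tests(query, test_names, limit=3):
    """Rank test names by how many query words they share (simple overlap score)."""
    query_tokens = tokenize(query)
    scored = []
    for name in test_names:
        overlap = len(query_tokens & tokenize(name))
        if overlap:
            scored.append((overlap, name))
    # Highest overlap first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:limit]]


# Hypothetical test names, invented for illustration only.
TESTS = [
    "test_login_rejects_bad_password",
    "test_login_accepts_valid_user",
    "test_cart_total_includes_tax",
    "TestPasswordReset.test_email_sent",
]

suggestions = suggest_tests("password login failures", TESTS)
print(suggestions)
```

A real tool would likely need something richer than word overlap (stemming, synonyms, or text from docstrings), but even this small amount of glue shows how a free-text query can be mapped onto a test suite.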

Modules / Libraries Covered

Case study 2 - Mining, Processing and Visualizing Twitter Data

Next, we'll review the tools used (and how they came together) in a popular 7-part tutorial on mining, processing and visualizing Twitter data.

Modules / Libraries Covered

Part 2 - Strategies For Becoming Familiar With a New Python Tool

Following the case studies, we'll segue into a brief overview of strategies the author uses to explore new tools and how they might fit together.

Topics will include the use of pip and how to get the most out of a powerful debugger like pudb, which allows interaction with an IPython shell while exploring a library's API.
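As a rough illustration of this kind of exploration, the sketch below lists a module's public functions and classes with their signatures using the standard library's `inspect` module. It is not the workflow from the talk: the `public_callables` helper is hypothetical, and `json` merely stands in for a freshly pip-installed library.

```python
import inspect
import json  # stand-in for a newly installed library


def public_callables(module):
    """Return {name: signature string} for the module's public functions and classes."""
    summary = {}
    for name, obj in inspect.getmembers(module):
        if name.startswith("_"):
            continue
        if inspect.isfunction(obj) or inspect.isclass(obj):
            try:
                summary[name] = str(inspect.signature(obj))
            except (TypeError, ValueError):
                # Some callables do not expose an introspectable signature.
                summary[name] = "(signature unavailable)"
    return summary


api = public_callables(json)
for name, sig in sorted(api.items()):
    print(f"{name}{sig}")

# To inspect live objects interactively instead, a breakpoint could go here
# (requires pudb to be installed):
# import pudb; pudb.set_trace()
```

In an IPython shell, typing `json.dumps?` gives a similar view for a single object, along with its docstring.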

Part 3 - Potential Pitfalls

We'll round out the talk with some points on the potential pitfalls of quickly gluing things together.

Beyond the technical concerns, having such powerful tools so readily at hand makes it easy to imagine we understand a domain or problem better than we actually do.

What is a good strategy for discovering unknown unknowns and turning them into known unknowns when tackling a new problem? How can we close those remaining gaps and ensure we're using a tool correctly?

Lastly, how do we decide when to stop tumbling down the rabbit hole? Which are the gaps that don't necessarily need to be closed?

Part 4 - Closing and Q&A

A comprehensive list of categorized links will be given. The talk will close with a Q&A where (hopefully) audience members will contribute a few of their favorite tools not covered by the talk.

Both author-provided and audience-provided links will be compiled into a resource shared via a blog post, similar to the author's post on resources for machine learning self-study, which was compiled from a talk given at a BarCamp in Chiang Mai, Thailand.