John Tukey’s stem-and-leaf plot first appeared in 1970. Although very useful back then, it cannot handle more than 300 data points and is completely text-based. Stemgraphic is an easy-to-use Python package that addresses these limitations. In this talk you will learn to use stemgraphic, see it in action on very large datasets, and understand what drove its design decisions.
The stem-and-leaf plot is one of the most powerful tools not found in a data scientist’s or statistician’s toolbox. If we go back thirty-some years, we find the exact opposite. What happened to the stem-and-leaf plot? Finding the answer led me to design and implement an improved graphical version of the stem-and-leaf plot as a Python package. As a companion to the talk, a printed research paper will be provided to the audience.
What were the design challenges? What about other options for visualizing distributions? These questions will be answered in this part of the talk.
From the standalone command-line tool to the Python package, there are many ways to take advantage of stemgraphic. This section of the talk will be all about live demos, at the command line and in a Jupyter notebook.
In this section we will talk about specific design decisions (such as fonts, colors, default options, number of data points, scaling) and about a few implementation details.
Probably the most important question about the usability of a tool designed for exploratory data analysis, data mining, and similar work is: how fast is it? This section will include some more demonstrations.
Stem_graphic is an open-source Python module that implements a fast, easy-to-use, and highly configurable graphical stem-and-leaf plot. The best feature? Without any data wrangling or having to think about options, it provides a visually appealing, usable plot. But there is always room for improvement. We'll wrap up the talk by mentioning some of the enhancements on the horizon.
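For readers who have not met the classic text-based display that the talk starts from, here is a minimal pure-Python sketch of it. It is not stemgraphic's code or API: the function name `text_stem_and_leaf` and the `leaf_unit` parameter are hypothetical, chosen only for illustration, and the sketch assumes non-negative values.

```python
from collections import defaultdict


def text_stem_and_leaf(values, leaf_unit=1):
    """Return the rows of a classic text stem-and-leaf display.

    Hypothetical helper for illustration; assumes non-negative values.
    Each value is truncated to a whole number of leaf_unit, the last
    digit becomes the leaf and the remaining digits become the stem.
    """
    rows = defaultdict(list)
    for value in sorted(values):
        scaled = int(value // leaf_unit)   # drop anything finer than the leaf unit
        stem, leaf = divmod(scaled, 10)    # split off the last digit as the leaf
        rows[stem].append(str(leaf))
    if not rows:
        return []
    # Stems with no data are still listed, so gaps in the distribution show up.
    low, high = min(rows), max(rows)
    return [
        f"{stem:>3} | {''.join(rows.get(stem, []))}"
        for stem in range(low, high + 1)
    ]


if __name__ == "__main__":
    for row in text_stem_and_leaf([12, 15, 21, 27, 27, 30, 48]):
        print(row)
    # Prints:
    #   1 | 25
    #   2 | 177
    #   3 | 0
    #   4 | 8
```

Every data point contributes one character to the display, which is why a plain text version becomes unwieldy as the number of observations grows.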