PyData 2013 | Santa Clara, CA

Speaker Bios


Angelica Pando

Measuring the digital economy using big data

Engineer, Optimization and Analytics, AppNexus

Anthony Scopatz

Anthony Scopatz is a computational nuclear engineer / physicist post-doctoral scholar at the FLASH Center at the University of Chicago. His initial workshop teaching experience came from instructing bootcamps for The Hacker Within, a peer-led teaching organization at the University of Wisconsin. Out of this grew a collaboration teaching Software Carpentry bootcamps in partnership with Greg Wilson. During his tenure at Enthought, Inc., Anthony taught many week-long courses (approx. 1 per month) on scientific computing in Python.

Bryan Van De Ven

Continuum Analytics

Beautiful Interactive Visualizations in the Browser with Bokeh, Bokeh, Bokeh - Interactive Visualization for Large Datasets, Bokeh Tutorial, Interactive Plots Using Bokeh, The IPython protocol, frontends and kernels

Mr. Van de Ven received undergraduate degrees in Computer Science and Mathematics from UT Austin, and a Master's degree in physics from UCLA. He has worked at the Applied Research Labs, developing software for sonar feature detection and classification systems on US Naval submarine platforms. He also spent time at Enthought, where he worked on problems in financial risk modeling and fluid mixing simulation, and also contributed to the Chaco visualization library. He has also worked on an assortment of iOS projects as an independent consultant.

Chang She

DataPad

Up and Down the Python Data and Web Visualization Stack

Chang is a former quant researcher-trader turned developer of data science platforms and tools. Currently a co-founder at Lambda Foundry providing data science solutions with a financial bent, he is also a core developer of the open source pandas library for data analysis. His previous employers include AQR Capital Management and Barclays Capital. Chang graduated from MIT with degrees in computer science and political science.

Charles Doutriaux

Lawrence Livermore National Laboratory

Dark Data: A Data Scientist's Exploration of the Unknown

Charles Doutriaux is a senior Lawrence Livermore National Laboratory research computer scientist, where he is internationally known for his work in climate analytics, informatics, and management systems supporting model intercomparison projects. He works closely with many world-renowned climate scientists and shares in the recognition of the Intergovernmental Panel on Climate Change 2007 Nobel Peace Prize. He has co-authored over 30 peer-reviewed articles and has presented his work at many scientific conferences. Aside from everything Python-related, his research interests include climate attribution and detection, visualization, and data analysis. Doutriaux has a master’s degree in “Climate and Physico-Chemistry of the Atmosphere” from the prestigious French University Joseph Fourier, Grenoble. He’s a member of the AGU and AMS. You can contact him at [email protected].

Christopher Roach

Conda

Christopher Roach has been everything from an embedded software engineer working on missile defense to a web and iOS developer at Apple. During that time he has continued to nurture his academic interests in the field of complexity, with published papers in the areas of swarm intelligence and social network analysis. He holds degrees in Finance and Economics and a Master's in Computer Science, and has had several Python-related articles published in sources such as MacTech magazine and the O'Reilly Network.

Dan Gunter

How Web APIs and Data-centric Tools Power the Materials Project

Dan Gunter leads the Data Intensive Systems group in the Computational Research Division at LBNL. His research is in distributed and parallel systems, with a focus on performance and usability issues at the intersection of large databases and fast networks. Recent work includes distributed workflow performance analysis; NoSQL databases for materials science; networked application adaptation; and user-centered design approaches. Dan has an MS in Computer Science from San Francisco State University.

Dave Himrod

Measuring the digital economy using big data

Director of Optimization and Analytics, AppNexus

As Director of Optimization and Analytics, Dave Himrod manages a team of analysts, quants, and engineers devoted to crafting world-class algorithms. When Dave joined in 2009, he managed AppNexus' first account, eBay. While building AppNexus' original optimization algorithm, Dave was heavily involved in building out the data pipeline and defining the data model still in use today. He has since grown his team to more than 20 people and focuses his time on building a world-class scalable optimization system. He and his team continue to improve the tools for optimized pricing and budgeting for the over 27 billion ad impressions their platform sees per day. Dave has a Bachelor's Degree in Computer Science from the University of Pennsylvania.

Elias Freider

Spotify

Luigi - Batch Data Processing in Python

I've been a developer in Spotify's analytics and data infrastructure team for the past two years. I've seen the team grow from two people doing basic log collection and reports to over 20 people with a 120+ node Hadoop cluster containing petabytes of data. Like many other backend services at Spotify, almost all of our data pipeline, as we call it, is written in Python. I'm one of the architects and main contributors behind the Luigi data processing framework for Python, recently open sourced by Spotify.

Fernando Perez

Research Scientist, UC Berkeley’s Helen Wills Neuroscience Institute/NumFOCUS

IPython: a modern vision of interactive computing

Fernando Pérez received his PhD in theoretical physics from the University of Colorado and did his post-doctoral work there in applied mathematics, working on fast algorithms for partial differential equations. He is currently a research scientist at UC Berkeley’s Helen Wills Neuroscience Institute, focusing on the development of new analysis methods for brain imaging problems and high-level scientific computing tools.

Towards the end of his graduate studies, he became involved with the development of Python tools for scientific computing. He started the open source IPython project in 2001 when he needed an efficient interactive workflow for everyday scientific tasks. He continues to lead IPython, as part of a growing team of talented developers.

He remains committed to the development of open, high-level tools to tackle the current challenges in computationally-based scientific research and education across disciplines. He is a member of the matplotlib development team and has contributed to numpy, scipy, sympy, mayavi, nipy and nitime. He regularly organizes workshops and lectures aimed at teaching the use of these tools to audiences at levels ranging from high-school students to research scientists. He is also a member of the Python Software Foundation.

When not glued to a computer, Fernando tries to spend as much time as possible with his wife outdoors hiking and backpacking, as well as climbing. For more information, see http://fperez.org.

Gabor Szabo

Twitter

PyCascading for Intuitive Flow Processing With Hadoop

Gabor is a Senior Data Scientist at Twitter. Before that he worked at HP Labs, Harvard Medical School, and the University of Notre Dame, doing research on social media, large-scale social and biological networks, and communication networks among mobile phone subscribers. His interests are in discovering and modeling human behavioral patterns using large datasets. He holds a PhD in Statistical Physics with a focus on random networks.

Henrik Brink

wise.io

Wise.io: A Machine-Learning Platform

Henrik is a co-founder, software architect and data scientist at wise.io, a machine-learning startup based in Berkeley, CA. After his studies in Physics and Astronomy at the University of Copenhagen, he worked as a data scientist at the UC Berkeley Astronomy department and started his own software consulting business. At wise.io he is now leading the development of a machine-learning platform that aims to make it easier for developers and data scientists to use state-of-the-art machine learning technologies.

Jake Vanderplas

University of Washington

Creating Interactive Applications in Matplotlib, Machine Learning with scikit-learn, Python as Part of a Production Machine Learning Stack

Jake Vanderplas is an NSF Postdoctoral fellow working jointly in the Computer Science and Astronomy departments at the University of Washington. His research involves large-scale machine learning applications within astronomy and astrophysics. He is a maintainer of the Python packages Scikit-learn and Scipy, and regularly contributes to several of the other packages within the numpy/scipy ecosystem. He occasionally blogs about Python-related topics at Pythonic Perambulations - http://jakevdp.github.com.

James Powell

NYC Python

Dataflow Programming Using Generators and Coroutines, Embeddings of Python, Generator Showcase Showdown, Generators the Third, My First Numba

James Powell is a professional Python programmer based in New York City. He is the chair of the NYC Python meetup (nycpython.com) and has spoken on Python/CPython topics at PyData SV, PyData NYC, PyTexas, PyArkansas, PyGotham, and at the NYC Python meetup. He also authors a blog on programming topics at seriously.dontusethiscode.com.

Jason Rudy

Clinicast

MARS Modeling on the Python Data Stack

Jason Rudy is the "Brain Builder" at Clinicast, a Bay Area health-tech startup focused on using predictive analytics to proactively manage patient care and reduce costs for at-risk payers and providers. He studied molecular biology and math as an undergraduate, has his M.S. in bio- and medical informatics, and worked for Kaiser subsidiary Archimedes before joining Clinicast in 2012. He uses numpy, scipy, pandas, matplotlib, patsy, sklearn, statsmodels, and pymc in his daily work, but is occasionally still forced to fall back on R. In his past he was the author of two R packages on CRAN. He is currently trying to redeem himself by bringing some of his favorite modeling techniques from R into the Python data stack.

Josh Levy

Vast

Thin Client Data Science

Josh Levy is a Data Scientist at Vast in Austin, Texas. He works on content recommendation and text mining systems. He earned his doctorate at the University of North Carolina where he researched statistical shape models for medical image segmentation.

Katherine Chuang

Soostone Inc.

Big Data in Fashion, Social Network Analysis

Katherine Chuang is known for her volunteer work as an organizer for the Python community in the NYC area and is working behind the scenes to provide technical support to the PyData community, including this website. She recently graduated with a PhD and can often be found speaking about design and science. Her website is http://katychuang.com.

Krishna Sankar

Tata Consultancy Services

Bayesian Machine Learning & Python – An etude

Krishna Sankar is currently a Principal Architect/Data Scientist with the NextGen Big Data group at Tata Consultancy Services. Prior to this he was Director of Engineering/Data Science at Genophen, working on bioinformatics/consumer applications in AWS. He also worked at Egnyte as a Lead Architect, developing the cloud object store layer (handling billions of files/petabytes of storage) and security (federated identity/SSO); before that he was at Cisco as a Distinguished Engineer, lastly working on various aspects of Big Data and Cloud Computing (http://doubleclix.wordpress.com/about/).

Krishna’s recent speaking engagements include OSCON 2012, Social Media Analysis with Twitter (http://goo.gl/mFflw); OSCON 2011, Hitchhiker’s Guide to Kaggle (http://goo.gl/75X7w); and OSCON 2010 (http://goo.gl/8Ukiw), as well as guest lecturing at the Naval Postgraduate School on big data (http://goo.gl/2pBYS). His interests include big data stacks, from infrastructure to visualization, highly scalable cloud architectures, and intelligent inferences. In his spare time, he is pursuing the Mining Massive Data Sets Graduate Certificate at Stanford. He also writes books, including “Cisco Wireless LAN Security” and “Enterprise Web 2.0”. His other passion is Lego Robotics, and he contributes as a technical judge in local and world Lego competitions.

Mike Mueller

Python Academy

Beautiful Plots With Matplotlib

Mike Müller, Ph.D., has been using Python as his primary programming language since 1999. He is a Python trainer and CEO at Python Academy (http://www.python-academy.com). He teaches a wide variety of Python topics including "Introduction to Python", "Python for Scientists and Engineers", "Advanced Python", "Optimization and Extensions of Python Programs" as well as "Software Engineering with Python". He mainly writes scientific software in Python for predicting the water quality of pit lakes, and for numerical models with parallel execution or complex input/output streams for large amounts of data. He is a PSF member, PSF community service award holder, chairman of the Python Software Verband e.V., and co-founder of the Leipzig Python User Group. He was the lead organizer of the workshop "Python in German Speaking Countries" in 2006 and 2007, main organizer of the first two EuroSciPy Conferences in 2008 and 2009, and Chair of PyCon DE 2011 and 2012, the first two German PyCon conferences.

Min Ragan-Kelley

IPython

IPython-parallel

Min has been a core developer of IPython since 2006, and the principal developer of IPython.parallel. He recently completed his PhD in Applied Science and Technology at UC Berkeley, and is now working full time on IPython, thanks to a grant from the Sloan Foundation.

Olivier Grisel

Scaling Machine Learning in Python

Olivier Grisel is a Software Engineer based in Paris with a background in Machine Learning and Natural Language Processing. He is a regular contributor and pull request reviewer for the scikit-learn project.

Peter Norvig

Director of Research, Google

Keynote

Peter Norvig is a Fellow of the American Association for Artificial Intelligence and the Association for Computing Machinery. At Google Inc he was the Director of Search Quality, responsible for the core web search algorithms from 2002-2005, and has been a Director of Research from 2005 on.

Previously he was the head of the Computational Sciences Division at NASA Ames Research Center, making him NASA's senior computer scientist. He received the NASA Exceptional Achievement Award in 2001. He has taught at the University of Southern California and the University of California at Berkeley, from which he received a Ph.D. in 1986 and the distinguished alumni award in 2006. He was co-teacher of an Artificial Intelligence class that signed up 160,000 students, helping to kick off the current round of massive open online classes. He has over fifty publications in Computer Science, concentrating on Artificial Intelligence, Natural Language Processing and Software Engineering, including the books Artificial Intelligence: A Modern Approach (the leading textbook in the field), Paradigms of AI Programming: Case Studies in Common Lisp, Verbmobil: A Translation System for Face-to-Face Dialog, and Intelligent Help Systems for UNIX. He is also the author of the Gettysburg Powerpoint Presentation and the world's longest palindromic sentence.

Prashanth Mundkur

Nokia

Disco: Not Just MapReduce Any More

Prashanth Mundkur is a distributed systems researcher at Nokia. He is a core developer and maintainer of the Disco computing framework.

Robert Brewer

Building Analytic Database Engines With Python

Robert Brewer is best known as the lead developer of CherryPy. He has also designed media types for large-scale transaction processing, and built open-source ORMs, database test frameworks, and debugging tools. He recently departed as Chief Architect of YouGov, a leading global market research firm, to help start a new company providing collaborative analytics as a cloud service. He has been a frequent speaker at PyCon for many years, providing advanced talks on a variety of subjects.

Ryan Faulkner

Wikimedia Foundation

Measuring the New Wikipedia Community

Ryan Faulkner is a data analyst at the Wikimedia Foundation (WMF) working for the Editor Engagement Experiments (E3) team. This team is responsible for experimenting with new features on Wikipedia with the aim of stimulating a healthier and more productive user experience and Wikipedia communities. His work focuses on the measurement and analysis of data from feature experiments aimed at new users. It involves experimental design and metrics definitions, along with an engineering effort to build systems that expose data conditioned on these definitions for subsequent analysis. The result of this work is a "User Metrics" API (UMAPI) built in Python and leveraging the Flask web framework. This talk will cover the UMAPI and general metrics measurement on Wikipedia in the context of E3 (touching on some other relevant Python data projects) and in the WMF at large.

Shreyas Cholia

Lawrence Berkeley National Laboratory

How Web APIs and Data-centric Tools Power the Materials Project

Shreyas Cholia works on science gateway, web and grid technologies at LBNL, with the goal of making high-performance and distributed computing more transparent and accessible. He has been involved in various grid and data-driven science efforts since 2002. Prior to his appointment at LBNL, Shreyas was a developer and consultant at IBM. He went to Rice University, where he studied Computer Science and Cognitive Sciences.

Steve Kannan

AppNexus

Measuring the digital economy using big data

Engineering Manager, Optimization and Analytics, AppNexus

As Engineering Manager for Optimization and Analytics, Steve manages software development for AppNexus's best-in-class systems for ad transaction optimization. Since joining AppNexus in 2010, Steve has led the design of distributed systems for scalable computation and data processing and set the technical standards for a team of engineers while iterating on the optimization feature set. Previously, Steve was a software developer at Google working on Google Places for Business and Local Search Quality. Steve has a Master's of Engineering in Electrical Engineering and Computer Science and a Bachelor's Degree in Computer Science from MIT.

Thomas Wiecki

Quantopian

Financial Analysis in Python, Zipline in the Cloud: Optimizing Financial Trading Algorithms

Thomas Wiecki is a 3rd year Ph.D. student at Brown University. His research domain is computational cognitive neuroscience. He also works as a researcher for Quantopian Inc. His interests include statistical modeling, Bayesian data analysis, and scientific and high-performance Python programming. Thomas is the author of several open source Python packages including HDDM, a scientific tool used to study decision making, and mpi4py_map, which adds worker-pool and queuing capabilities to mpi4py.

Travis Oliphant

Co-Founder & CEO, Continuum Analytics

Blaze, Building the PyData Community, Conda, Packaging and Deployment, Pythran: Static Compiler for High Performance, Scalable Analytics and Visualization: Connecting Expertise to Data With Python, Welcome

Introduction to NumPy; Introduction to SciPy

Dr. Oliphant has a Ph.D. in Biomedical Engineering from the Mayo Clinic, and M.S. and B.S. degrees in Electrical Engineering (and Math) from Brigham Young University. Travis has worked extensively with Python for numerical and scientific programming since 1997, and was the primary developer of the NumPy package and the author of the definitive Guide to NumPy. He is also the primary founding author of the SciPy package. During his academic career, he has worked in the fields of satellite remote sensing, Magnetic Resonance Imaging (MRI), Ultrasound, elastography, and general inverse problems. He was an Assistant Professor of Electrical and Computer Engineering at Brigham Young University from 2001 to 2007 where he taught courses in probability theory, electromagnetics, inverse problems, and signal processing. In addition, he directed the BYU Biomedical Imaging Lab, and performed research on scanning impedance imaging. He has done consulting work since 1997 in laser scattering off of semiconductors, sparse matrix calculations for search engines, and mesh transformations for fluid dynamics. Dr. Oliphant co-founded Continuum Analytics, Inc. in 2012 and currently serves as its CEO.

Ville Tuulos

Bitdeli

Bitdeli - A Platform for Creating Custom Analytics in Your Browser

Ville Tuulos is the CEO and co-founder of Bitdeli. Prior to Bitdeli, Ville was a research leader in Nokia Research where he founded Disco, a popular open-source implementation of MapReduce. Ville is a long-time fan of Python, Erlang and Standard ML.

Wes McKinney

Data Wrangling Kung Fu With pandas, pandas

Wes is the creator and lead developer of the pandas library and the author of the O'Reilly book, Python for Data Analysis. He has served as an expert Python consultant to many financial firms and is actively engaged in industry conferences as a speaker. Prior to co-founding Lambda Foundry, Wes worked at AQR Capital Management researching global macro and credit trading strategies. He holds a degree in Mathematics from MIT, with additional graduate studies in Statistical Science at Duke University.

