Conference Schedule

Time table

Note: This is still subject to minor changes.

Tuesday June 14th - Morning


Track 1 Track 2
8h30 Doors open
9h00 Welcome talks
9h10 Opening keynote: Olivier Grisel "Predictive Modeling and Python: some trends"
10h00 "Python to Report in one command",
by Vicky Close
"Automatic Machine Learning using Python & scikit-learn",
by Abhishek Thakur
10h45 Coffee break
11h15 "Wendelin: from stock movements to pivot tables inside Jupyter",
Douglas Camata (Nexedi)
Shorter talks
"Lightning, a library for large-scale machine learning in Python",
by Fabian Pedregosa
"Python and Big Data: a good match?",
by Pierrick Boitel (Affini-Tech)
"Collecting PyData from Your Running Processes",
by Rafael Monnerat (Nexedi)
12h00 "Prescriptive Analytics with docplex and pandas",
Hugues JUILLE (IBM)
13h00 Lunch break

Tuesday June 14th - Afternoon

14h00 "Opening up the French tax software",
Emmanuel Raviart (DINSIC)
"Statistical Topic Extraction",
by Laurie Lugrin
14h45 "Scikit-learn for text mining at Jurismarchés",
by Oussama Ahmia (Jurismarché)
"Joblib: toward efficient computing from laptop to cloud",
by Alexandre Abadie (INRIA)
15h30 "We Have Our Ways: Extracting and Analyzing Online Confessions",
by Omer Yuksel
"How Apache Arrow and Parquet boost cross-language interop",
by Uwe L. Korn
16h15 Coffee break
16h45 Round table: "How to become a data scientist"
17h15 Closing keynote: Emmanuelle Gouillart, "Why Scientific Python rocks: simple APIs and innovative documentation"
18h30 Food & drinks
19h30 End of day 1

Wednesday June 15th - Morning


Track 1 scikit-learn workshop Tutorials 1 Tutorials 2
8h30 Doors open
9h00 "10 plotting libraries",
Xavier Dupré (Microsoft)
See here Become an expert in webscraping (data extraction)
by Fabien Vauchelles (Zelros)

9h45 "How to visualize and explore a billion objects interactively",
by Maarten Breddels
10h30 Coffee break

11h00 "Building Visualisations in D3.JS for Python Programmers",
by Thomas Parslow

Tidy Data In Python,
by Jean-François Puget (IBM)

Hyperconvergence: From Big Data to Small Application in 90 Minutes,
by Sven Franck (Nexedi)
11h45 "When Software Craftsmanship meets Data Science",
by Yoann Benoit and Sylvain Lequeux (Xebia)
12h30 Lunch break

Wednesday June 15th - Afternoon


Track 1 scikit-learn workshop Tutorials 1 Tutorials 2
14h00 "Maths @ Saint-Gobain : from marketing to plants through Python",
by Alessandro Giassi (Saint-Gobain)



See here

Practical Pythran,
by Serge « sans paille » Guelton (Namek)
Creating Custom Interactive Widgets for the Jupyter Notebook,
by Sylvain Corlay (Bloomberg)
14h45 "How to apply data to make better hiring decision in recruitment",
by Ken Yeung
15h30 "Using Python to revolutionize the musical instruments manufacturing",
by Olivier CAYROL (Logilab)


16h00 Coffee break
16h30 Lightning talks
17h00 Closing remarks
18h00 End of the conference

Breakdown by topics

Regular talks (Day 1 or Day 2)

Track "industry and experience reports"

  1. "Opening up the French tax software", Emmanuel Raviart (DINSIC)
  2. "Scikit-learn for text mining at Jurismarchés", by Oussama Ahmia (Jurismarché)
  3. "We Have Our Ways: Extracting and Analyzing Online Confessions", by Omer Yuksel
  4. "Automated, data driven literary analysis", Serena Peruzzo
  5. "Maths @ Saint-Gobain : from marketing to plants through Python", by Alessandro Giassi (Saint-Gobain)
  6. "Using Python to revolutionize the musical instruments manufacturing", by Olivier CAYROL (Logilab)
  7. "How to apply data to make better hiring decision in recruitment", by Ken Yeung

Track "Machine Learning"

(See also the scikit-learn day workshop on day 2.)

  1. "Automatic Machine Learning using Python & scikit-learn", by Abhishek Thakur
  2. "Lightning, a library for large-scale machine learning in Python", by Fabian Pedregosa

Track "Stats and reporting"

  1. "Python to Report in one command", by Vicky Close
  2. "Wendelin: from stock movements to pivot tables inside Jupyter", Douglas Camata (Nexedi)
  3. "Prescriptive Analytics with docplex and pandas", Hugues JUILLE (IBM)

Track "Dataviz"

  1. "10 plotting libraries", Xavier Dupré (Microsoft)
  2. "How to visualize and explore a billion objects interactively", by Maarten Breddels
  3. "Building Visualisations in D3.JS for Python Programmers", by Thomas Parslow

Uncategorized (yet?)

  1. "Statistical Topic Extraction", by Laurie Lugrin
  2. "Joblib: toward efficient computing from laptop to cloud", by Alexandre Abadie (INRIA)
  3. "How Apache Arrow and Parquet boost cross-language interop", by Uwe L. Korn
  4. "When Software Craftsmanship meets Data Science", by Yoann Benoit and Sylvain Lequeux (Xebia)

Short talks / lightning talks

  1. "Python and Big Data: a good match?", by Pierrick Boitel (Affini-Tech)

Please use the CFP engine if you'd like to give a lightning (5 minutes) or short (10-15 minutes) talk.

Tutorials / workshops (Day 2)

On day 2 (June 15th), there will be at least 5 tutorial sessions, each approximately 120 minutes long:

"Tidy Data In Python", by Jean-François Puget (IBM)

When Hadley Wickham published Tidy Data he provided the R community with extremely useful data wrangling packages as well as a new way to look at data. Unfortunately for Python users, no equivalent seemed to exist. Fortunately, the Pandas package is versatile enough to support tidy data preparation. We review in this tutorial how to use Pandas to implement Tidy Data.

We present in this tutorial how to perform the data transformations described in Hadley Wickham's Tidy Data. We will cover:

  - Tidy vs messy data sets
  - A simple melt example
  - What to do when column names are values, not feature names
  - What to do when multiple features are stored in one column
  - What to do when features are stored in both rows and columns
  - How to get to a database normal form
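As a rough illustration of the "column names are values" case, here is a minimal sketch using `pandas.melt`. The data set, column names and values are invented for this example and are not taken from the tutorial material.

```python
import pandas as pd

# Hypothetical "messy" table: the year is stored in the column names
# instead of in a column of its own.
messy = pd.DataFrame({
    "country": ["FR", "DE"],
    "2014": [10, 20],
    "2015": [11, 21],
})

# Melt the year columns into rows: one observation per row,
# one variable per column.
tidy = pd.melt(messy, id_vars=["country"], var_name="year", value_name="value")

# The former column names are strings; convert them to integers.
tidy["year"] = tidy["year"].astype(int)

print(tidy)
```

The result has one row per (country, year) pair, which is the shape most pandas operations such as `groupby` expect.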

"Practical Pythran", by Serge Guelton (Namek)

Pythran is an open-source, high-level compiler for numeric Python that only needs a few function type annotations and a working modern C++ compiler to turn high-level numerical kernels into efficient, possibly vectorized and parallel, native kernels that can be directly incorporated into numeric applications. This tutorial showcases typical use cases of Pythran through a notebook-based demo.

This tutorial starts with a high-level numerical kernel and quickly demonstrates how Pythran can be used as a drop-in replacement for Cython while maintaining backward compatibility with Python and without the burden of learning a new language.

The tutorial briefly introduces the few concepts used by Pythran: difference between a native module and a regular module, SIMD instructions and multi-cores. It then dives into the Pythran type annotations, OpenMP support, and finishes with the Jupyter notebook integration and distutils support.

At the end of the tutorial, attendees should be able to use Pythran in their own projects to accelerate their scientific computations.
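To give an idea of what a Pythran type annotation looks like, here is a minimal sketch. The kernel (a Rosenbrock function) and the file name `rosen.py` are chosen for illustration only and are not taken from the tutorial. Because the annotation is an ordinary comment, the file remains a regular Python module and runs unchanged without Pythran.

```python
# rosen.py (hypothetical file name)
import numpy as np

# The comment below is the Pythran annotation: it declares that `rosen`
# is exported and takes a one-dimensional array of float64 values.
#pythran export rosen(float64[])
def rosen(x):
    # Rosenbrock function, written with plain NumPy slicing.
    t0 = 100 * (x[1:] - x[:-1] ** 2) ** 2
    t1 = (1 - x[:-1]) ** 2
    return np.sum(t0 + t1)


if __name__ == "__main__":
    print(rosen(np.array([1.0, 1.0, 1.0])))
```

Assuming Pythran and a C++ compiler are installed, running `pythran rosen.py` should produce a native module that can be imported in place of the pure Python one.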

"Creating Custom Interactive Widgets for the Jupyter Notebook", by Sylvain Corlay

In this tutorial, you will learn about one of the main extension points of the Jupyter notebook: the Jupyter interactive widgets.

We first cover the main functionalities of the core interactive widgets and the most popular custom interactive widgets libraries: bqplot, pythreejs and ipyleaflet.

Then we show how these tools can be used to author interactive data visualization dashboards that can be distributed as standalone applications outside of the Jupyter notebook.

Finally, we present a detailed tutorial for authoring a custom widget library and distributing it using the standard Python packaging tools.

"Become an expert in webscraping (data extraction)", by Fabien Vauchelles

Collecting one price from a webpage is easy, but grabbing 10 million products is very hard! Websites change their content and protect their data, and we can lose months building a webscraper... Discover how to save time by avoiding the mistakes of webscraping!

This workshop lasts 2 hours.

I begin with a presentation of webscraping techniques (20 min). Then, I introduce the Scrapy framework and its use (10 min). This introduction helps people get started on the workshop faster.

Then, we start the workshop. It lasts 1h20. People work gradually through 4 challenges: the scraping of a single page, the scraping of multiple pages, the bypassing of several protections.

The workshop is here: https://github.com/fabienvauchelles/scraping-challenge-workshop.

Slides are here: http://bit.ly/datascrap

"Hyperconvergence: From Big Data to Small Application in 120 Minutes.", by Sven Franck (Nexedi)

The tutorial will show how to use Wendelin, the free software platform for Big Data & Machine Learning written in 100% Python, to analyse and visualize a large data set and develop a working web application from the results.

The idea of the tutorial is to demonstrate how a "data life cycle" can be managed with Wendelin - covering ingestion, analysis, visualization and weaving it into an application. We'll show how Wendelin could handle both the analysis and exploitation of data, making it a potential solution for IOT scenarios where data is available and needs some logic applied before being presented as a web application, possibly on a commercial basis.

Wendelin Introduction: We will briefly introduce the Wendelin project and all of its components.

Setting up Wendelin: We will walk through the steps necessary to get a working Wendelin instance. We will add fluentd for data ingestion. There will be some instances pre-prepared and configured, so participants can also work along.

Collecting and Ingesting Data: We will record audio during the tutorial and pipe the data into Wendelin to work with it.

Analysis: We will use Jupyter to compute a Fourier series on the collected data and save the results back to Wendelin.

Web Application: We will show how to set up a simple web application in Wendelin and display the results of our analysis.

Interactivity: We will add some basic interactions and show how Sensor Data and User Data are kept in the same system.

Outlook: We will finish with a look at the Wendelin project roadmap and how we see Wendelin as a free software/open source solution for both IOT and hyperconvergence topics, especially in light of exponentially growing amounts of data.
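For readers who want a feel for the Analysis step before the tutorial, here is a minimal sketch of a frequency analysis in NumPy. It uses a discrete Fourier transform (`numpy.fft.rfft`) on a synthetic signal standing in for the recorded audio. The sample rate, duration and tone frequencies are assumptions made for the example, and no Wendelin API is involved.

```python
import numpy as np

SAMPLE_RATE = 8000  # samples per second (assumed for this example)
DURATION = 1.0      # seconds (assumed for this example)

# Synthetic stand-in for recorded audio: a strong 440 Hz tone
# plus a weaker 1000 Hz tone.
t = np.arange(int(SAMPLE_RATE * DURATION)) / SAMPLE_RATE
signal = np.sin(2 * np.pi * 440.0 * t) + 0.3 * np.sin(2 * np.pi * 1000.0 * t)

# Magnitude spectrum of the real-valued signal and the matching
# frequency axis in Hz.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(signal.size, d=1.0 / SAMPLE_RATE)

# Frequency of the strongest component.
dominant = freqs[np.argmax(spectrum)]
print(f"Dominant frequency: {dominant:.1f} Hz")
```

In a Jupyter notebook the same arrays could be plotted directly, with `freqs` on the x axis and `spectrum` on the y axis.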