Note: This is still subject to minor changes.
Time | Track 1 | Track 2 |
8h30 | Doors open | |
9h00 | Welcome talks | |
9h10 | Opening keynote: Olivier Grisel, "Predictive Modeling and Python: some trends" | |
10h00 | "Python to Report in one command", by Vicky Close | "Automatic Machine Learning using Python & scikit-learn", by Abhishek Thakur |
10h45 | Coffee break | |
11h15 | "Wendelin: from stock movements to pivot tables inside Jupyter", Douglas Camata (Nexedi) | Shorter talks: "Lightning, a library for large-scale machine learning in Python", by Fabian Pedregosa; "Python and Big Data: a good match?", by Pierrick Boitel (Affini-Tech); "Collecting PyData from Your Running Processes", by Rafael Monnerat (Nexedi) |
12h00 | "Prescriptive Analytics with docplex and pandas", Hugues JUILLE (IBM) | |
13h00 | Lunch break | |
14h00 | "Opening up the French tax software", Emmanuel Raviart (DINSIC) | "Statistical Topic Extraction", by Laurie Lugrin |
14h45 | "Scikit-learn for text mining at Jurismarchés", by Oussama Ahmia (Jurismarché) | "Joblib: toward efficient computing from laptop to cloud", by Alexandre Abadie (INRIA) |
15h30 | "We Have Our Ways: Extracting and Analyzing Online Confessions", by Omer Yuksel | "How Apache Arrow and Parquet boost cross-language interop", by Uwe L. Korn |
16h15 | Coffee break | |
16h45 | Round table: "How to become a data scientist" | |
17h15 | Closing keynote: Emmanuelle Gouillart, "Why Scientific Python rocks: simple APIs and innovative documentation" | |
18h30 | Food & drinks | |
19h30 | End of day 1 | |
Time | Track 1 | scikit-learn workshop | Tutorials 1 | Tutorials 2 |
8h30 | Doors open | | | |
9h00 | "10 plotting libraries", Xavier Dupré (Microsoft) | See here | Become an expert in webscraping (data extraction), by Fabien Vauchelles (Zelros) | |
9h45 | "How to visualize and explore a billion objects interactively", by Maarten Breddels | | | |
10h30 | Coffee break | | | |
11h00 | "Building Visualisations in D3.JS for Python Programmers", by Thomas Parslow | | Tidy Data In Python, by Jean-François Puget (IBM) | Hyperconvergence: From Big Data to Small Application in 90 Minutes, by Sven Franck (Nexedi) |
11h45 | "When Software Craftsmanship meets Data Science", by Yoann Benoit and Sylvain Lequeux (Xebia) | | | |
12h30 | Lunch break | | | |
Time | Track 1 | scikit-learn workshop | Tutorials 1 | Tutorials 2 |
14h00 | "Maths @ Saint-Gobain: from marketing to plants through Python", by Alessandro Giassi (Saint-Gobain) | See here | Practical Pythran, by Serge « sans paille » Guelton (Namek) | Creating Custom Interactive Widgets for the Jupyter Notebook, by Sylvain Corlay (Bloomberg) |
14h45 | "How to apply data to make better hiring decision in recruitment", by Ken Yeung | | | |
15h30 | "Using Python to revolutionize the musical instruments manufacturing", by Olivier CAYROL (Logilab) | | | |
16h00 | Coffee break | | | |
16h30 | Lightning talks | | | |
17h00 | Closing remarks | | | |
18h00 | End of the conference | | | |
(See also the scikit-learn day workshop on day 2.)
Please use the CFP engine if you'd like to give a lightning (5 minutes) or short (10-15 minutes) talk.
On day 2 (June 15th), there will be at least 5 tutorial sessions, each approximately 120 minutes long:
When Hadley Wickham published Tidy Data, he provided the R community with extremely useful data wrangling packages as well as a new way to look at data. Unfortunately for Python users, no equivalent seemed to exist. Fortunately, the Pandas package is versatile enough to support tidy data preparation. In this tutorial, we review how to use Pandas to implement Tidy Data.
In this tutorial, we show how to perform the data transformations described in Hadley Wickham's Tidy Data. We will cover:
- Tidy vs. messy data sets
- A simple melt example (see the sketch below)
- What to do when column names are values, not feature names
- What to do when multiple features are stored in one column
- What to do when features are stored in both rows and columns
- How to get to a database normal form
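As a minimal sketch of the kind of melt operation covered in the tutorial, here is what reshaping a messy table looks like with pandas; the data and column names are illustrative, not taken from the tutorial material:

```python
import pandas as pd

# A "messy" table: one row per religion, one column per income bracket.
messy = pd.DataFrame({
    "religion": ["Agnostic", "Atheist"],
    "<$10k": [27, 12],
    "$10-20k": [34, 27],
})

# Melt into tidy form: one observation (religion, income bracket, count) per row.
tidy = pd.melt(messy, id_vars="religion", var_name="income", value_name="count")
print(tidy)
```

After the melt, every variable is a column and every observation is a row, which is the shape the other transformations listed above build on.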
Pythran is an open-source, high-level compiler for numeric Python. It only needs a few function type annotations and a working modern C++ compiler to turn high-level numerical kernels into efficient, possibly vectorized and parallel, native kernels that can be directly incorporated into numeric applications. This tutorial showcases typical use cases of Pythran through a notebook-based demo.
This tutorial starts with a high-level numerical kernel and quickly demonstrates how Pythran can be used as a drop-in replacement for Cython, while maintaining backward compatibility with Python and without the burden of learning a new language.
The tutorial briefly introduces the few concepts used by Pythran: the difference between a native module and a regular module, SIMD instructions, and multi-core execution. It then dives into Pythran type annotations and OpenMP support, and finishes with Jupyter notebook integration and distutils support.
At the end of the talk, attendees should be able to use Pythran in their own projects to accelerate their scientific computations.
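For a flavour of those annotations, a minimal sketch (the kernel is illustrative, not an example from the tutorial): the export comment below is the only Pythran-specific line, and the file compiles with `pythran dprod.py` while still running unmodified under CPython.

```python
# dprod.py
# The export comment tells Pythran the function name and argument types;
# everything else is plain Python.
#pythran export dprod(float64[], float64[])

def dprod(a, b):
    """Dot product of two 1-D float arrays."""
    return sum(x * y for x, y in zip(a, b))
```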
In this tutorial, you will learn about one of the main extension points of the Jupyter notebook: the Jupyter interactive widgets.
We first cover the main functionalities of the core interactive widgets and the most popular custom interactive widget libraries: bqplot, pythreejs and ipyleaflet.
We then show how these tools can be used to author interactive data visualization dashboards that can be distributed as standalone applications outside of the Jupyter notebook.
Finally, we present a detailed tutorial for authoring a custom widget library and distributing it using the standard Python packaging tools.
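For readers who have not used the core widgets before, a minimal sketch of the kind of interactivity involved (illustrative only, not part of the tutorial material):

```python
import ipywidgets as widgets

# interact() builds an IntSlider for the (0, 10) range and re-runs the
# decorated function every time the slider value changes.
@widgets.interact(n=(0, 10))
def show_square(n=5):
    print(n, "squared is", n * n)
```

Custom widget libraries such as bqplot or ipyleaflet follow the same model, pairing a Python object in the kernel with a JavaScript view in the browser that stays in sync with it.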
Collecting one price from a webpage is easy, but grabbing 10 million products is very hard! Websites change their content and protect their data, and you can lose months building a web scraper... Discover how to save time by avoiding the common mistakes of webscraping!
This workshop lasts 2 hours.
I begin with a presentation of webscraping techniques (20 min), then introduce the Scrapy framework and how to use it (10 min). This introduction helps participants get started on the exercises faster.
Then we start the hands-on part, which lasts 1h20. Participants work gradually through 4 challenges: scraping a single page, scraping multiple pages, and bypassing several protections.
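To give an idea of what the first challenge involves, a minimal Scrapy spider might look like the following; the URL and CSS selectors are illustrative, not those used in the workshop:

```python
import scrapy


class PriceSpider(scrapy.Spider):
    """Scrape product names and prices from a single listing page."""
    name = "prices"
    start_urls = ["https://example.com/products"]

    def parse(self, response):
        # Each product block on the page yields one item.
        for product in response.css("div.product"):
            yield {
                "name": product.css("h2::text").extract_first(),
                "price": product.css("span.price::text").extract_first(),
            }
```

Saved as, say, price_spider.py, it can be run with `scrapy runspider price_spider.py -o prices.json` to collect the items into a JSON file.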
The workshop is here: https://github.com/fabienvauchelles/scraping-challenge-workshop.
Slides are here: http://bit.ly/datascrap
The tutorial will show how to use Wendelin, the free software platform for Big Data & Machine Learning written 100% in Python, to analyse and visualize a large data set and develop a working web application from the results.
The idea of the tutorial is to demonstrate how a "data life cycle" can be managed with Wendelin, covering ingestion, analysis, visualization and weaving it all into an application. We'll show how Wendelin can handle both the analysis and the exploitation of data, making it a potential solution for IoT scenarios where data is available and needs some logic applied before being presented as a web application, possibly on a commercial basis.
Wendelin Introduction: We will briefly introduce the Wendelin project and all of its components.
Setting up Wendelin: We will walk through the steps necessary to get a working Wendelin instance. We will add fluentd for data ingestion. There will be some instances pre-prepared and configured, so participants can also work along.
Collecting and Ingesting Data: We will record audio during the tutorial and pipe the data into Wendelin to work with it.
Analysis: We will use Jupyter to run a Fourier series analysis on the collected data and save the results back to Wendelin.
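Outside of Wendelin itself, the heart of that analysis step comes down to a few NumPy calls in the notebook; a minimal sketch on synthetic audio data (not the tutorial's actual code) might look like this:

```python
import numpy as np

# Synthetic "audio": one second of a 440 Hz tone sampled at 8 kHz,
# standing in for the signal recorded during the tutorial.
rate = 8000
t = np.arange(rate) / rate
signal = np.sin(2 * np.pi * 440 * t)

# Discrete Fourier transform of the real-valued signal and the
# corresponding frequency bins.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(signal.size, d=1.0 / rate)

# The dominant frequency should come out close to 440 Hz.
print("Peak at %.1f Hz" % freqs[spectrum.argmax()])
```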
Web Application: We will show how to set up a simple web application in Wendelin and display the results of our analysis.
Interactivity: We will add some basic interactions and show how Sensor Data and User Data are kept in the same system.
Outlook: We will finish with a look at the Wendelin project roadmap and how we see Wendelin as a free software / open source solution for both IoT and hyperconvergence topics, especially in light of exponentially growing amounts of data.