Sprints

PyData Global is hosting multiple sprints for open source projects. Each sprint will have a project maintainer leading the sprint to help guide the contributors throughout the session.

What are Sprints?

Development sprints offer an opportunity to enhance and contribute to open source projects in a focused session with the project maintainers. It is a fun exercise that helps open source projects to improve with the help of the open source community.

Who can participate?

All experience levels are welcome to participate. Contribution guides and environment setup instructions are provided with each sprint.

Sprints Schedule

You can access the schedule of the sprints using this link.

The duration of each sprint is 4 hours unless stated otherwise.

Communication Channels

Discord
Every sprint will have its own channel on Discord. Contributors should join their sprint's Discord channel to receive the list of issues to work on and to have a space to discuss their progress with the maintainers.

Zoom
Each sprint will have a dedicated Zoom call, which will be used for the sprint introduction. Participants will then be divided into breakout rooms, and a maintainer will join a room to support attendees when needed.

Before the Sprint
To have a productive sprint, it is crucial to read the documentation and environment setup instructions in advance. If you’re facing any issues setting up your environment, please use Slack to ask for support before the sprint begins.

Which Projects are Sprinting?

At PyData Global 2022, 8 projects will hold separate sprint sessions with some of their awesome core developers and maintainers!

Time and place

December 1 | 1pm – 5pm UTC (2pm – 6pm CET, Budapest)

The sprint is held on Zoom at https://us05web.zoom.us/j/87264629924?pwd=NFJ6K09XMkRkV2xWWE9RY2x4czJ3dz09.

About Vizzu & ipyvizzu

We built a data viz engine that – as Elijah Meeks put it – provides the most complete form of animating between chart forms. We embedded it into open-source charting tools and built data stories that gathered 40+ million views and 200k+ upvotes on Reddit.

Our products work similarly to other charting solutions, but when you create a set of charts with Vizzu, it automatically animates between them. We have two open-source tools: vizzu-lib, a JavaScript library, and ipyvizzu, a Jupyter notebook package for Python developers and data scientists. Both solutions have automatic data aggregation and filtering capabilities, and their defaults are set based on dataviz guidelines.

In July 2022 we added a storytelling extension to ipyvizzu called ipyvizzu-story that enables users to build, present and share animated data stories in Jupyter and similar computational notebooks.

This sprint will focus on creating new sample stories, adding color schemes (e.g., dark and accessible themes), and improving the documentation. We work hard to make this a fun experience for all contributors.

Maintainers leading the sprint:

Simon – GitHub, LinkedIn

Simon wrote most of the code used in our data viz engine and our JavaScript library. He is a big believer in open source, with 15+ years of development and software architecture experience under his belt.

Peter – GitHub, LinkedIn

Peter created most of the documentation for ipyvizzu and the JS library and built many examples and public stories with the Vizzu tools. He loves assisting people experimenting with our animated charting tools. 

Important Links

When:

December 1, Thursday: 2:00PM – 7:00PM UTC
December 2, Friday: 3:00PM – 7:00PM UTC
December 3, Saturday: 1:00PM – 5:00PM UTC

About

Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. Matplotlib makes easy things easy and hard things possible.

Communication Channel: https://gitter.im/matplotlib/matplotlib

Time and place: Please say hello on Gitter first!

Dec 1: https://numfocus-org.zoom.us/j/87126200471?pwd=UmlLSldBRVhEa2NUQzE5dzZmWHVRdz09

Dec 2: https://numfocus-org.zoom.us/j/87362906269?pwd=TW90MlpYdml2eVFvVWdjZktLcVl1Zz09

Dec 3: https://numfocus-org.zoom.us/j/83714364966?pwd=dnRLK2ZjcWQ2SWpMQm0wK1c0b2pwUT09

Focus: Matplotlib good first issues

Important links:

Contributing to the matplotlib library: 

Contributing to matplotlib in other ways:

Mentors:

Name: Thomas Caswell, github: tacaswell

Availability: Dec 2 1100 – 1600 EST, Dec 3 0800-0000

Bio: Matplotlib project lead.

Name: Hannah Aizenman, github: story645, discord: story645

Availability: TBD: Dec 1, Dec 2 before 2:00PM EST

Bio: Matplotlib community manager, studying visualization at The Graduate Center, CUNY.

Name: Gregor Mönke, github: tensionhead, discord: Whir#2652

Availability: TBD: Dec 1, Dec 2, Dec 3 after 12:00PM CET

Bio: Computational scientist who has worked in interdisciplinary settings (biology) with tech beginners for many years. Now leading a Python neuroscience software project: syncopy.org

Name: Chahak Mehta, github: chahak13

Availability: Dec 1, before 4:00PM CST

Bio: Graduate student studying Computational Science at University of Texas, Austin.

Name: Oscar Gustafsson, github: oscargus

Availability: Dec 3 1000-1800 CET

Bio: Matplotlib developer and academic working in electrical and computer engineering.

Time and place

December 2, 2022, 10:00-14:00 UTC (19:00-23:00 JST)

The sprint is held on Zoom at https://preferrednetworks.zoom.us/j/81436270302?pwd=SHdQRzl2WUNQVGxFZ2RyeVRwYnVodz09

Make sure to have Zoom available. If possible, update your Zoom client to version 5.12 or newer.

Project description
Optuna (https://github.com/optuna/optuna) is an automatic hyperparameter optimization software framework, particularly designed for machine learning.

It features an imperative, define-by-run style user API. Thanks to our define-by-run API, the code written with Optuna enjoys high modularity, and the user of Optuna can dynamically construct the search spaces for the hyperparameters.
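To illustrate what "define-by-run" means, here is a minimal standard-library sketch of the idea. It does not use Optuna itself: `ToyTrial` and `random_search` are hypothetical stand-ins written for this example, and only the method names `suggest_float` and `suggest_categorical` are borrowed from Optuna's trial interface. The point is that the search space is built while the objective function runs, so a parameter such as `b` only exists on the branch that asks for it.

```python
import random


class ToyTrial:
    """Hypothetical stand-in for a trial object; this is NOT Optuna's class."""

    def __init__(self, rng):
        self._rng = rng
        self.params = {}  # filled in as the objective asks for values

    def suggest_float(self, name, low, high):
        value = self._rng.uniform(low, high)
        self.params[name] = value
        return value

    def suggest_categorical(self, name, choices):
        value = self._rng.choice(choices)
        self.params[name] = value
        return value


def objective(trial):
    # The search space is defined by running ordinary Python code.
    model = trial.suggest_categorical("model", ["linear", "quadratic"])
    a = trial.suggest_float("a", -2.0, 2.0)
    if model == "quadratic":
        # "b" becomes part of the search space only on this branch.
        b = trial.suggest_float("b", -2.0, 2.0)
        return (a - 1.0) ** 2 + (b + 0.5) ** 2
    return (a - 1.0) ** 2 + 0.25


def random_search(objective, n_trials, seed=0):
    """Plain random search (an assumption; Optuna uses smarter samplers)."""
    rng = random.Random(seed)
    best_value, best_params = float("inf"), {}
    for _ in range(n_trials):
        trial = ToyTrial(rng)
        value = objective(trial)
        if value < best_value:
            best_value, best_params = value, trial.params
    return best_value, best_params


best_value, best_params = random_search(objective, n_trials=50)
print(best_value, best_params)
```

With the real library, the same `objective` shape would typically be passed to a study created with `optuna.create_study()` and run via `study.optimize(objective, n_trials=50)`; see the Optuna documentation for the exact options.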

Today, the project has grown to over 7000 stars and close to 200 contributors on GitHub. It recently released a new major version, v3, with many exciting updates! This sprint will be a great opportunity to get familiar with the Optuna codebase together with its core developers.

What will be the focus of the sprint?
The sprint will be open to a wide variety of issues. We have prepared several contribution-welcome issues and, for first-time contributors, good-first-issue tasks that reduce runtime warnings from unit tests.

Maintainers leading the sprint:

Hiroyuki Vincent Yamazaki

Discord username: hvy (Vincent)
GitHub: https://github.com/hvy
Vincent is a maintainer of Optuna, working as an engineer at Preferred Networks Inc. in Tokyo. He has a background in Computer Science from Sweden. He is a previous maintainer of Chainer (https://github.com/chainer/chainer) and CuPy (https://github.com/cupy/cupy/).

Hideaki Imamura

Discord username: mamu (Hideaki)
GitHub: https://github.com/hideakiimamura
Hideaki is a maintainer of Optuna, working as a researcher at Preferred Networks Inc. in Tokyo. He holds a Master’s degree in Computer Science from the University of Tokyo and has been in his current position since April 2020 working on Optuna development, where he led the Optuna v3.0 release.

Keisuke Umezawa

Discord username: kumezawa (Keisuke)
GitHub: https://github.com/keisuke-umezawa
Keisuke is a Software and ML Engineer with experience in financial modeling and recommendation engine development at web companies and Fintech startups. Also, he is a maintainer of Optuna, working as an Engineering Manager at Mercari Inc. in Tokyo.

Adrian Zuber

Discord username: xadrianzetx (Adrian)
GitHub: https://github.com/xadrianzetx
Adrian is an ML Engineer experienced in designing and developing ML-based systems in the financial and proptech industries. He has contributed to Optuna since mid-2021 and currently works for a Norwegian startup as a Software/ML Engineer.

Project links

Time and place:

Friday, December 2nd, 2022

2-6pm UTC

Zoom: https://numfocus-org.zoom.us/j/82538167858?pwd=S1VnWU1IMmViaFQxRExKZFlkSzVlQT09

password: pydata

You are welcome to join at 2pm, or at 4pm, when we will repeat the introduction.

Project description:

pandas is a data wrangling platform for Python widely adopted in the scientific computing community. In this session, you will be guided on how you can make your own contributions to the project; no prior contributing experience is required! Not only will this teach you new skills and boost your CV, you’ll also likely get a nice adrenaline rush when your contribution is accepted!

Maintainers and contributors leading the sprint:

Joris Van den Bossche (he/him)

Joris is a pandas maintainer and works at Voltron Data as a Software Engineer. He has a particular interest in pandas’ internals and API design.

Marco Gorelli (he/him)

Marco is a pandas maintainer and works at Quansight as a Senior Software Engineer. He has particular interests in datetime parsing, possibly because his previous job involved ~~gazing into crystal balls~~ sales forecasting.

Will Ayd (he/him)

Will is a pandas maintainer and owns a data consultancy called innobi. He has a particular interest in performance optimization and I/O.

Noa Tamir (she/they)

Noa is a pandas contributor experience lead and works at Quansight as a Senior Developer Experience Engineer. Noa is interested in onboarding new contributors, tools integration, and documentation (especially the contributor’s guide).

Patrick Hoefler (he/him)

Patrick is a pandas maintainer and works as a senior consultant at d-fine. Patrick won’t attend the sprint, but will be reviewing the pull requests later.

Important Links

To get the most out of the session, it’s encouraged (but not required) that you have a look at the [contributing guide](https://pandas.pydata.org/pandas-docs/dev/development/contributing.html) beforehand. In particular it would be useful if you are able to set up your development environment in advance. This will allow you to use more of the sprint time to work directly on finding a relevant issue and starting to work on it. Check out the [development environment instructions](https://pandas.pydata.org/docs/dev/development/contributing_environment.html).

You are welcome to join our [contributor community](https://pandas.pydata.org/pandas-docs/dev/development/community.html)! We hold regular meetings for new contributors and have a Slack channel for ongoing communication, such as questions about development environment setup.

Discord usernames

Noa `GenOrgana#1770`

Will `willa#7258`

Time and place:

December 2 | 3pm – 7pm UTC 

https://anaconda.zoom.us/j/94092709446

About HoloViz

HoloViz is a coordinated effort to streamline data visualization in Python. It provides a set of interoperable Python packages that make viz easier, more accurate, and more powerful: Panel for creating apps and dashboards for your plots from any supported plotting library, hvPlot to quickly generate interactive plots from your data, HoloViews to help you make all of your data instantly visualizable, GeoViews to extend HoloViews for geographic data, Datashader for rendering even the largest datasets, Lumen to build data-driven dashboards from a simple YAML specification, Param to create declarative user-configurable objects, and Colorcet for perceptually uniform colormaps.

This sprint will primarily focus on documentation improvements. This is a great way for both new and experienced users to contribute to the community.

Maintainers leading the sprint

Demetris Roumis
Demetris (dah – me – tree) is a contributor to HoloViz packages, currently leading efforts to improve documentation. He works as a software engineer at Anaconda and has a background in neuroscience research and neurotechnology development. 

Ian Thomas
Ian is a software engineer at Anaconda and a contributor to a number of open-source projects including Bokeh, Datashader and Matplotlib. He has particular interest in using OpenGL/WebGL for fast rendering, and spatial algorithms such as calculating contours. Ian is British and drinks a lot of tea.

Important Links

When

2nd December, 17:00-20:00 UTC

Zoom Meeting
https://us06web.zoom.us/j/83833047249?pwd=NkJxdlFBMDZQRHY5VHBDL3RkRG4xUT09

Meeting ID: 838 3304 7249
Passcode: 369963

Project Description

Zarr is a data storage format for storing chunked, compressed N-dimensional arrays and associated metadata. Zarr is based on an open-source technical specification and has implementations in several languages. Zarr is generally used to store large datasets. In addition, there is a variety of compressors available to use via the Numcodecs library.

  • Store arrays in memory, on disk, inside a Zip file, or on cloud storage
  • Read and write to an array concurrently from multiple threads or processes
  • Organize arrays into hierarchies via groups

Sprint Focus

Maintainers for the Sprint:

Josh Moore
GitHub: https://github.com/joshmoore/

Discord: joshmoore#7588

Bio: Josh Moore is a research software engineer living in Germany. His primary professional focus is the Open Microscopy Environment (OME), which provides open-source infrastructure for accessing and sharing bioimaging data (like really big images of fish embryos). A critical part of doing that is defining good file formats for such big images, which is why he got involved in Zarr, of which he’s now a maintainer. Happy-life ABCs: aikido, books, coffee.

Sanket Verma
GitHub: https://github.com/msankeys963/

Discord: msankeys963#0660

Bio: Sanket Verma takes care of community and OSS at Zarr as their Community Manager. He loves building communities, data science tools and products and has worked with startups, government and organisations. When he’s not working he likes to play the violin and computer games and sometimes thinks of saving the world!

Important links:

When

Saturday, December 3rd, 2 – 8 pm UTC

Zoom call link:

https://numfocus-org.zoom.us/j/82563808729?pwd=ZFU3Z2dMcXBGb05YemRsaGE1OW5nQT09

About NumPy

NumPy is an open source Python library that is the de facto standard for multidimensional arrays in data analysis with Python. Nearly every scientist working in Python draws on the power of NumPy. The library offers comprehensive mathematical functions, random number generators, linear algebra routines, Fourier transforms, and much more. It lies at the core of a rich ecosystem of data science libraries.

Sprint Mentors:

Rohit Goswami: NumPy and f2py maintainer, PhD candidate in Chemical Engineering at the University of Iceland.
GitHub: https://github.com/HaoZeke

Ganesh Kathiresan: NumPy core developer and maintainer, software development engineer at Amazon.
LinkedIn: https://www.linkedin.com/in/ganesh-kathiresan/
GitHub: https://github.com/ganesh-k13/

Mukulika Pahari: NumPy documentation team co-lead, numpy/numpy-tutorials maintainer, computer engineering undergraduate student at the University of Mumbai.
GitHub: https://github.com/Mukulikaa

Inessa Pawson: NumPy contributor experience lead.
GitHub: https://github.com/InessaPawson

Important links:

For more info, visit: https://hackmd.io/abDhHl2ER2G-gnc4Gkar7A

When

Saturday 3 December 2022, 16:00 to 20:00 UTC.

Google Meet Video call link:

  🌐 Bokeh Sprint at PyData Global

Video call link: https://meet.google.com/mhm-sqkm-fog

Or dial: (US) +1 786-352-8569 PIN: 734 211 239#

More phone numbers: https://tel.meet/mhm-sqkm-fog?pin=2958513337733

What is Bokeh?

Bokeh is an interactive visualization library for modern web browsers. It provides elegant, concise construction of versatile graphics, and affords high-performance interactivity over large or streaming datasets. Bokeh can help anyone who would like to quickly and easily make interactive plots, dashboards, and data applications.

What is the focus of the Sprint?

  • Adding metadata to existing examples
  • Adding new examples to the Bokeh Gallery

Biographies:

Bryan Van de Ven
Bryan is a Senior Systems Software Engineer at NVIDIA, where he works to improve Python tooling and processes for multiple open-source projects. Previously he worked at Microsoft, and also at Anaconda, where he created the conda package manager and co-created the Bokeh visualization library.

Pavithra Eswaramoorthy
Pavithra is a Developer Advocate at Quansight, where she works to support the PyData community. She also contributes to the Bokeh and Dask projects. In her spare time, she enjoys a good book and hot coffee.

Timo Metzger
Timo is a technical writer and project manager at makepath. He loves finding the right words for complex technical ideas and helping others get started with open-source projects.

Ian Thomas
Ian is a Senior Software Engineer at Anaconda and a contributor to a number of open-source projects including Bokeh, Datashader and Matplotlib. He has particular interest in using OpenGL/WebGL for fast rendering, and spatial algorithms such as calculating contours. Ian is British and drinks a lot of tea.

Important links:

  • Contributor guide
  • Internal sprint documentation: TBD
  • Preferred communication channel: PyData Global Discord channel for Bokeh

Sprint leads’ handles on Discord:

  • Bryan: @brayn
  • Pavithra: @pavithraes
  • Ian: @Ian Thomas
  • Timo: @tcmetzger