BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//pretalx//seattle2023.pydata.org//cfp//TAANV9
BEGIN:VTIMEZONE
TZID:America/Los_Angeles
BEGIN:STANDARD
DTSTART:20001029T020000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=10;UNTIL=20061029T090000Z
TZNAME:PST
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
END:STANDARD
BEGIN:STANDARD
DTSTART:20071104T020000
RRULE:FREQ=YEARLY;BYDAY=1SU;BYMONTH=11
TZNAME:PST
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20000402T020000
RRULE:FREQ=YEARLY;BYDAY=1SU;BYMONTH=4;UNTIL=20060402T100000Z
TZNAME:PDT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
END:DAYLIGHT
BEGIN:DAYLIGHT
DTSTART:20070311T020000
RRULE:FREQ=YEARLY;BYDAY=2SU;BYMONTH=3
TZNAME:PDT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:pretalx-cfp-BRVLPA@seattle2023.pydata.org
DTSTART;TZID=America/Los_Angeles:20230428T114500
DTEND;TZID=America/Los_Angeles:20230428T123000
DESCRIPTION:Using Spark\, Dask\, or Ray is not an all-or-nothing choice. T
 ranslating existing Pandas pipelines to these big data frameworks may see
 m daunting for new practitioners. In reality\, distributed computing can 
 be adopted incrementally. In many use cases\, only one or two steps of a 
 pipeline require expensive computation. This talk covers strategies and b
 est practices for moving portions of workloads to distributed computing t
 hrough the open-source Fugue project. The Fugue API provides a suite of s
 tandalone functions compatible with Pandas\, Spark\, Dask\, and Ray. Toge
 ther\, these functions let users scale any part of their pipeline when re
 ady for full-scale production workloads on big data.
DTSTAMP:20250709T220109Z
LOCATION:St. Helens
SUMMARY:How to incrementally scale existing workflows on Spark\, Dask or Ra
 y? - Han Wang\, Jun Liu
URL:https://seattle2023.pydata.org/cfp/talk/BRVLPA/
END:VEVENT
END:VCALENDAR
