This talk walks through developing and deploying a machine learning pipeline at scale to predict flu onset in a production setting. Leveraging the open-source tools nbdev and Ploomber, we developed a workflow that allows us to produce maintainable, robust, production-ready machine learning pipelines directly from Jupyter.
Direct-to-individual infection monitoring programs are transforming how infections are identified, measured, and treated. Models built on permissioned wearable sensor data from devices such as Fitbit, Garmin, and Apple Watch can be used to notify individuals of potential infections early on.
Overview:
Background and overview of the domain.
Data and analytics architecture overview.
The previous workflow and the challenges of turning a research model into a production pipeline:
   a. The complexity of translating between notebooks and the production codebase.
   b. Manual and inefficient tracking of pipeline status, outputs, and metadata.
   c. Repetitive, non-modular code that is difficult to maintain.
How open-source tools (Jupyter, nbdev, and Ploomber) helped us solve previous challenges:
   a. Notebook-based development promotes rapid development.
   b. Ploomber orchestrates workflows and facilitates the handoff between notebooks and production-ready code.
Intended Audience:
The talk is intended for data scientists and machine learning engineers interested in real-world examples of workflows for developing and deploying robust, maintainable machine learning pipelines.