Python is widely used in data science. However, because Python code usually runs in a single thread, and the Global Interpreter Lock (GIL) prevents threads from executing Python bytecode in parallel, data scientists often face problems that call for multithreading or asynchronous design patterns. This talk introduces an approach to multitasking with Python's asyncio, together with a structure for abstracting asynchronous tasks that can serve as the basis of an intuitive library.
Consider a general workflow with tasks A, B, and C, where A and B are independent of each other but C depends on the completion of both. Ideally, we would execute A and B concurrently and trigger C automatically once both are done. This situation arises often in general programming, and many design patterns exist for handling it. In machine learning, however, for example when training a model (task C) requires first processing the training data (task A) and the forecast data (task B), we face two particular challenges:
1. Data scientists are often unfamiliar with multithreading design, concurrent programming, and related techniques.
2. Data scientists usually work in a single-threaded environment (Python, Jupyter), which makes multitasking even harder.
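As a baseline, the A/B/C scenario above can be sketched with plain asyncio. This is a minimal illustration, not part of the SDK: the coroutine names are hypothetical stand-ins, and asyncio.sleep simulates I/O-bound work such as waiting on a remote API. Because asyncio interleaves coroutines on a single thread, this pattern helps when tasks spend most of their time waiting; CPU-bound work would need threads or processes instead.

```python
import asyncio

# Hypothetical stand-ins for tasks A, B and C; sleep() simulates waiting on I/O.

async def process_training_data() -> str:
    # Task A: independent of task B.
    await asyncio.sleep(0.1)
    return "training data"

async def process_forecast_data() -> str:
    # Task B: independent of task A.
    await asyncio.sleep(0.1)
    return "forecast data"

async def train_model(train: str, forecast: str) -> str:
    # Task C: needs the outputs of both A and B.
    await asyncio.sleep(0.1)
    return f"model trained on {train} and {forecast}"

async def workflow() -> str:
    # Run A and B concurrently; gather() returns once both have finished,
    # with results in the same order as its arguments.
    train, forecast = await asyncio.gather(
        process_training_data(), process_forecast_data()
    )
    # C starts only after both prerequisites are done.
    return await train_model(train, forecast)

if __name__ == "__main__":
    print(asyncio.run(workflow()))
```

In Jupyter an event loop is already running, so asyncio.run() raises a RuntimeError there; awaiting the coroutine directly in a cell (await workflow()) works instead.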
We therefore propose a general design to overcome these issues: an abstraction of tasks that handles blocking dependencies and runs work asynchronously.
Decanter AI Core SDK is a tool with an intuitive interface for users who want to take advantage of Mobagel's AutoML API. While easy to use, it handles the complicated dependencies between asynchronous tasks under the hood, allowing users to make full use of their computing resources: a task is blocked only while its prerequisite tasks have not yet finished. Moreover, the task object is designed so that users can retrieve results easily, without needing to know how dependencies are resolved or that an API is involved at all. This design and structure can also be applied to other scenarios where efficiently handling large numbers of asynchronous tasks and their dependencies is critical.
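The task abstraction described above might be sketched roughly as follows. This is a hypothetical illustration, not the Decanter AI Core SDK's actual API: the Task class, its depends_on parameter, and the demo coroutines are invented names. Each Task is scheduled on the event loop as soon as it is created, waits only on its own prerequisites, and can simply be awaited for its result, so the caller never sees how dependencies are resolved.

```python
import asyncio
from typing import Any, Awaitable, Callable, Generator, Sequence

class Task:
    """Hypothetical sketch (not the Decanter AI Core SDK API): a task that
    starts as soon as it is created and waits only on its own prerequisites."""

    def __init__(
        self,
        func: Callable[..., Awaitable[Any]],
        depends_on: Sequence["Task"] = (),
    ) -> None:
        self._func = func
        self._deps = list(depends_on)
        # Schedule immediately; create_task() requires a running event loop.
        self._future = asyncio.create_task(self._run())

    async def _run(self) -> Any:
        # Block only until this task's prerequisites finish; other tasks keep running.
        dep_results = await asyncio.gather(*(d._future for d in self._deps))
        # Pass prerequisite results positionally, in dependency order.
        return await self._func(*dep_results)

    def __await__(self) -> Generator[Any, None, Any]:
        # Lets callers write `await task` to get the result directly.
        return self._future.__await__()

async def demo() -> str:
    # Hypothetical stand-ins for tasks A, B and C.
    async def process_training_data() -> str:
        await asyncio.sleep(0.1)  # simulated I/O
        return "train"

    async def process_forecast_data() -> str:
        await asyncio.sleep(0.1)  # simulated I/O
        return "forecast"

    async def train_model(train: str, forecast: str) -> str:
        return f"model({train}, {forecast})"

    a = Task(process_training_data)
    b = Task(process_forecast_data)
    # C is triggered automatically once both A and B have finished.
    c = Task(train_model, depends_on=[a, b])
    return await c

if __name__ == "__main__":
    print(asyncio.run(demo()))  # model(train, forecast)
```

Passing prerequisite results positionally is just one simple convention; a real library might use named dependencies, cache results, or poll a remote API inside each task. If a prerequisite raises, gather() propagates the exception, so the dependent task fails with it rather than running on missing input.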
Hi, I'm Hsiao-Shan Chen. I'm a Computer Science graduate from National Tsing Hua University, where I studied deep learning and computer vision as part of undergraduate research. I worked at Mobagel as a Software Engineering Intern and helped build the SDK for their product, which handles the execution of multiple tasks efficiently.