Over 10 million people use South Africa’s national COVID-19 WhatsApp hotline, but the service lacked support for users who ask free-form questions. This talk centers on how we built a scalable natural language processing (NLP) solution that enables users to ask their own questions about vaccines. We’ll discuss the choices behind our model and how we’ve improved its overall performance and efficiency.
Over 10 million people use South Africa’s national COVID-19 WhatsApp hotline, but the service previously lacked support for users who ask free-form questions. This talk centers on how we built a scalable natural language processing (NLP) service, so that users can now ask their own questions about COVID-19 vaccines (e.g., “Is the COVID shot safe if I’m having a child?” or “Which vaccine brand is the best?”).
While some background in algorithms and machine learning is helpful, we don’t expect audience members to have experience with NLP. Instead, this talk is geared toward practitioners (including less experienced ones) interested in applying machine learning in specialized settings. We’ll mention some Python libraries like cProfile and scikit-learn, but won’t focus on technical details. Rather, we’ll discuss how real-world context and constraints translate into models and algorithms.
For example, we’ll discuss how constraints like “Absolutely no wrong information can be sent” affect model design and implementation, and how diving into the math behind out-of-the-box ML models improved our computational efficiency ten-fold. As a result, even non-technical audience members should find this talk engaging and informative (and technical ones will enjoy the problem-solving rounds!).
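To give a flavour of the kind of trade-off we’ll walk through, here is a deliberately simplified sketch (hypothetical code, not our production system) of one way a “no wrong information” constraint can shape a design: only answer from a vetted FAQ, and fall back to the existing menu whenever the match score is low. The FAQ entries and the 0.6 threshold below are made-up placeholders.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    # Hypothetical vetted FAQ: every answer has been approved in advance.
    faq_questions = [
        "Is the COVID-19 vaccine safe during pregnancy?",
        "Which COVID-19 vaccine brands are available in South Africa?",
    ]
    faq_answers = [
        "(vetted answer about vaccine safety during pregnancy)",
        "(vetted answer about available vaccine brands)",
    ]

    vectorizer = TfidfVectorizer().fit(faq_questions)
    faq_vectors = vectorizer.transform(faq_questions)

    def answer(user_question, threshold=0.6):  # placeholder confidence cut-off
        scores = cosine_similarity(vectorizer.transform([user_question]), faq_vectors)[0]
        best = scores.argmax()
        if scores[best] < threshold:
            # Never guess below the cut-off: hand the user back to the fixed menu.
            return "Sorry, I can't answer that yet. Reply MENU to see the main options."
        return faq_answers[best]

    print(answer("Is the COVID shot safe if I'm having a child?"))

The point of the sketch is the fallback branch: the constraint shows up as a hard confidence cut-off rather than as a modelling detail.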
We’ll begin by introducing the hotline and implementation context, emphasizing the need for a specialized model given computing and maintenance limitations. Then we’ll survey the range of natural language models available, and explain why aspects of our use case pointed to a specific choice. We want this talk to be as interactive as possible, so audience members will have a chance to debate the pros and cons of various models.
Next, we’ll briefly run through our model and the theory behind why it works. We’ll then home in on two important improvements we’ve made: language contextualization and computational efficiency. This section will be interactive: audience members will have a chance to “work on” improvements to our algorithms, after which we’ll explain the actual changes we implemented.
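For the efficiency thread, a hedged illustration of the starting point (again hypothetical, with a stand-in matching function): profiling the question-matching step with cProfile to see where an out-of-the-box pipeline spends its time, before digging into the underlying math.

    import cProfile
    import pstats

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    # Stand-in FAQ and matcher, used only to have something to profile.
    faq = [
        "Is the COVID-19 vaccine safe during pregnancy?",
        "Which COVID-19 vaccine brands are available in South Africa?",
    ]
    vectorizer = TfidfVectorizer().fit(faq)
    faq_vectors = vectorizer.transform(faq)

    def match_question(user_question):
        scores = cosine_similarity(vectorizer.transform([user_question]), faq_vectors)[0]
        return int(scores.argmax())

    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(1000):
        match_question("Which vaccine brand is the best?")
    profiler.disable()

    # Show the ten functions with the largest cumulative time.
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)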