Federated learning

Mike Lee Williams

Audience level:
Experienced

Description

Federated learning is a way to do machine learning when training data is partitioned between devices that are either unable or unwilling to share it because of privacy concerns or practical constraints. In this talk I'll discuss use cases, explain a simple federated learning algorithm, show an example implementation in PyTorch, and demo Turbofan Tycoon, a web game based on these ideas.

Abstract

Federated learning is a way to do machine learning when training data is partitioned between nodes that are either unable or unwilling to share it.

The nodes can be embedded devices, smartphones, or even legal entities like companies or countries. They may be unable to share their data because of engineering constraints such as bandwidth or power, or because of legal bright lines such as HIPAA. Or they may be unwilling to share it because of (very legitimate and topical!) concerns about the security, commercial exploitation, and privacy of sensitive personal data.

Federated learning allows the nodes to collaborate to train a machine learning model without sharing direct access to their training data with each other or with a central authority. Instead, they each share partially trained models.

This talk will explain these ideas in more detail. I'll describe a specific instance of a federated learning algorithm (called federated averaging), and I'll explain the ways in which the real world, full of malicious actors and unreliable distributed systems, complicates this naive picture. I'll then talk about the research going on right now to harden security, reduce communication costs, and strengthen privacy guarantees.
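To give a flavor of the algorithm: in federated averaging, each node takes a few local gradient steps on its own private data, and a central server averages the resulting model weights (weighted by dataset size) and broadcasts them back. The following is a minimal NumPy sketch of that loop on a toy linear-regression problem; it is an illustration with hypothetical function names (`local_update`, `federated_averaging`), not the PyTorch implementation shown in the talk:

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One node: a few gradient-descent steps of linear regression
    on data that never leaves the node."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # mean-squared-error gradient
        w -= lr * grad
    return w

def federated_averaging(weights, node_data, rounds=20):
    """Server loop: broadcast weights, collect locally updated models,
    and average them, weighted by each node's dataset size."""
    for _ in range(rounds):
        updates = [local_update(weights, X, y) for X, y in node_data]
        sizes = [len(y) for _, y in node_data]
        weights = np.average(updates, axis=0, weights=sizes)
    return weights

# Toy demo: three nodes each hold a private slice of data generated
# from the same underlying model, y = 2*x1 + 3*x2.
rng = np.random.default_rng(0)
true_w = np.array([2.0, 3.0])
node_data = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    node_data.append((X, X @ true_w))

w = federated_averaging(np.zeros(2), node_data)
print(np.round(w, 2))  # converges close to [2. 3.]
```

Note that only model weights cross the network; the per-node `(X, y)` arrays stay local, which is the core privacy idea of the approach.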

The hope is that, with federated learning, we no longer need to give up our privacy in order to use life-saving, money-saving, helpful and fun machine learning models.
