The General Data Protection Regulation (GDPR) comes into force on 25 May 2018 and will ask searching questions of everyone who works with data. In this talk, I'll give a quick overview of the parts of the regulation that everyone here should probably know, and then take a look at the challenges of complying when working with (machine learning) algorithms.
The new data protection laws provide European residents with a host of rights that help protect their data. As professional software developers and data scientists, we need to be aware of these rights, because we are often the only ones in a position to defend them adequately. Or, if that's not sufficiently persuasive: because we may wish to avoid the hefty penalties that the regulation threatens.
This talk will briefly summarise the rules, their terminology, and their scope. I will quickly outline some of the new rights, especially the right to erasure, algorithm/decision-making transparency, and non-discrimination. Putting these rules into context, we can identify when the laws apply and when they don't. Finally, I'll mention some further obligations we ought to be aware of (breach notification, data erasure, sensitive personal data).
Three issues in particular deserve a technical discussion for a PyData audience: bias, transparency, and pseudonymisation/erasure. The issue of bias is not a new one, and there are a number of existing approaches to identifying and removing bias that we should know about; I'll summarise these and point to where you can find more information. Transparency is murkier, because it is not yet clear what exactly the law will (eventually) require of, say, a machine learning programmer. Finally, I'll look at what we may be required to do to comply with the pseudonymisation and erasure provisions.
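To make two of these concrete, here is a minimal sketch of one common screen for bias: the "four-fifths rule" disparate impact ratio, computed over a toy dataset. The group labels and decisions are illustrative assumptions, not anything prescribed by the regulation:

    import pandas as pd

    # Toy decision data: was each applicant from group A or B approved?
    decisions = pd.DataFrame({
        "group":    ["A", "A", "A", "B", "B", "B", "B"],
        "approved": [1,   1,   0,   1,   0,   0,   0],
    })

    # Selection rate per group, then the ratio of the lowest rate to the
    # highest. A ratio below 0.8 fails the informal "four-fifths rule".
    rates = decisions.groupby("group")["approved"].mean()
    print("disparate impact ratio:", rates.min() / rates.max())

And here is one possible approach to pseudonymisation that also eases erasure: replace direct identifiers with a keyed hash, and keep the key outside the dataset. Destroying the key renders the pseudonyms irreversible ("crypto-shredding"). The column names, toy data, and key handling below are assumptions for illustration only:

    import hashlib
    import hmac

    import pandas as pd

    # The key lives in a separate secrets store, never alongside the data;
    # deleting it is one way to approach erasure for derived datasets.
    SECRET_KEY = b"replace-with-a-key-from-your-secrets-store"

    def pseudonymise(value: str) -> str:
        """Replace a direct identifier with a keyed hash (HMAC-SHA256)."""
        return hmac.new(SECRET_KEY, value.encode("utf-8"),
                        hashlib.sha256).hexdigest()

    df = pd.DataFrame({"email": ["ada@example.com"], "score": [0.87]})
    df["user_id"] = df["email"].map(pseudonymise)  # stable pseudonym
    df = df.drop(columns=["email"])                # drop the raw identifier

Whether sketches like these actually satisfy the regulation is exactly the kind of question the talk will dig into.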