Sunday 12:00–12:45 in D105 Audimax

Data Analytics and the new European Privacy Legislation

Amit Steinberg

Audience level:
Intermediate

Description

The upcoming EU privacy legislation will radically change the way we do data analytics by restricting the processing of personally identifiable data. We will go through common data processing scenarios, learn how the new legislation will affect them, and look at practical solutions.

Abstract

The EU General Data Protection Regulation (GDPR) is a stringent privacy regulation coming into effect in May 2018, along with the planned new ePrivacy Regulation. The GDPR provides for strong sanctions, with fines of up to 20M Euro or 4% of the yearly global turnover (whichever is higher) for companies in breach. It applies to any EU company, and to any company processing the data of EU residents. It also broadens "personal data" to include anything generated by people's behavior or referring to an identified person. Pseudonymised data will remain personal data under the GDPR.
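The sanction ceiling described above is a simple "whichever is higher" rule. A minimal Python sketch, in which the function name and the example turnover figures are illustrative assumptions rather than anything taken from the regulation's text:

```python
FLAT_CAP_EUR = 20_000_000   # the 20M Euro figure quoted above
TURNOVER_PERCENT = 4        # the 4% of yearly global turnover quoted above


def max_fine_eur(yearly_global_turnover_eur: float) -> float:
    """Upper bound on the fine: the higher of the flat cap and the turnover share."""
    turnover_share = yearly_global_turnover_eur * TURNOVER_PERCENT / 100
    return max(FLAT_CAP_EUR, turnover_share)


# Hypothetical companies: below 500M Euro turnover the flat cap dominates,
# above it the 4% share does.
small = max_fine_eur(100_000_000)     # 4% would be 4M, so the 20M cap applies
large = max_fine_eur(1_000_000_000)   # 4% is 40M, which exceeds the flat cap
```

This only computes the ceiling; the fine actually imposed in a given case can be lower.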

The GDPR turns data privacy from a simple legal problem into a core business issue covering data collection, management, processing, and analysis. Doing any kind of data analytics, especially in B2C contexts, will become challenging. For example, marketing analytics based on tracking people may become technically difficult and methodologically almost unfeasible. Automated scoring and decision-making will become strongly regulated. Simply collecting "big data" and figuring out later what to do with it will be a thing of the past. Many companies will therefore have to radically change their way of working with data: either delete a lot of their data or genuinely anonymize it. Once data is anonymized, it is no longer subject to the GDPR, but achieving high-utility anonymization is a difficult task.

Privacy by design and by default will become a prerequisite for compliant personal data analytics. There has been a lot of progress over the last decade on data privacy and anonymization techniques. It is now possible to build recommender systems, classification models and almost everything imaginable with anonymized data, but maintaining data utility requires careful planning and optimization. This is especially tough for web/mobile/IoT data streams: high-dimensional data from multiple sources with spatial and temporal attributes. Most use cases can be handled in compliant ways, and state-of-the-art solutions will be reviewed.
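To make the anonymization trade-off concrete, here is a small sketch of k-anonymity, one well-known measure that this abstract does not name and that may or may not be among the techniques covered in the talk. It counts how many records share each combination of quasi-identifying attributes. The field names and records are made up for illustration.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group of records that share
    the same values for all quasi-identifier fields (0 if no records)."""
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return min(groups.values()) if groups else 0


# Hypothetical, already generalized records (age bands, truncated postcodes).
records = [
    {"age_band": "30-39", "zip3": "101", "purchase": "book"},
    {"age_band": "30-39", "zip3": "101", "purchase": "phone"},
    {"age_band": "40-49", "zip3": "104", "purchase": "book"},
]

k = k_anonymity(records, ["age_band", "zip3"])
# The third record is alone in its group, so k is 1 and that person stands out.
```

Raising k usually means coarser attributes or dropped records, which is where the loss of data utility mentioned above comes from. A high k on its own is not a claim that the data counts as anonymized under the GDPR.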
