There is mounting evidence that the widespread deployment of machine learning and artificial intelligence in business and government is reproducing, and in some cases amplifying, existing prejudices and inequalities. Even when an analyst wants to pursue fairness and accuracy, it is easy to unintentionally build discriminatory models. I will discuss how to be good and avoid being part of the problem.
This talk mixes a look at current events and the latest machine learning bloopers with concrete recommendations and existing tools for being good rather than evil as a data scientist. In particular, the audience can expect practical guidance on (and is invited to weigh in on):
Data discovery
a. Examples of how ‘bad’ or incomplete data sets can lead to discriminatory models
b. How to examine and balance your input data before feeding it into an analysis pipeline (see the group-balance sketch after this outline)
Data processing
a. Examples of how data processing has resulted in discriminatory models
b. How to examine your preprocessing pipeline to prevent discriminatory inputs (see the proxy-check sketch below)
c. Examples of how data processing has resulted in privacy-violating models
d. How to examine your pipeline for privacy leaks
Modeling
a. Examples of how the choice of model can lead to discriminatory results
b. Examples of how models can be designed to be more or less vulnerable to discriminatory input data
c. How to test your model and examine final parameters/fits for discriminatory behavior across a variety of common model families (see the per-group metrics sketch below)
Auditing your model
a. Examples of how even models built following the processes above may still yield discriminatory behavior
b. Auditing your model as a black box with existing Python tools (see the black-box probe sketch below)
Research frontiers
a. Updates on how computer scientists and sociologists are developing new methods to avoid discriminatory and privacy-violating models; several newly published papers will be presented to give the audience a sense of the breadth and current state of this active area of research.
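As a taste of the data-discovery step, here is a minimal sketch assuming a hypothetical pandas DataFrame with a protected attribute column `group` and a binary `label` (the column names and values are illustrative, not from a real project). It checks how each group is represented, compares positive-label rates, and upsamples smaller groups before anything enters the analysis pipeline.

```python
import pandas as pd

# Hypothetical training data: 'group' is a protected attribute, 'label' the target.
df = pd.DataFrame({
    "group":   ["A", "A", "A", "A", "B", "B"],
    "label":   [1, 0, 1, 1, 0, 1],
    "feature": [0.2, 0.4, 0.1, 0.9, 0.3, 0.8],
})

# 1. Examine representation: share of rows per group and positive-label rate per group.
print(df["group"].value_counts(normalize=True))
print(df.groupby("group")["label"].mean())

# 2. Balance by upsampling each group to the size of the largest one.
largest = df["group"].value_counts().max()
balanced = (
    df.groupby("group")
      .sample(n=largest, replace=True, random_state=0)
      .reset_index(drop=True)
)
print(balanced["group"].value_counts())
```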
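For the data-processing step, one simple check is to ask whether the protected attribute can still be predicted from the features that survive preprocessing. The sketch below is a minimal version of that idea, using a hypothetical feature matrix `X` and a held-aside `protected` array generated only for illustration; a cross-validated AUC well above 0.5 signals that proxy variables remain in the pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical preprocessed feature matrix X (protected attribute already dropped)
# and the protected attribute itself, kept aside only for this check.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
protected = (X[:, 0] + rng.normal(scale=0.5, size=500) > 0).astype(int)  # column 0 leaks group

# If the remaining features predict the protected attribute well above chance,
# the pipeline still encodes group membership through proxy variables.
auc = cross_val_score(
    LogisticRegression(max_iter=1000), X, protected, cv=5, scoring="roc_auc"
).mean()
print(f"Protected attribute predictable from features, AUC ~= {auc:.2f}")
```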
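For the modeling step, a minimal sketch of the kind of per-group check discussed in the talk: given predictions from an already-fitted classifier (the arrays here are invented), compare selection rates and true-positive rates across groups, i.e. rough demographic-parity and equal-opportunity readings.

```python
import numpy as np

def fairness_report(y_true, y_pred, groups):
    """Compare selection rate and true-positive rate across groups for a fitted model."""
    report = {}
    for g in np.unique(groups):
        mask = groups == g
        selection_rate = y_pred[mask].mean()               # demographic parity reading
        tpr = y_pred[mask & (y_true == 1)].mean()          # equal opportunity reading
        report[g] = {"selection_rate": selection_rate, "tpr": tpr}
    return report

# Hypothetical predictions from an already-trained classifier.
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])
groups = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])

for g, stats in fairness_report(y_true, y_pred, groups).items():
    print(g, stats)
```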
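For the auditing step, existing Python tools such as fairlearn or Aequitas package this kind of check; the sketch below shows the underlying black-box idea with plain scikit-learn: treat the model as a predict-only function, flip the protected attribute in otherwise identical rows, and count how often the decision changes. The toy model and data are invented for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical setup: a trained model we can only call through .predict (a black box),
# and test rows whose column 0 encodes the protected attribute as 0/1.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
X[:, 0] = (X[:, 0] > 0).astype(float)               # protected attribute
y = ((X[:, 1] + 0.8 * X[:, 0]) > 0).astype(int)     # outcome partly driven by the attribute
model = RandomForestClassifier(random_state=0).fit(X, y)

# Black-box probe: flip only the protected attribute and see how many decisions change.
X_flipped = X.copy()
X_flipped[:, 0] = 1 - X_flipped[:, 0]
flip_rate = np.mean(model.predict(X) != model.predict(X_flipped))
print(f"{flip_rate:.1%} of decisions change when only the protected attribute is flipped")
```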