
Discovery of Intersectional Bias in Machine Learning Using Automatic Subgroup Generation

Angel Cabrera, Minsuk Kahng, Fred Hohman, Jamie Morgenstern, Duen Horng (Polo) Chau

Using our technique on the UCI Adult dataset, we (A) cluster instances into subgroups, (B) calculate subgroup feature entropy to find dominant features, and (C) investigate similar subgroups to discover value and performance differences.

Abstract

As machine learning is applied to data about people, it is crucial to understand how learned models treat different demographic groups. Many factors, including the choice of training data and model class, can encode biased behavior into learned outcomes. These biases are often small when a single feature (e.g., sex or race) is considered in isolation, but appear far more starkly at the intersection of multiple features. We present ongoing work on designing automatic techniques and interactive tools that help users discover subgroups of data instances on which a model underperforms. Using a bottom-up clustering technique for subgroup generation, users can quickly find areas of a dataset in which their models encode bias. This work presents some of the first user-focused, interactive methods for discovering bias in machine learning models.
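
To make the three steps in the figure concrete, below is a minimal sketch of such a pipeline in Python. It uses scikit-learn's agglomerative clustering as the bottom-up subgroup-generation step and SciPy's Shannon entropy to find dominant features, assuming a tabular dataset (such as UCI Adult) whose categorical features have been label-encoded into a NumPy array. The function names, the number of clusters, and the use of per-subgroup accuracy are illustrative assumptions, not the paper's actual implementation.

import numpy as np
from scipy.stats import entropy
from sklearn.cluster import AgglomerativeClustering

def generate_subgroups(X, n_clusters=20):
    # (A) Bottom-up (agglomerative) clustering of instances into subgroups.
    # n_clusters is an illustrative choice, not a value from the paper.
    return AgglomerativeClustering(n_clusters=n_clusters).fit_predict(X)

def dominant_features(X, labels, cluster_id):
    # (B) Rank features by their entropy within one subgroup. A feature
    # with low entropy takes (nearly) a single value, so it dominates the
    # subgroup and helps describe it as an intersection of feature values.
    members = X[labels == cluster_id]
    scores = []
    for j in range(X.shape[1]):
        _, counts = np.unique(members[:, j], return_counts=True)
        scores.append((j, entropy(counts)))  # entropy() normalizes counts
    return sorted(scores, key=lambda s: s[1])  # lowest entropy first

def subgroup_performance(y_true, y_pred, labels):
    # (C) Per-subgroup accuracy, so that similar subgroups can be compared
    # and large performance gaps flagged as potential intersectional bias.
    return {c: float(np.mean(y_pred[labels == c] == y_true[labels == c]))
            for c in np.unique(labels)}

Comparing the dominant features and accuracies of similar subgroups then surfaces the value and performance differences illustrated in the figure caption above.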

Citation

Discovery of Intersectional Bias in Machine Learning Using Automatic Subgroup Generation
Angel Cabrera, Minsuk Kahng, Fred Hohman, Jamie Morgenstern, Duen Horng (Polo) Chau
Debugging Machine Learning Models Workshop at ICLR (Debug ML). New Orleans, Louisiana, USA, 2019.

BibTeX

@article{cabrera2019discovery,
  title={Discovery of Intersectional Bias in Machine Learning Using Automatic Subgroup Generation},
  author={Cabrera, {\'A}ngel and Kahng, Minsuk and Hohman, Fred and Morgenstern, Jamie and Chau, Duen Horng},
  journal={Debugging Machine Learning Models Workshop (Debug ML) at ICLR},
  year={2019}
}