
Massif: Interactive Interpretation of Adversarial Attacks on Deep Learning

Nilaksh Das, Haekyu Park, Zijie J. Wang, Fred Hohman, Robert Firstman, Emily Rogers, Duen Horng (Polo) Chau

The Massif interface. A user Hailey is studying a targeted version of the Fast Gradient Method (FGM) attack performed on the InceptionV1 neural network model. Using the control panel (A), she selects "giant panda" as the benign class and "armadillo" as the attack target class. Massif generates an attribution graph of the model (B), which shows Hailey the neurons within the network that are suppressed in the attacked images (B1, colored blue on the left), shared by both benign and attacked images (B2, colored purple in the middle), and emphasized only in the attacked images (B3, colored red on the right). Each neuron is represented by a node and its feature visualization (C). Hovering over any neuron displays example dataset patches that maximally activate the neuron, providing stronger evidence for what a neuron has learned to detect. Hovering over a neuron also highlights its most influential connections from the previous layer (D), allowing Hailey to determine where in the network the prediction diverges from the benign class to the attacked class.
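For readers unfamiliar with the attack in this scenario, the following is a minimal sketch of a targeted Fast Gradient Method attack in PyTorch, not part of Massif itself. The pretrained torchvision GoogLeNet (an InceptionV1 implementation), the epsilon value, the [0, 1] input range, and the x_benign tensor are all illustrative assumptions.

import torch
import torch.nn.functional as F
from torchvision import models

# Pretrained GoogLeNet (torchvision's InceptionV1) as a stand-in for the paper's model.
model = models.googlenet(weights="IMAGENET1K_V1").eval()

def targeted_fgm(x, target_class, epsilon=0.02):
    # One-step targeted FGM: perturb x toward target_class by stepping
    # against the gradient of the target-class loss.
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), torch.tensor([target_class]))
    loss.backward()
    x_adv = (x - epsilon * x.grad.sign()).clamp(0, 1)  # assumes inputs in [0, 1]
    return x_adv.detach()

# Example: perturb a preprocessed (1, 3, 224, 224) giant-panda image toward
# "armadillo" (ImageNet class index 363). x_benign is a hypothetical input tensor.
# x_attacked = targeted_fgm(x_benign, target_class=363)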

Abstract

Deep neural networks (DNNs) are increasingly powering high-stakes applications such as autonomous cars and healthcare; however, DNNs are often treated as “black boxes” in such applications. Recent research has also revealed that DNNs are highly vulnerable to adversarial attacks, raising serious concerns over deploying DNNs in the real world. To overcome these deficiencies, we are developing Massif, an interactive tool for deciphering adversarial attacks. Massif identifies and interactively visualizes neurons and their connections inside a DNN that are strongly activated or suppressed by an adversarial attack. Massif provides both a high-level, interpretable overview of the effect of an attack on a DNN, and a low-level, detailed description of the affected neurons. These tightly coupled views in Massif help people better understand which input features are most vulnerable or important for correct predictions.
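The comparison Massif visualizes, which neurons an attack suppresses or emphasizes, can be approximated by recording per-channel activations for benign and attacked images and ranking channels by how much the attack changes them. The sketch below illustrates that idea in PyTorch; it is not the paper's attribution-graph algorithm, and the layer choice (inception4d) and batch variables are assumptions.

import torch

def channel_activations(model, layer, images):
    # Mean activation of each channel ("neuron") in `layer` over a batch of images.
    acts = {}
    handle = layer.register_forward_hook(
        lambda module, inputs, output: acts.update(value=output.mean(dim=(0, 2, 3))))
    with torch.no_grad():
        model(images)
    handle.remove()
    return acts["value"]

# Channels whose activation rises most under attack are "emphasized";
# those that drop most are "suppressed" (layer and inputs are illustrative).
# a_benign   = channel_activations(model, model.inception4d, benign_batch)
# a_attacked = channel_activations(model, model.inception4d, attacked_batch)
# delta = a_attacked - a_benign
# emphasized = delta.topk(10).indices
# suppressed = (-delta).topk(10).indices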

Citation

Massif: Interactive Interpretation of Adversarial Attacks on Deep Learning
Nilaksh Das, Haekyu Park, Zijie J. Wang, Fred Hohman, Robert Firstman, Emily Rogers, Duen Horng (Polo) Chau
arXiv:2001.07769. 2020.
Project | PDF

BibTeX


@article{das2020massif,
  title={Massif: Interactive Interpretation of Adversarial Attacks on Deep Learning},
  author={Das, Nilaksh and Park, Haekyu and Wang, Zijie J. and Hohman, Fred and Firstman, Robert and Rogers, Emily and Chau, Duen Horng (Polo)},
  journal={arXiv preprint arXiv:2001.07769},
  year={2020}
}