What is Summit?

Understanding how neural networks make predictions remains a fundamental challenge. Existing work on interpreting neural network predictions for images often focuses on explaining predictions for single images or neurons. Yet predictions are computed from millions of weights optimized over millions of images, so such explanations can easily miss the bigger picture.

We present Summit, an interactive visualization that scalably summarizes what features a deep learning model has learned and how those features interact to make predictions.

How does it work?

Summit introduces two new scalable summarization techniques that aggregate activations and neuron-influences to create attribution graphs: a class-specific visualization that simultaneously highlights what features a neural network detects and how they are related.

An illustration of how Summit takes thousands of images for a given class (e.g., images from the white wolf class), computes their top activations and attributions, and combines them to form an attribution graph that shows how lower-level features ("legs") contribute to higher-level ones ("white fur"), and ultimately to the final prediction.
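The activation-aggregation step described above can be sketched roughly as follows. This is a minimal illustration and not Summit's actual code: it assumes each image has already been reduced to one activation value per channel for a given layer, and it uses a plain top-k count as the aggregation rule, which is an assumption. The function names and the toy data are hypothetical.

```python
from collections import Counter


def top_channels(channel_values, k):
    """Return the indices of the k most strongly activated channels for one image."""
    ranked = sorted(range(len(channel_values)),
                    key=lambda c: channel_values[c],
                    reverse=True)
    return ranked[:k]


def aggregate_activations(per_image_values, k):
    """Count, for one layer, how many images of a class place each channel in their top k."""
    counts = Counter()
    for channel_values in per_image_values:
        counts.update(top_channels(channel_values, k))
    return counts


# Toy data (hypothetical): 3 images of one class, 4 channels in one layer.
images = [
    [0.1, 0.9, 0.3, 0.0],
    [0.2, 0.8, 0.7, 0.1],
    [0.6, 0.5, 0.1, 0.0],
]
counts = aggregate_activations(images, k=2)
print(counts.most_common())
```

Channels with high counts are the ones a class activates consistently, which makes them candidates for nodes in that class's attribution graph.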

By using a graph representation, we can leverage the abundant research in graph algorithms to extract attribution graphs from a network that show neuron relationships and substructures within the entire neural network that contribute to a model’s outcomes.
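As a hedged sketch of why a graph representation is convenient: once aggregated influences are stored as weighted edges between channels, standard graph routines apply directly. The example below keeps only the edges between retained channels and then uses a depth-first search to find everything that feeds into a prediction. The threshold, the node labels, and the counts are all hypothetical, and the search is a stand-in for whichever graph algorithm is actually used to extract the attribution graph.

```python
def build_attribution_graph(top_nodes, influence_counts, min_count):
    """Keep influence edges between retained nodes that were seen at least min_count times.

    The result maps each node to the set of lower-layer nodes that contribute to it.
    """
    graph = {}
    for (src, dst), count in influence_counts.items():
        if src in top_nodes and dst in top_nodes and count >= min_count:
            graph.setdefault(dst, set()).add(src)
    return graph


def contributors(graph, start):
    """Depth-first search for every node that feeds, directly or indirectly, into start."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for src in graph.get(node, ()):
            if src not in seen:
                seen.add(src)
                stack.append(src)
    return seen


# Toy data (hypothetical labels and counts).
top_nodes = {"legs", "white fur", "snow", "white wolf"}
influence_counts = {
    ("legs", "white fur"): 40,
    ("white fur", "white wolf"): 55,
    ("snow", "white wolf"): 3,    # below the threshold, dropped
    ("grass", "white fur"): 60,   # "grass" is not a retained node, dropped
}
graph = build_attribution_graph(top_nodes, influence_counts, min_count=10)
print(contributors(graph, "white wolf"))
```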

Scaling neural network interpretability

Summit scales to large data and leverages neural network feature visualization and dataset examples to help distill large, complex neural network models into compact, interactive visualizations.

Above we demonstrate Summit by visualizing the attribution graphs for each of the 1,000 classes of InceptionV1 trained on ImageNet.

In our paper, we present neural network exploration scenarios where Summit helps us discover multiple surprising insights into InceptionV1's learned representations. Below we describe two such examples.

Example I: Unexpected semantics within a class

Can model developers be confident that their network has learned what they think it has learned? We can start to answer questions like these with attribution graphs. For example, consider the tench class (a type of yellow-brown fish). Starting from the first layer, we notice the attribution graph for tench does not contain any fish or water features, but instead shows many "finger," "hand," and "people" detectors. It is not until a middle layer, mixed4d, that the first fish and scale detectors are seen; however, even these detectors focus solely on the body of the fish (there are no fish eye, face, or fin detectors).

Inspecting dataset examples reveals many image patches where we see people's fingers holding fish, presumably after catching them. This prompted us to inspect the raw data for the tench class, where indeed, most of the images are of a person holding the fish. We conclude that, unexpectedly, the network uses people detectors in combination with brown fish body and scale detectors to represent the tench class. Generally, we would not expect "people" to be an essential feature for classifying fish.

In this example, the model accurately classifies images of tench (a yellow-brown fish). However, Summit reveals surprising associations in the network (e.g., using parts of people) that contribute to its final outcome: the tench prediction is dependent on an intermediate "hands holding fish" feature, which is influenced by lower-level features like "scales", "person", and "fish".

This surprising finding motivated us to find another class of fish to compare against, one that people do not normally hold, such as the lionfish (because of its venomous, spiky fin rays). Visualizing the lionfish attribution graph confirms our suspicion: there are no people object detectors in its attribution graph. However, we discover yet another unexpected combination of features: while there are few fish-part detectors, there are many texture features, e.g., stripes and quills. It is not until the final layers of the network that a highly activated channel appears to detect an orange fish in water, which uses the stripe and quill detectors.

An example substructure from the lionfish attribution graph that shows unexpected texture features, like "quills" and "stripes," influencing top activated channels for a final layer's "orange fish" feature (some lionfish are reddish-orange, and have white fin rays).

Therefore we deduce that the lionfish class is composed of a striped body in the water with long, thin quills. Whereas the tench had unexpected people features, the lionfish lacked fish features. Regardless, findings such as these can help people more confidently deploy models when they know what composition of features results in a specific prediction.

Example II: Discriminable features in similar classes

Since neural networks are loosely inspired by the human brain, there is interest in the broader machine learning literature in understanding whether the decision rationale in neural networks is similar to that of people. With attribution graphs, we begin to investigate this question by comparing classes throughout the layers of a network.

For example, consider the black bear and brown bear classes. A person would likely say that color is the discriminating difference between these animal classes. By taking the intersection of their attribution graphs, we can see what features are shared between the classes, as well as any discriminable features and connections.

With attribution graphs, we can compare classes throughout layers of a network. Here we compare two similar classes: black bear and brown bear. From the intersection of their attribution graphs, we see both classes share features related to bear-ness, but diverge towards the end of the network using fur color and face color as discriminable features. This feature discrimination aligns with how humans might classify bears.

In the figure above, we see in earlier layers (mixed4c) that both black bear and brown bear share many features, but as we move towards the output, we see multiple diverging paths and channels that distinguish features for each class. Ultimately, we see individual black and brown fur and bear face detectors, while some channels represent general bear-ness. Therefore, it appears the network classifies black bear and brown bear based on color, which may also be the primary feature humans would use to tell them apart. This is only one example, and it is likely that these discriminable features do not always align with what we would expect; however, attribution graphs give us a mechanism to test hypotheses like these.
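The comparison described above amounts to set operations on two graphs. A minimal sketch, assuming each attribution graph is stored as a set of (source, target) edges; the edge labels are hypothetical and chosen only to mirror the bear example:

```python
def compare_graphs(edges_a, edges_b):
    """Split two edge sets into shared edges and edges unique to each graph."""
    shared = edges_a & edges_b
    only_a = edges_a - edges_b
    only_b = edges_b - edges_a
    return shared, only_a, only_b


# Toy edge sets (hypothetical labels).
black_bear = {
    ("bear ears", "bear face"),
    ("fur texture", "bear body"),
    ("black fur", "black bear face"),
}
brown_bear = {
    ("bear ears", "bear face"),
    ("fur texture", "bear body"),
    ("brown fur", "brown bear face"),
}

shared, only_black, only_brown = compare_graphs(black_bear, brown_bear)
print("shared:", sorted(shared))
print("black bear only:", sorted(only_black))
print("brown bear only:", sorted(only_brown))
```

In this toy case the shared edges correspond to general bear-ness, while the edges unique to each class carry the color features that separate them.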

Summit features

Check out the following video for a quick look at Summit's features.

Broader impact for visualization in AI

Our work joins a growing body of open-access research that aims to use interactive visualization to explain complex inner workings of modern machine learning techniques. We believe our summarization approach that builds entire class representations is an important step for developing higher-level explanations for neural networks. We hope our work will inspire deeper engagement from both the information visualization and machine learning communities to further develop human-centered tools for artificial intelligence.


Summit was created by Fred Hohman, Haekyu Park, Caleb Robinson, and Polo Chau at Georgia Tech. We also thank Nilaksh Das and the Georgia Tech Visualization Lab for their support and constructive feedback. This work is supported by a NASA Space Technology Research Fellowship and NSF grants IIS-1563816, CNS-1704701, and TWC-1526254.

Summit: Scaling Deep Learning Interpretability by Visualizing Activation and Attribution Summarizations
Fred Hohman, Haekyu Park, Caleb Robinson, and Duen Horng (Polo) Chau.
IEEE Transactions on Visualization and Computer Graphics (TVCG, Proc. VAST'19). 2020.