In this post, I want to talk a little about the explanation and interpretation of machine learning methods, an active area of current research. To be clear, we are not trying to explain how a certain machine learning method (e.g. support vector machines) works. Instead, we focus on explaining and interpreting what a specific method applied to a specific problem has learned. So, the big question that we want to tackle is: What has my algorithm learned?

First of all, I think we can all agree that this is an important research question. Ultimately, scientists and practitioners are not only interested in high prediction accuracy; they try to solve actual problems and need actual insights rather than prediction scores. If some ultra-fancy deep neural net gives the most accurate answer to a burning question, then you want to know why it does so. Unfortunately, this information remains hidden for many state-of-the-art methods (e.g. deep neural nets).

Figure 1: This is an example output of an explanation method.

Before looking into solutions to this question, let us look at the question itself. While we can all agree on the importance of explaining decisions, it is actually not clear what such an explanation should look like. Many of you might have seen heat maps like the one shown in Figure 1. This is an example of an explanation for the prediction of ‘3’ by a classifier that learned to distinguish between ‘3’s and ‘8’s. Blue areas indicate excitatory significance (pixels here would support the prediction) and red areas inhibitory significance (the absence of pixels here would support the prediction) for class ‘3’. However, you still don’t know, for example, whether Figure 1 shows areas that generally support the decisions of this classifier or the features the classifier is actually using for this specific sample.

Here, I will argue that several objectives are possible, depending on intent and application. The above question is actually a conglomerate of (at least) four different questions. The differences may be subtle, but I will explain each of them in detail later:

  1. What are the features that drive the decisions of my algorithm in general?
  2. Which features contributed most to the score of an individual sample?
  3. How much does feature \(s\) support the decisions of my algorithm in general (independent of the actual usage of this particular feature)?
  4. How much does feature \(s\) support the score of an individual sample (independent of the actual usage of this particular feature)?

The first two questions can be grouped into method-centric approaches and the latter two into problem-centric approaches. Both groups can be further divided into instance-based and model-based explanations. Moreover, problem-centric approaches can also be divided into feature-agnostic and feature-constrained explanation methods. The proposed taxonomy is given in Figure 2.

Figure 2: The proposed taxonomy of explanation methods.

To explain the subtle differences, we will employ a simple linear model \(f(\mathbf{x}) = \langle \mathbf{w}, \mathbf{x} \rangle\) in a specific learning setting. Let us assume we generated samples \(\{(\mathbf{x}_i, y_i)\}_i\), each consisting of four features (\(\mathbf{x}_i \in \mathbb{R}^4\)), from two distinct classes \(y_i \in \{-1,+1\}\). The first two features shall be strongly (linearly) correlated with each other and also with the class label. The latter two will be just Gaussian noise. To fit the parameter vector \(\mathbf{w}\), we will use LARS to solve the corresponding LASSO problem with one non-zero coefficient. This will result in either the first or the second component being active and the rest set to zero (i.e. \(\mathbf{w} = (w_1, 0, 0, 0)^\top\) or \(\mathbf{w} = (0, w_2, 0, 0)^\top\)).
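The setting above can be sketched in a few lines of plain Python. This is only a rough illustration under stated assumptions: the noise levels, sample size, and the helper names `make_data` and `fit_one_feature` are made up for this example, and the fit is a simplified stand-in for LARS. It picks the feature with the largest absolute inner product with the labels (which mirrors the first feature LARS would activate) and then fits that single coefficient by least squares, so the coefficient value is not the exact LASSO solution.

```python
import random

def make_data(n=200, seed=0):
    """Toy data: features 0 and 1 follow the label, features 2 and 3 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.choice([-1.0, 1.0])
        signal = label + rng.gauss(0.0, 0.3)   # shared signal tied to the class
        X.append([
            signal + rng.gauss(0.0, 0.05),     # feature 0: signal plus small jitter
            signal + rng.gauss(0.0, 0.05),     # feature 1: nearly a copy of feature 0
            rng.gauss(0.0, 1.0),               # feature 2: pure Gaussian noise
            rng.gauss(0.0, 1.0),               # feature 3: pure Gaussian noise
        ])
        y.append(label)
    return X, y

def fit_one_feature(X, y):
    """Simplified stand-in for LARS/LASSO with one non-zero coefficient."""
    d = len(X[0])
    # Inner product of every feature column with the labels.
    dots = [sum(row[j] * t for row, t in zip(X, y)) for j in range(d)]
    best = max(range(d), key=lambda j: abs(dots[j]))
    # Least-squares coefficient for the single selected feature.
    norm_sq = sum(row[best] ** 2 for row in X)
    w = [0.0] * d
    w[best] = dots[best] / norm_sq
    return w

X, y = make_data()
w = fit_one_feature(X, y)

# Training accuracy of the sign of f(x) = <w, x>.
scores = [sum(wj * xj for wj, xj in zip(w, row)) for row in X]
accuracy = sum(1 for s, t in zip(scores, y) if s * t > 0) / len(y)
print("w =", w)
print("training accuracy =", accuracy)
```

Whether the first or the second feature ends up active depends only on the small jitter terms, which is exactly the ambiguity the discussion relies on.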

Method-centric approaches should now try to explain what this specific classifier has learned. Hence, we want to know \(\mathbf{w}\) (which is of course trivial in this case) for the model-based explanation and \(\mathbf{w} \odot \mathbf{x}\) (the Hadamard product, i.e. component-wise multiplication) for the instance-based explanation. Both show exactly what the classifier has learned and not which features are generally important for solving this problem. If \(w_1\) is active, then those methods should only consider \(w_1\) and dismiss the second feature, although it does carry a strong signal.
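For the linear model, both method-centric explanations are one-liners. The sketch below uses hypothetical values for the weight vector and the sample; the function names are made up for illustration.

```python
def model_based_explanation(w):
    """Model-based, method-centric: the weight vector itself."""
    return list(w)

def instance_based_explanation(w, x):
    """Instance-based, method-centric: component-wise product of w and x."""
    return [wj * xj for wj, xj in zip(w, x)]

# Hypothetical values: only the first feature is active in w.
w = [0.9, 0.0, 0.0, 0.0]
x = [1.1, 1.0, -0.4, 2.0]

contrib = instance_based_explanation(w, x)
score = sum(contrib)  # the contributions add up to f(x) = <w, x>

print("model-based:   ", model_based_explanation(w))
print("instance-based:", contrib)
print("score f(x):    ", score)
```

Note that the second feature gets zero relevance here even though its value is almost the same as the first one, because the classifier never uses it.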

Problem-centric explanations, however, also take into account features that might not have been selected by the classifier but nonetheless support the decision for a specific problem. In our setup, that would mean indicating the first two features regardless of which one the classifier is actually using. Model-based and instance-based explanations are otherwise similar to the above description. The distinction between feature-agnostic and feature-constrained methods, however, is important. Agnostic methods are also able to report the importance of features that have not been seen by the algorithm. For instance, we could ask how well a combination of the first two features would cope with the problem.
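To make the problem-centric view concrete, here is a deliberately naive sketch. It scores each feature by its absolute Pearson correlation with the label, independent of any trained classifier, and then does the same for a derived feature (the mean of the first two) that no algorithm has seen. This is only a toy proxy chosen for illustration; it is not FIRM, POIMs, or MFI, and the data parameters are assumptions.

```python
import random

def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5

# Same kind of toy data as in the setting above (assumed noise levels).
rng = random.Random(1)
y = [rng.choice([-1.0, 1.0]) for _ in range(500)]
X = []
for label in y:
    s = label + rng.gauss(0.0, 0.3)
    X.append([s + rng.gauss(0.0, 0.05), s + rng.gauss(0.0, 0.05),
              rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)])

# Feature-constrained: score each of the given input features.
per_feature = [abs(pearson([row[j] for row in X], y)) for j in range(4)]

# Feature-agnostic: score a feature the classifier has never seen.
combined = [(row[0] + row[1]) / 2 for row in X]
combined_score = abs(pearson(combined, y))

print("per-feature scores:", per_feature)
print("combined feature:  ", combined_score)
```

Unlike the method-centric explanation, both of the first two features receive a high score here, no matter which one a sparse classifier would have picked.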

And last but not least, here are some examples:

  • Positional Oligomer Importance Matrices (POIMs) [1] are a problem-centric, feature-constrained, model-based approach.
  • Feature Importance Ranking Measure (FIRM) [2] is a problem-centric, feature-agnostic, model-based approach.
  • Layer-wise Relevance Propagation (LRP) [3] is a method-centric, instance-based approach.
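As a compact summary, the classification of these three methods can be written down as a small lookup structure. The class, the field names, and the `select` helper are hypothetical and only encode the axes of the taxonomy described in this post; `feature_agnostic` is left as `None` for method-centric approaches, where that axis does not apply.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ExplanationMethod:
    name: str
    centric: str                      # "method" or "problem"
    scope: str                        # "model" or "instance"
    feature_agnostic: Optional[bool]  # only defined for problem-centric methods

METHODS = [
    ExplanationMethod("POIM", "problem", "model", False),
    ExplanationMethod("FIRM", "problem", "model", True),
    ExplanationMethod("LRP", "method", "instance", None),
]

def select(methods, **criteria):
    """Return the names of all methods matching every given attribute."""
    return [m.name for m in methods
            if all(getattr(m, key) == value for key, value in criteria.items())]

print(select(METHODS, centric="problem"))
print(select(METHODS, scope="instance"))
```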

All of the above work has been proposed by (ex-)members of our lab. There is, of course, other interesting work, e.g. LIME [4], VisualBackProp [5] and an interesting approach based on game-theoretic insights [6].

Our own contribution, the measure of feature importance (MFI) [7], is designed to be applicable to all of the above settings. It started as an extension of POIMs, initiated by Sören Sonnenburg (the lead author of the POIM method). Later we moved on and extended FIRM as well (which itself is a generalization of POIMs). It has strong connections to the game-theoretic approach in [6] as well as to other seemingly unconnected methods (e.g. kernel target alignment or the Hilbert-Schmidt independence criterion, to name a few). Hence, we are pretty convinced that our approach can be interesting to the scientific community and (hopefully) also useful in practice. However, a nice, clean, comprehensive write-up is still in the works and will take some time.

You can find the corresponding IPython notebook here. Right now, it looks a bit rough, but it will be improved over time. Anyway, it contains a descriptive example in the spirit of the above problem setting.

Update March 2020

Since I wrote this article, lots of new methods have been developed, and the field of explanation and interpretation (explainable AI, XAI) has really started to grab the attention of many researchers. Hence, I thought that instead of a simple update, some new posts on this topic are needed. So, stay tuned for some follow-ups.



[1] S. Sonnenburg, A. Zien, P. Philips, and G. Rätsch, “POIMs: Positional oligomer importance matrices – Understanding support vector machine-based signal detectors,” Bioinformatics, vol. 24, no. 13, pp. 6–14, 2008.

[2] A. Zien, N. Krämer, S. Sonnenburg, and G. Rätsch, “The Feature Importance Ranking Measure,” in ECML PKDD, 2009, no. 5782, pp. 694–709.

[3] S. Bach, A. Binder, G. Montavon, F. Klauschen, K.-R. Müller, and W. Samek, “On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation,” PLoS One, vol. 10, no. 7, 2015.

[4] M. T. Ribeiro, S. Singh, and C. Guestrin, “Why Should I Trust You? Explaining the Predictions of Any Classifier,” in KDD, 2016.

[5] M. Bojarski, A. Choromanska, K. Choromanski, B. Firner, L. Jackel, U. Muller, and K. Zieba, “VisualBackProp: efficient visualization of CNNs,” arXiv, 2016.

[6] E. Strumbelj and I. Kononenko, “An Efficient Explanation of Individual Classifications using Game Theory,” J. Mach. Learn. Res., vol. 11, pp. 1–18, 2010.

[7] M. M.-C. Vidovic, N. Görnitz, K.-R. Müller, and M. Kloft, “Feature Importance Measure for Non-linear Learning Algorithms,” in NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems, 2016.