This paper proposes GNNExplainer, the first general, model-agnostic approach for explaining the predictions of any GNN-based model on any graph-based machine learning task.

GNNExplainer is a perturbation-based explanation method. Its goal is to identify a subgraph $G_S$ of the input graph $G$, together with a subset of node features $X_S$, that are important for the GNN's prediction $\hat{y}$.
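As a concrete sketch of this setup in PyTorch (not the authors' reference code), the discrete choices of $G_S$ and $X_S$ are typically parameterized by learnable masks over edges and feature dimensions; the continuous relaxation behind this is justified further below, and all sizes here are illustrative:

```python
import torch

# Illustrative sketch: the search over subgraphs G_S and feature subsets
# X_S is parameterized by two learnable mask vectors, one logit per edge
# and one logit per feature dimension.
num_edges, num_features = 40, 16   # hypothetical sizes for a small graph

edge_mask = torch.nn.Parameter(torch.randn(num_edges))        # logits over edges
feature_mask = torch.nn.Parameter(torch.randn(num_features))  # logits over feature dims

# Sigmoids give soft membership values in (0, 1); thresholding them after
# optimization recovers a discrete subgraph G_S and feature set X_S.
soft_edges = torch.sigmoid(edge_mask)
soft_features = torch.sigmoid(feature_mask)
```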

The objective is to maximize the mutual information between the GNN's prediction and the candidate explanation:

$$ \max_{G_S} \text{MI} (Y, (G_S, X_S)) = H(Y) - H(Y|G=G_S, X=X_S) $$

where $Y$ represents the predicted label distribution. Notably, the entropy term $H(Y)$ is constant because the trained GNN $\Phi$ is fixed during the explanation stage. Therefore, maximizing the mutual information between $Y$ and $(G_S, X_S)$ is equivalent to minimizing the conditional entropy $H(Y|G=G_S, X=X_S)$:

$$ \min_{G_S} H(Y|G=G_S, X=X_S) = \min_{G_S} - \mathbb{E}_{Y|G_S, X_S}\left[\log P_{\Phi}(Y|G=G_S, X=X_S)\right] $$
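Concretely, since the trained GNN outputs a distribution over labels, this conditional entropy term reduces to the model's negative log-likelihood of its own original prediction, evaluated on the masked graph. A minimal sketch, assuming a node-classification model that accepts per-edge weights (as, e.g., PyTorch Geometric convolutions do); `model`, `x`, `edge_index`, and `node_idx` are placeholders, not names from the paper:

```python
import torch
import torch.nn.functional as F

def explanation_loss(model, x, edge_index, edge_mask, feature_mask,
                     node_idx, target):
    """Negative log-likelihood of the GNN's original prediction on the
    masked graph, i.e. the conditional entropy term above evaluated at
    the predicted label. This is a sketch under the stated assumptions,
    not the paper's reference implementation."""
    masked_x = x * torch.sigmoid(feature_mask)   # soft feature subset X_S
    edge_weight = torch.sigmoid(edge_mask)       # soft subgraph G_S
    logits = model(masked_x, edge_index, edge_weight=edge_weight)
    log_probs = F.log_softmax(logits, dim=-1)
    return -log_probs[node_idx, target]
```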

Because the number of candidate subgraphs grows exponentially, direct optimization over $G_S$ is intractable. GNNExplainer therefore treats $G_S$ as a random graph variable drawn from a distribution $\mathcal{G}$ and optimizes over $\mathcal{G}$ instead. Under a convexity assumption, Jensen's inequality provides the following upper bound:

$$ \min_{\mathcal{G}} H(Y|G=\mathbb{E}_{\mathcal{G}}[G_S], X=X_S) $$

In practice, the expectation $\mathbb{E}_{\mathcal{G}}[G_S]$ is approximated by a continuous mask over the adjacency matrix that is optimized by gradient descent. Although the convexity assumption does not hold in general, the experiments show that minimizing this relaxed objective, together with suitable regularization, often produces high-quality explanations.
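Putting the pieces together, a hedged sketch of the optimization loop: each edge is weighted by $\sigma(\text{edge\_mask})$ to realize $\mathbb{E}_{\mathcal{G}}[G_S]$, and the mask logits are trained by gradient descent. The regularization weights below are illustrative, not the paper's exact hyperparameters; `explanation_loss` and the masks come from the sketches above:

```python
import torch

optimizer = torch.optim.Adam([edge_mask, feature_mask], lr=0.01)

# Label to explain: the model's own prediction on the full, unmasked graph.
with torch.no_grad():
    target = model(x, edge_index).argmax(dim=-1)[node_idx]

for _ in range(100):
    optimizer.zero_grad()
    loss = explanation_loss(model, x, edge_index,
                            edge_mask, feature_mask, node_idx, target)
    # Regularizers in the spirit of the paper: a size penalty keeps the
    # explanation sparse, and an element-wise entropy penalty pushes the
    # mask toward binary values. Both weights are illustrative.
    m = torch.sigmoid(edge_mask)
    size_penalty = m.sum()
    mask_entropy = -(m * torch.log(m + 1e-9)
                     + (1 - m) * torch.log(1 - m + 1e-9)).mean()
    (loss + 0.005 * size_penalty + 0.1 * mask_entropy).backward()
    optimizer.step()

# Thresholding the optimized mask yields the discrete explanation subgraph.
important_edges = torch.sigmoid(edge_mask) > 0.5
```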

References

Ying, Z., Bourgeois, D., You, J., Zitnik, M., & Leskovec, J. GNNExplainer: Generating Explanations for Graph Neural Networks. NeurIPS 2019.