This paper focuses on how to reliably evaluate post-hoc explanations. Current evaluation methods fall predominantly into two classes: Feature Removal metrics and Generation-based metrics.

Feature Removal metrics delete the salient features identified by an explanation and quantify the explanation's quality by the change in the model's output before and after deletion. The larger the resulting drop in model performance, the more important the deleted features evidently were to the model's prediction, and the better the explanation method is deemed.
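As a concrete illustration, below is a minimal sketch of a feature-removal metric for vector inputs. The `model` callable, the zero/mean `baseline` imputation, and the function name are illustrative assumptions, not the formulation of any specific paper.

```python
import numpy as np

def feature_removal_score(model, x, explanation_idx, baseline=0.0):
    """Score an explanation by deleting its salient features.

    model:           callable mapping a feature vector to class probabilities
    x:               original input, a 1-D numpy array
    explanation_idx: indices of the features the explanation marks as important
    baseline:        value used to "remove" a feature (zero/mean imputation)
    """
    probs = model(x)
    c = int(np.argmax(probs))               # originally predicted class
    x_removed = x.copy()
    x_removed[explanation_idx] = baseline   # excise the salient features
    # Larger drop in the predicted-class probability => better explanation
    return probs[c] - model(x_removed)[c]
```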

However, the reliability of Feature Removal metrics is compromised by the out-of-distribution (OOD) problem: deleting features (e.g., replacing them with zeros) yields inputs unlike anything seen during training, so the model's behavior on them is not trustworthy evidence of feature importance. Generation-based metrics were introduced to address this. Rather than eliminating features from existing instances, they use a generative model to craft new instances that inherently lack the pivotal features while remaining on the data manifold. This approach has its own drawback, however: generative models are prone to incorporating and perpetuating the biases of their training data in the newly created instances.
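A minimal sketch of the generation-based variant follows. Here `infill_sampler` is a hypothetical stand-in for whatever conditional generative model is used; the sampler interface, sample count, and averaging are assumptions for illustration.

```python
import numpy as np

def generation_based_score(model, x, explanation_idx, infill_sampler, n_samples=32):
    """Generation-based variant: re-sample the removed features from a
    generative model instead of replacing them with an arbitrary baseline,
    so the evaluated instances stay on the data manifold.

    infill_sampler(x, idx) is a hypothetical function returning a copy of x
    whose features at `idx` are drawn from a generative model conditioned
    on the remaining features (e.g., a conditional VAE).
    """
    probs = model(x)
    c = int(np.argmax(probs))
    drops = []
    for _ in range(n_samples):
        x_gen = infill_sampler(x, explanation_idx)  # in-fill the salient features
        drops.append(probs[c] - model(x_gen)[c])
    return float(np.mean(drops))
```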

In light of these limitations, this paper introduces a novel evaluation framework termed OAR (Out-of-Distribution-resistant Adversarial Robustness), designed to evaluate explanations more reliably.

First, the authors define the quality of an explanation $E$ as the difficulty of reversing the model's prediction by perturbing the features not belonging to $E$:

$$
\theta = \mathbb{E}_{F_{-E}}\left(f(F_{-E})_{c} - y_c\right)
$$

where $F$ is the set of all features, the expectation is taken over perturbations of $F_{-E}$ (the features not in $E$), $c$ is the originally predicted class, and $y_c$ is the model's original output for that class. A $\theta$ close to zero means the prediction is hard to reverse by perturbing non-explanatory features, i.e., $E$ covers the features the model actually relies on.
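The expectation above can be estimated by sampling. The sketch below uses random Gaussian perturbations of the non-explanation features for simplicity, whereas the OAR framework targets adversarial perturbations; the `noise_scale` and sample count are illustrative assumptions.

```python
import numpy as np

def adversarial_robustness_theta(model, x, explanation_idx,
                                 noise_scale=0.1, n_samples=64):
    """Monte-Carlo estimate of theta: perturb only the features *outside*
    the explanation E and track how much the predicted-class output moves.
    """
    probs = model(x)
    c = int(np.argmax(probs))
    y_c = probs[c]                          # original output for class c
    mask = np.ones_like(x, dtype=bool)
    mask[explanation_idx] = False           # True exactly on F_{-E}
    diffs = []
    for _ in range(n_samples):
        x_pert = x.copy()
        x_pert[mask] += noise_scale * np.random.randn(int(mask.sum()))
        diffs.append(model(x_pert)[c] - y_c)
    return float(np.mean(diffs))
```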

Second, reconstruction loss is employed to measure how far a perturbed sample deviates from the original data distribution: samples that a pretrained generative model reconstructs poorly are likely out-of-distribution and are down-weighted accordingly. The quality of the explanation is then gauged by the weighted average of the model's outputs over the perturbed instances; the smaller the discrepancy between this average and the original output, the more faithful the explanation.
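A sketch of this weighting idea follows: each perturbed instance is weighted by a decreasing function of its reconstruction loss under a pretrained generative model. The exponential weighting, the `temperature` parameter, and the `vae.reconstruction_loss` interface are assumptions for illustration rather than the paper's exact formulation.

```python
import numpy as np

def oar_weighted_gap(model, vae, x, explanation_idx,
                     noise_scale=0.1, n_samples=64, temperature=1.0):
    """OOD-resistant weighting sketch: each perturbed sample contributes with
    weight exp(-reconstruction_loss / temperature), so samples the generative
    model reconstructs poorly (likely OOD) are down-weighted.

    vae.reconstruction_loss(x) is a hypothetical method returning a scalar
    reconstruction error under a pretrained generative model.
    """
    probs = model(x)
    c = int(np.argmax(probs))
    y_c = probs[c]
    mask = np.ones_like(x, dtype=bool)
    mask[explanation_idx] = False           # perturb only features outside E
    outputs, weights = [], []
    for _ in range(n_samples):
        x_pert = x.copy()
        x_pert[mask] += noise_scale * np.random.randn(int(mask.sum()))
        outputs.append(model(x_pert)[c])
        weights.append(np.exp(-vae.reconstruction_loss(x_pert) / temperature))
    weighted_avg = float(np.average(outputs, weights=weights))
    # A smaller gap between the weighted average and the original output
    # indicates a more faithful explanation.
    return abs(weighted_avg - y_c)
```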

References

Fang, J., Liu, W., Gao, Y., Liu, Z., Zhang, A., Wang, X., & He, X. (2024). Evaluating post-hoc explanations for graph neural networks via robustness analysis. Advances in Neural Information Processing Systems, 36.