The main contributions of this paper are twofold: (1) investigating the Out-of-Distribution (OOD) problem in counterfactual inputs, and (2) proposing Parallel Local Search (PLS), a method for generating feature importance explanations.

Out-of-Distribution Problem

The possible causes of the OOD problem in feature importance (FI) explanations are illustrated in the figure below. Even on in-distribution data, neural networks are sensitive to random parameter initialization, data ordering, and hyperparameters, so their behavior on OOD counterfactual inputs is influenced by these factors at least as much.

To address the OOD problem, the authors propose Counterfactual Training, which aligns the training and test distributions. Its core step is to train the model on counterfactual inputs generated from random explanations, i.e. inputs in which most tokens have been removed by the Replace function.
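Below is a minimal sketch of this idea in PyTorch, assuming a token-classification setup in which a reserved replacement id (the hypothetical `MASK_ID`) plays the role of the Replace value; the removal proportions and helper name are illustrative, not the paper's exact procedure.

```python
import torch

MASK_ID = 103  # hypothetical id of the replacement token (e.g. BERT's [MASK])

def counterfactual_batch(input_ids, min_remove=0.5, max_remove=1.0):
    """Replace a random, large proportion of tokens in each example so the model
    sees explanation-style counterfactual inputs during training (illustrative)."""
    ids = input_ids.clone()
    batch_size, seq_len = ids.shape
    # Sample a removal proportion per example and draw a Bernoulli removal mask.
    remove_prob = torch.empty(batch_size, 1).uniform_(min_remove, max_remove)
    remove_mask = torch.rand(batch_size, seq_len) < remove_prob
    ids[remove_mask] = MASK_ID
    return ids

# During Counterfactual Training, batches like counterfactual_batch(x) are mixed
# with ordinary batches, so token-removed inputs are no longer out-of-distribution
# when explanations are computed at test time.
```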

Evaluating OOD performance for different Replace functions

Robustness is measured by model accuracy, and the evaluation steps are as follows:

  • Apply each of the five Replace functions to the same explanation.
  • For each proportion of features removed, compute the resulting change in model accuracy.
  • The smaller the accuracy change at a given removal proportion, the better the Replace function addresses the OOD problem (a rough sketch of this loop follows the list).
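The following is a rough sketch of that evaluation loop, assuming a hypothetical classifier with a `predict` method and a dictionary of Replace functions, each of which removes a given proportion of an input's features; it is illustrative rather than the authors' exact evaluation code.

```python
import numpy as np

def accuracy(model, inputs, labels):
    """Hypothetical helper: fraction of inputs classified correctly."""
    preds = np.array([model.predict(x) for x in inputs])
    return float(np.mean(preds == np.array(labels)))

def ood_robustness(model, inputs, labels, replace_fns,
                   proportions=(0.1, 0.25, 0.5, 0.75, 0.9)):
    """For each Replace function and removal proportion, report the drop in
    accuracy relative to the unmodified inputs. Smaller drops suggest that the
    Replace function keeps counterfactual inputs closer to the training
    distribution, i.e. it handles the OOD problem better."""
    base_acc = accuracy(model, inputs, labels)
    results = {}
    for name, replace_fn in replace_fns.items():
        for p in proportions:
            # replace_fn(x, p) is assumed to remove/replace a proportion p of
            # the features of x (e.g. by masking or slicing out tokens).
            modified = [replace_fn(x, p) for x in inputs]
            results[(name, p)] = base_acc - accuracy(model, modified, labels)
    return results
```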

Results

(1) The Attention Mask and Mask Token functions are the two most effective Replace functions. (2) Counterfactual Training mitigates the OOD problem for counterfactual inputs.

Search methods for explanation

They propose a novel search method, Parallel Local Search (PLS), for finding feature importance explanations: several local searches run in parallel, each iteratively proposing small changes to its current set of salient features and keeping a change only if it improves the explanation objective, all within a shared budget of model queries.
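A minimal sketch of a generic parallel local search over fixed-size feature subsets is given below, assuming a black-box objective `score(mask)` obtained by querying the model (e.g. how well the selected features preserve the prediction) and a fixed budget of model calls; the names and details are illustrative and not the authors' exact algorithm.

```python
import random

def parallel_local_search(score, num_features, k, num_searches=10, budget=1000):
    """Run several local searches in parallel over k-feature subsets (illustrative).

    score(mask) -> float is an assumed explanation objective, where mask is a set
    of feature indices. Each search repeatedly swaps one selected feature for an
    unselected one and keeps the swap only if the objective improves.
    """
    # Initialize each search with a random subset of k features (assumes k < num_features).
    searches = []
    for _ in range(num_searches):
        mask = set(random.sample(range(num_features), k))
        searches.append([mask, score(mask)])
        budget -= 1

    best_mask, best_score = max(searches, key=lambda s: s[1])
    while budget > 0:
        for search in searches:
            if budget <= 0:
                break
            mask, current = search
            # Propose a local move: swap one feature out and one feature in.
            out_feature = random.choice(tuple(mask))
            in_feature = random.choice([f for f in range(num_features) if f not in mask])
            candidate = (mask - {out_feature}) | {in_feature}
            candidate_score = score(candidate)
            budget -= 1
            if candidate_score > current:
                search[0], search[1] = candidate, candidate_score
                if candidate_score > best_score:
                    best_mask, best_score = candidate, candidate_score
    return best_mask, best_score
```

Running several searches in parallel reduces the chance that any single search gets stuck in a poor local optimum while still respecting a fixed query budget.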

References

Hase, Peter, Harry Xie, and Mohit Bansal. “The Out-of-Distribution Problem in Explainability and Search Methods for Feature Importance Explanations.” Advances in Neural Information Processing Systems 34 (2021): 3650–3666.