For explaining large black-box models with many parameters, this paper classifies existing feature attribution methods into three groups:

(1) Original feature attribution methods. They are usually prohibitively expensive for large models because each individual explanation requires a large number of inferences.

(2) Monte Carlo methods. They use Monte Carlo sampling to approximate explanations with fewer computations.

(3) Amortized methods. They train a separate model to mimic the output of original feature attribution methods.

This paper proposes the Selective Explainer to bridge Monte Carlo and amortized methods:

$$ \operatorname{SE}(\boldsymbol{x}, \boldsymbol{y}) \triangleq \begin{cases}\operatorname{Amor}(\boldsymbol{x}, \boldsymbol{y}) & , \text { if } \tau_\alpha(\boldsymbol{x})=1 \\ \lambda_h(\boldsymbol{x}) \operatorname{Amor}(\boldsymbol{x}, \boldsymbol{y})+\left(1-\lambda_h(\boldsymbol{x})\right) \mathrm{MC}^n(\boldsymbol{x}, \boldsymbol{y}) & , \text { if } \tau_\alpha(\boldsymbol{x})=0\end{cases}, $$

where $\lambda_h$ is a combination function and $\tau_{\alpha}$ is a selection function.
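The case distinction above can be sketched in a few lines. This is only an illustration under stated assumptions: `amor`, `mc`, `tau`, and `lam` are hypothetical callables standing in for $\operatorname{Amor}$, $\mathrm{MC}^n$, $\tau_\alpha$, and $\lambda_h$, and explanations are assumed to be plain lists of per-feature attributions.

```python
from typing import Callable, List, Sequence


def selective_explanation(
    x: Sequence[float],
    y: int,
    amor: Callable[[Sequence[float], int], Sequence[float]],
    mc: Callable[[Sequence[float], int], Sequence[float]],
    tau: Callable[[Sequence[float]], int],
    lam: Callable[[Sequence[float]], float],
) -> List[float]:
    """Return SE(x, y) following the two-case definition.

    All four callables are hypothetical placeholders, not the paper's code.
    """
    amortized = amor(x, y)
    if tau(x) == 1:
        # Amortized explanation is trusted: no Monte Carlo inferences are spent.
        return list(amortized)
    # Otherwise blend the amortized explanation with a Monte Carlo one.
    monte_carlo = mc(x, y)
    weight = lam(x)
    return [weight * a + (1.0 - weight) * m for a, m in zip(amortized, monte_carlo)]
```

Note that the Monte Carlo explainer is only called in the second branch, which is where the computational savings come from in this sketch.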

Notably, the selection function $\tau_{\alpha}$ is based on an uncertainty function $s_{h}(\mathbf{x})$ and a threshold $t_{\alpha}$:

$$ \tau_{\alpha}(\mathbf{x}) \triangleq \begin{cases} 1 & \text{if } s_{h}(\mathbf{x})\leq t_{\alpha} \text{ (high-quality explanations)} \\ 0 & \text{if } s_{h}(\mathbf{x}) > t_{\alpha} \text{ (low-quality explanations)}\end{cases} $$
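A minimal sketch of the thresholding rule follows. The function `select` mirrors the definition directly. The helper `quantile_threshold` is an assumption added for illustration: it picks $t_\alpha$ as an empirical $\alpha$-quantile of uncertainty scores on held-out data, which this note does not specify.

```python
import math
from typing import Sequence


def select(uncertainty: float, threshold: float) -> int:
    """tau_alpha: 1 if the amortized explanation is deemed high quality, else 0."""
    return 1 if uncertainty <= threshold else 0


def quantile_threshold(scores: Sequence[float], alpha: float) -> float:
    """Hypothetical way to choose t_alpha: the empirical alpha-quantile of scores.

    ASSUMPTION: the note does not say how t_alpha is set; this is one option.
    """
    ordered = sorted(scores)
    # Index of the smallest score such that at least a fraction alpha lies at or below it.
    index = min(len(ordered) - 1, max(0, math.ceil(alpha * len(ordered)) - 1))
    return ordered[index]
```

Under this assumption, roughly a fraction $\alpha$ of inputs would receive the amortized explanation unchanged, and the rest would be routed to the combined explanation.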

This paper proposes two uncertainty metrics tailored to high-dimensional outputs:

(1) $\textbf{Deep Uncertainty}$ measures the disagreement among an ensemble of amortized explainers.

$$ s_{h}(\mathbf{x}) \triangleq \frac{1}{dk} \sum_{i=1}^{d} \operatorname{Var} \left(\operatorname{Amor}^1(\mathbf{x})_i, \cdots, \operatorname{Amor}^k(\mathbf{x})_i \right), $$

where $\operatorname{Var}(a_1, \cdots, a_k)$ is the variance of the sample $\{a_1, \cdots, a_k\}$, $d$ is the number of features, and $k$ is the number of $\textbf{Amor}$ explainers trained with different random seeds.
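The ensemble-variance score can be sketched as follows. This is an illustration, not the authors' implementation: `explanations[j][i]` is assumed to hold the attribution of feature $i$ from the $j$-th amortized explainer, the normalization follows the formula as written above, and the population variance is used because the note does not say whether the sample or population variance is intended.

```python
import statistics
from typing import Sequence


def deep_uncertainty(explanations: Sequence[Sequence[float]]) -> float:
    """Average per-feature variance across k amortized explainers.

    explanations[j][i]: attribution of feature i from explainer j (hypothetical layout).
    ASSUMPTION: population variance; the 1/(d*k) factor follows the formula in the text.
    """
    k = len(explanations)
    d = len(explanations[0])
    total = sum(
        statistics.pvariance([explanation[i] for explanation in explanations])
        for i in range(d)
    )
    return total / (d * k)
```

When all $k$ explainers agree on every feature, the score is zero; larger disagreement yields a larger score and therefore makes the input more likely to be routed to the Monte Carlo branch.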

(2) $\textbf{Learned Uncertainty}$ uses data to predict the amortized explainer's uncertainty:

$$ s^{\text{learn}}_{h} \in \operatorname*{arg\,min}_{s \in \mathcal{F}} \sum_{(\mathbf{x}, \mathbf{y}) \in \mathcal{D}_{\text{train}}} \left| s(\mathbf{x}) - \ell\left(\operatorname{Amor}(\mathbf{x}, \mathbf{y}), \mathrm{MC}^{n}(\mathbf{x}, \mathbf{y})\right) \right|^2. $$
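The regression objective above can be illustrated with a deliberately small example. Several choices here are assumptions made for the sketch and are not taken from the paper: the loss $\ell$ is taken to be the mean squared error between the two explanations, the function class $\mathcal{F}$ is restricted to affine functions of a single hypothetical scalar feature of $\mathbf{x}$, and the fit uses closed-form ordinary least squares.

```python
from typing import Callable, Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """ASSUMED choice of the loss l: mean squared error between two explanations."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)


def fit_learned_uncertainty(
    features: Sequence[float],
    amor_explanations: Sequence[Sequence[float]],
    mc_explanations: Sequence[Sequence[float]],
) -> Callable[[float], float]:
    """Fit s(f) = slope * f + intercept to the targets l(Amor, MC) by least squares.

    features[j] is a hypothetical scalar summary of the j-th training input.
    """
    # Regression targets: how far each amortized explanation is from its MC counterpart.
    targets = [mse(a, m) for a, m in zip(amor_explanations, mc_explanations)]
    n = len(features)
    mean_f = sum(features) / n
    mean_t = sum(targets) / n
    var_f = sum((f - mean_f) ** 2 for f in features)
    cov = sum((f - mean_f) * (t - mean_t) for f, t in zip(features, targets))
    slope = cov / var_f if var_f > 0 else 0.0
    intercept = mean_t - slope * mean_f
    return lambda f: slope * f + intercept
```

In practice $\mathcal{F}$ would typically be a richer model class operating on the full input; the affine fit is used here only to keep the objective visible.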

References

Paes LM, Wei D, Calmon FP. Selective Explanations. arXiv preprint arXiv:2405.19562. 2024 May 29.