CEM NeurIPS 2018

This paper proposes the Contrastive Explanations Method (CEM). It defines a type of explanation in which, given an input, the goal is to identify which features are minimally sufficient to justify its classification (pertinent positives) and which features must be minimally and necessarily absent (pertinent negatives).

Pertinent Negatives (PN) capture the importance of missing features to the model's prediction. A PN is essentially a counterfactual explanation, with $\mathbf{x}_0 + \boldsymbol{\delta}$ denoting the counterfactual instance. A pertinent negative can be identified by solving the following optimization problem:

$$ \min _{\boldsymbol{\delta} \in \mathcal{X} / \mathbf{x}_0} c \cdot f_\kappa^{\mathrm{neg}}\left(\mathbf{x}_0, \boldsymbol{\delta}\right)+\beta\|\boldsymbol{\delta}\|_1+\|\boldsymbol{\delta}\|_2^2+\gamma\left\|\mathbf{x}_0+\boldsymbol{\delta}-\mathrm{AE}\left(\mathbf{x}_0+\boldsymbol{\delta}\right)\right\|_2^2 $$

$$ f_\kappa^{\mathrm{neg}}\left(\mathbf{x}_0, \boldsymbol{\delta}\right)=\max \left\{\left[\operatorname{Pred}\left(\mathbf{x}_0+\boldsymbol{\delta}\right)\right]_{t_0}-\max _{i \neq t_0}\left[\operatorname{Pred}\left(\mathbf{x}_0+\boldsymbol{\delta}\right)\right]_i,-\kappa\right\} $$

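where $\kappa \ge 0$ is a confidence parameter used to control the separation between $[\operatorname{Pred}(\mathbf{x}_0 + \boldsymbol{\delta})]_{t_0}$ and $\max_{i \neq t_0}[\operatorname{Pred}(\mathbf{x}_0 + \boldsymbol{\delta})]_i$. Here $t_0$ is the class the model originally predicts for $\mathbf{x}_0$, and $\mathrm{AE}$ is an autoencoder.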

This optimization problem aims to find a minimal perturbation $\boldsymbol{\delta}$ that nevertheless changes the model's prediction. In other words, the features in $\boldsymbol{\delta}$ must remain absent from the input for the model's prediction to stay unchanged.
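The PN objective above can be evaluated term by term. The following is a minimal sketch in plain Python, not the authors' implementation: `predict` and `autoencoder` are hypothetical callables standing in for $\operatorname{Pred}$ and $\mathrm{AE}$, inputs are plain lists of floats, and the default values of `c`, `beta`, `gamma`, and `kappa` are arbitrary placeholders. It only computes the loss for a given $\boldsymbol{\delta}$; it does not perform the minimization or enforce the constraint $\boldsymbol{\delta} \in \mathcal{X} / \mathbf{x}_0$.

```python
def pn_hinge(pred, t0, kappa):
    """f_kappa^neg: margin of the original class t0 over the best other class, floored at -kappa."""
    best_other = max(p for i, p in enumerate(pred) if i != t0)
    return max(pred[t0] - best_other, -kappa)


def pn_objective(x0, delta, predict, autoencoder, t0,
                 c=1.0, beta=0.1, gamma=1.0, kappa=0.0):
    """Value of the PN objective for one candidate perturbation delta.

    predict and autoencoder are assumed (hypothetical) callables mapping
    a list of floats to a list of floats.
    """
    x = [a + d for a, d in zip(x0, delta)]             # counterfactual x0 + delta
    attack = c * pn_hinge(predict(x), t0, kappa)       # c * f_kappa^neg
    l1 = beta * sum(abs(d) for d in delta)             # beta * ||delta||_1
    l2 = sum(d * d for d in delta)                     # ||delta||_2^2
    recon = gamma * sum((xi - ri) ** 2                 # gamma * ||x - AE(x)||_2^2
                        for xi, ri in zip(x, autoencoder(x)))
    return attack + l1 + l2 + recon
```

The hinge term stops decreasing once another class beats $t_0$ by at least $\kappa$, so beyond that point only the sparsity and reconstruction terms drive the optimization.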

Pertinent Positives (PP) are the critical features behind the model's original prediction. These features can be identified by solving the following optimization problem:

$$ \min _{\boldsymbol{\delta} \in \mathcal{X} \cap \mathbf{x}_0} c \cdot f_\kappa^{\mathrm{pos}}\left(\mathbf{x}_0, \boldsymbol{\delta}\right)+\beta\|\boldsymbol{\delta}\|_1+\|\boldsymbol{\delta}\|_2^2+\gamma\|\boldsymbol{\delta}-\mathrm{AE}(\boldsymbol{\delta})\|_2^2 $$

$$ f_\kappa^{\mathrm{pos}}\left(\mathbf{x}_0, \boldsymbol{\delta}\right)=\max \left\{\max _{i \neq t_0}[\operatorname{Pred}(\boldsymbol{\delta})]_i-[\operatorname{Pred}(\boldsymbol{\delta})]_{t_0},-\kappa\right\} $$

The objective of this optimization is to identify the smallest subset of the original features that, on its own, still leads the model to the same prediction as the full input.
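The PP objective differs from the PN one in two ways: the classifier and autoencoder are applied to $\boldsymbol{\delta}$ alone rather than to $\mathbf{x}_0 + \boldsymbol{\delta}$, and the hinge is reversed so that class $t_0$ should win. The following is a minimal sketch in plain Python under the same assumptions as before: `predict` and `autoencoder` are hypothetical callables standing in for $\operatorname{Pred}$ and $\mathrm{AE}$, and the default coefficients are arbitrary placeholders. It evaluates the loss for a given $\boldsymbol{\delta}$ and does not enforce $\boldsymbol{\delta} \in \mathcal{X} \cap \mathbf{x}_0$.

```python
def pp_hinge(pred, t0, kappa):
    """f_kappa^pos: margin of the best other class over t0, floored at -kappa."""
    best_other = max(p for i, p in enumerate(pred) if i != t0)
    return max(best_other - pred[t0], -kappa)


def pp_objective(delta, predict, autoencoder, t0,
                 c=1.0, beta=0.1, gamma=1.0, kappa=0.0):
    """Value of the PP objective for one candidate feature subset delta.

    predict and autoencoder are assumed (hypothetical) callables mapping
    a list of floats to a list of floats.
    """
    attack = c * pp_hinge(predict(delta), t0, kappa)   # c * f_kappa^pos
    l1 = beta * sum(abs(d) for d in delta)             # beta * ||delta||_1
    l2 = sum(d * d for d in delta)                     # ||delta||_2^2
    recon = gamma * sum((di - ri) ** 2                 # gamma * ||delta - AE(delta)||_2^2
                        for di, ri in zip(delta, autoencoder(delta)))
    return attack + l1 + l2 + recon
```

The hinge reaches its floor of $-\kappa$ once $\boldsymbol{\delta}$ alone is classified as $t_0$ with a margin of at least $\kappa$, after which the norm terms push toward a smaller feature subset.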