This paper improves on meaningful perturbation (Fong and Vedaldi, 2017, *Interpretable Explanations of Black Boxes by Meaningful Perturbation*). They first write the meaningful-perturbation optimization problem as follows:

$$ m_{\lambda, \beta} = \argmax_{m} \Phi(m \otimes x) - \lambda \|m\|_{1} - \beta \mathcal{S}(m). $$

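Here $\Phi$ is the model's output for the class of interest, $m \otimes x$ is the input $x$ perturbed according to the mask $m$, and $\mathcal{S}$ is a smoothness regularizer. The authors argue that the trade-off encoded by this formulation is hard to interpret: different choices of $\lambda$ and $\beta$ produce different masks, with no principled way to compare them.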
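A minimal sketch of how this objective can be evaluated for one candidate mask. It assumes a NumPy setting in which the perturbation $m \otimes x$ is a plain elementwise product (fading masked-out pixels to zero), $\mathcal{S}$ is an anisotropic total-variation penalty, and `phi` is a caller-supplied scoring function; the function names are hypothetical.

```python
from typing import Callable

import numpy as np


def total_variation(m: np.ndarray) -> float:
    """Anisotropic total variation of a 2-D mask (assumed choice of S)."""
    dh = np.abs(np.diff(m, axis=0)).sum()  # vertical neighbour differences
    dw = np.abs(np.diff(m, axis=1)).sum()  # horizontal neighbour differences
    return float(dh + dw)


def mp_objective(
    phi: Callable[[np.ndarray], float],
    x: np.ndarray,
    m: np.ndarray,
    lam: float,
    beta: float,
) -> float:
    """Phi(m * x) - lam * ||m||_1 - beta * TV(m), with m * x as the perturbation."""
    return phi(m * x) - lam * float(np.abs(m).sum()) - beta * total_variation(m)


x = np.ones((4, 4))
m = np.zeros((4, 4))
m[1:3, 1:3] = 1.0  # keep only the central 2x2 block
print(mp_objective(lambda z: float(z.sum()), x, m, lam=0.5, beta=0.1))
```

Changing `lam` or `beta` shifts the balance between score, mask size and smoothness, so masks obtained under different settings are not directly comparable.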

To remove this balancing issue, they constrain the area of the mask to a fixed value $a|\Omega|$, i.e. a fraction $a$ of the input image area $|\Omega|$:

$$ m_{a} = \argmax_{m: \|m\|_{1} = a |\Omega|,\ m \in \mathcal{M}} \Phi(m \otimes x) $$

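where $\mathcal{M}$ is the set of admissible smooth masks; restricting $m$ to this set takes the place of the explicit smoothness term $\mathcal{S}$. The resulting mask is therefore a function of the chosen area $a$ only.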
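To see why the mask is then determined by $a$ alone, consider a toy case (an assumption, not the paper's setting): $\Phi(m \otimes x) = \sum_i w_i m_i x_i$ is linear, and $\mathcal{M}$ is relaxed to the box $[0,1]^{|\Omega|}$, ignoring smoothness. Maximizing a linear function subject to $\|m\|_1 = k$, with $k = a|\Omega|$ an integer, then keeps the $k$ pixels with the largest contributions $w_i x_i$. The helper name and the weights `w` are hypothetical.

```python
import numpy as np


def area_constrained_mask_linear(w: np.ndarray, x: np.ndarray, a: float) -> np.ndarray:
    """Maximize sum(w * m * x) over m in [0, 1]^Omega subject to sum(m) = a * |Omega|.

    Toy assumptions: Phi is linear in the masked input and the smooth-mask set M
    is relaxed to the unit box, so the optimum selects the top-k contributions.
    """
    contrib = (w * x).ravel()
    n = contrib.size
    k = int(round(a * n))  # target area in pixels, rounded to an integer
    m = np.zeros(n)
    if k > 0:
        top = np.argsort(contrib)[-k:]  # indices of the k largest contributions
        m[top] = 1.0
    return m.reshape(x.shape)


w = np.array([[1.0, 5.0], [3.0, 2.0]])
print(area_constrained_mask_linear(w, np.ones((2, 2)), a=0.5))
```

With `a=0.5`, this keeps the two pixels with weights 5 and 3; no $\lambda$ or $\beta$ enters the computation.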

Consider a lower bound $\Phi_0$ on the model’s output (for example, $\Phi_0 = \tau \Phi(x)$, a fraction $\tau$ of the model’s output on the unperturbed image). They seek the smallest mask for which the model’s output on the perturbed input still reaches at least $\Phi_{0}$. This is equivalent to searching over the area parameter $a$ for the smallest value that meets the requirement:

$$ a^* = \min\{a: \Phi(m_a \otimes x) \ge \Phi_{0}\} $$

The mask $m_{a^*}$ is extremal in this sense: for any smaller area $a < a^*$, the model’s output on the perturbed input $m_a \otimes x$ falls below the lower bound $\Phi_0$.
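The search for $a^*$ can be sketched as a scan over a finite grid of candidate areas. Here `score_at_area` is a hypothetical callback that solves for $m_a$ and returns $\Phi(m_a \otimes x)$; the grid and the linear scan are assumptions for illustration. A linear scan does not require $\Phi(m_a \otimes x)$ to be monotonic in $a$; if it were, a binary search over the sorted grid would need fewer optimizations.

```python
from typing import Callable, Optional, Sequence


def smallest_area(
    score_at_area: Callable[[float], float],
    areas: Sequence[float],
    phi0: float,
) -> Optional[float]:
    """Return the smallest a in `areas` with score_at_area(a) >= phi0, or None."""
    for a in sorted(areas):
        if score_at_area(a) >= phi0:
            return a
    return None


# Toy check: pretend the preserved score equals the area fraction itself,
# and require phi0 = tau * Phi(x) with tau = 0.15 and Phi(x) = 1.
print(smallest_area(lambda a: a, [0.05, 0.1, 0.2, 0.4, 0.6, 0.8], phi0=0.15 * 1.0))
```

In this toy check the scan returns `0.2`, the first grid value whose score reaches $\Phi_0 = 0.15$.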

Enforcing this area constraint exactly is non-trivial, so they relax it into a penalty that can be optimized with gradient-based methods. They define the $vecsort(m)$ operation, which vectorizes $m$ and sorts its entries in non-decreasing order. If a binary mask $m$ satisfies the area constraint exactly, then $vecsort(m)$ equals the vector $r_{a} \in [0, 1]^{|\Omega|}$ consisting of $(1-a)|\Omega|$ zeros followed by $a|\Omega|$ ones. Based on this, they propose the following regularization term:

$$ R_{a}(m) = \|vecsort(m) - r_a\|^{2} $$
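A minimal NumPy sketch of $vecsort$, $r_a$ and $R_a$, assuming $a|\Omega|$ is rounded to the nearest integer when it is not one already; the function names are hypothetical.

```python
import numpy as np


def vecsort(m: np.ndarray) -> np.ndarray:
    """Vectorize the mask and sort its values in non-decreasing order."""
    return np.sort(np.asarray(m, dtype=float).ravel())


def reference_vector(a: float, n: int) -> np.ndarray:
    """r_a: (1 - a) * n zeros followed by a * n ones (a * n rounded to an int)."""
    k = int(round(a * n))
    return np.concatenate([np.zeros(n - k), np.ones(k)])


def area_regularizer(m: np.ndarray, a: float) -> float:
    """R_a(m) = ||vecsort(m) - r_a||^2."""
    s = vecsort(m)
    return float(np.sum((s - reference_vector(a, s.size)) ** 2))


binary = np.array([[0.0, 1.0], [1.0, 0.0]])  # exactly half the pixels on
grey = np.full((2, 2), 0.5)                  # same l1 norm, but not binary
print(area_regularizer(binary, 0.5), area_regularizer(grey, 0.5))
```

The binary mask has zero penalty, while the uniform grey mask has the same $\ell_1$ norm but is still penalized, so $R_a$ pushes the mask toward binary values as well as toward the target area. Because sorting only permutes entries, the penalty does not depend on where the active pixels are.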

Combining the above, they obtain the final objective, in which the hard area constraint is replaced by the penalty $R_a$ with weight $\lambda$:

$$ m_a = \argmax_{m\in\mathcal{M}} \Phi(m \otimes x) - \lambda R_{a}(m) $$
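The final objective can be maximized by gradient ascent, because $R_a$ is differentiable almost everywhere: its gradient with respect to $m$ is $2(vecsort(m) - r_a)$ permuted back to the original pixel order. The sketch below is a toy under several assumptions that are not from the paper: $\Phi(m \otimes x) = \sum_i c_i m_i$ is linear with fixed contributions `contrib`, the smooth-mask set $\mathcal{M}$ is replaced by the parametrization $m = \sigma(\theta)$, and $\lambda$ is ramped linearly from 0 to `lam_max` so that, in this toy, the score term orders the pixels before the area penalty takes over. All names are hypothetical.

```python
import numpy as np


def sigmoid(t: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-t))


def optimize_mask_linear(contrib, a, lam_max=50.0, lr=0.1, steps=3000):
    """Gradient ascent on sum(contrib * m) - lam * R_a(m) for a toy linear Phi.

    Assumptions: m = sigmoid(theta) stands in for the smooth-mask set M,
    and lam is ramped linearly from 0 to lam_max over the iterations.
    """
    c = np.asarray(contrib, dtype=float).ravel()
    n = c.size
    k = int(round(a * n))
    r = np.concatenate([np.zeros(n - k), np.ones(k)])  # r_a
    theta = np.zeros(n)
    for t in range(steps):
        lam = lam_max * t / steps
        m = sigmoid(theta)
        order = np.argsort(m, kind="stable")   # permutation used by vecsort
        grad_r = np.empty(n)
        grad_r[order] = 2.0 * (m[order] - r)   # dR_a/dm, scattered back to pixels
        grad_m = c - lam * grad_r              # d(objective)/dm
        theta += lr * grad_m * m * (1.0 - m)   # chain rule through the sigmoid
    return sigmoid(theta).reshape(np.shape(contrib))


mask = optimize_mask_linear(np.array([[1.0, 5.0], [3.0, 2.0]]), a=0.5)
print(np.round(mask, 3))
```

In this toy run, the two pixels with the largest contributions (5 and 3) end up close to 1 and the other two close to 0, i.e. an approximately binary mask of area $a|\Omega| = 2$.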