This paper proposes a fast saliency detection method that can be applied to any differentiable image classifier. The method explains a model's output by learning a mask $M$, where each element of $M$ represents the importance of the corresponding element of the input. The perturbation operator is defined as follows:
$$ \Phi(X, M) = X \odot M + A \odot (1 - M), $$
where $X$ is the original input and $A$ is a reference input, usually a heavily blurred version of $X$.
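As a minimal sketch of the perturbation operator, the blend can be written with element-wise NumPy arithmetic. The function name `perturb`, the toy arrays, and the all-zero reference are illustrative assumptions, not taken from the paper; the paper's reference would be a blurred copy of the image.

```python
import numpy as np

def perturb(x, mask, reference):
    """Element-wise blend: mask = 1 keeps x, mask = 0 substitutes the reference."""
    return x * mask + reference * (1.0 - mask)

# Toy 2x2 "image", reference, and mask (all hypothetical values).
x = np.array([[1.0, 2.0], [3.0, 4.0]])
a = np.zeros_like(x)  # stand-in reference; a blurred copy of x in practice
m = np.array([[1.0, 0.0], [0.5, 0.0]])

out = perturb(x, m, a)
print(out)
```

With a mask of all ones the input is returned unchanged, and with a mask of all zeros the reference is returned, which matches the two extremes of the formula.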
To encourage a smooth mask, a total variation (TV) regularizer is used, defined as follows:
$$ TV(M) = \sum_{i,j}(M_{i,j} - M_{i,j+1})^2 + \sum_{i,j}(M_{i,j} - M_{i+1,j})^2 $$
Given a class $c$ of interest and an input image $X$, the saliency map $M$ for class $c$ is found by minimizing the objective function $L$:
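$$ L(M) = \lambda_1 TV(M) + \lambda_2 AV(M) - \log (f_{c}(\Phi(X, M))) + \lambda_3 f_c(\Phi(X, 1- M))^{\lambda_4} $$
Here $f_c$ denotes the classifier's predicted probability for class $c$, and $AV(M)$ is the average value of the mask. The third term ensures that the classifier can recognize the selected class from the preserved region. The last term ensures that the probability of the selected class is low once the salient region is removed. Setting $\lambda_4$ to a value smaller than 1 helps reduce this probability to very small values, because $p^{\lambda_4}$ with $\lambda_4 < 1$ has a steep slope near $p = 0$, so the penalty keeps a strong gradient even when the probability is already small.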
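The objective can be sketched in NumPy to show how the four terms combine. This is an illustrative sketch, not the authors' implementation: `toy_class_prob` is a hypothetical stand-in for $f_c$, the $\lambda$ values are arbitrary, $AV(M)$ is read as the mean of the mask entries, and a small `eps` is added inside the logarithm for numerical safety.

```python
import numpy as np

def total_variation(mask):
    """Sum of squared differences between horizontally and vertically adjacent entries."""
    horizontal = np.sum((mask[:, :-1] - mask[:, 1:]) ** 2)
    vertical = np.sum((mask[:-1, :] - mask[1:, :]) ** 2)
    return horizontal + vertical

def average_value(mask):
    # Assumption: AV(M) is read as the mean of the mask entries.
    return np.mean(mask)

def perturb(x, mask, reference):
    """Element-wise blend: mask = 1 keeps x, mask = 0 substitutes the reference."""
    return x * mask + reference * (1.0 - mask)

def objective(mask, x, reference, class_prob, lambdas, eps=1e-12):
    """Evaluate L(M) for a given mask; class_prob plays the role of f_c."""
    l1, l2, l3, l4 = lambdas
    preserved = class_prob(perturb(x, mask, reference))       # f_c(Phi(X, M))
    deleted = class_prob(perturb(x, 1.0 - mask, reference))   # f_c(Phi(X, 1 - M))
    return (l1 * total_variation(mask)
            + l2 * average_value(mask)
            - np.log(preserved + eps)
            + l3 * deleted ** l4)

def toy_class_prob(img):
    # Hypothetical classifier: class evidence comes only from the top-left pixel.
    return 1.0 / (1.0 + np.exp(-(4.0 * img[0, 0] - 2.0)))

x = np.array([[1.0, 0.0], [0.0, 0.0]])          # evidence sits in the top-left pixel
reference = np.zeros_like(x)                     # stand-in for a blurred image
lambdas = (0.1, 0.1, 1.0, 0.3)                   # arbitrary illustrative weights

good_mask = np.array([[1.0, 0.0], [0.0, 0.0]])  # keeps the evidence pixel
bad_mask = np.array([[0.0, 0.0], [0.0, 1.0]])   # keeps an irrelevant pixel

loss_good = objective(good_mask, x, reference, toy_class_prob, lambdas)
loss_bad = objective(bad_mask, x, reference, toy_class_prob, lambdas)
print(loss_good, loss_bad)
```

Both masks have the same total variation and average value, so the difference in loss comes entirely from the preservation and deletion terms: the mask that covers the evidence pixel scores lower.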