$$ m_{\lambda, \beta} = \argmax_{m} \Phi(m \otimes x) - \lambda \|m\|_{1} - \beta \mathcal{S}(m). $$

They argue that the trade-off in this formulation is hard to interpret: different choices of $\lambda$ and $\beta$ yield different masks, with no clear way of comparing them.

$$ m_{a} = \argmax_{m: \|m\|_{1} = a |\Omega|, m \in \mathcal{M}} \Phi(m \otimes x) $$

They note that, under this formulation, the resulting mask is a function of the chosen area $a$ only.

$$ a^* = \min\{a: \Phi(m_a \otimes x) \ge \Phi_{0}\} $$

The mask $m_{a^*}$ is extremal because any smaller area $a$ would yield a perturbed input whose model output falls below the lower bound $\Phi_0$.
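
The definition of $a^*$ can be illustrated with a minimal sketch. It assumes the masks $m_a$ have already been optimised for a finite grid of areas and that their scores $\Phi(m_a \otimes x)$ are stored in a dictionary. The function name `smallest_sufficient_area` and the numbers below are hypothetical and only serve the illustration.

```python
def smallest_sufficient_area(scores_by_area, phi0):
    """Return the smallest area a whose score reaches phi0, or None.

    scores_by_area maps an area fraction a to the score Phi(m_a * x)
    obtained with the mask optimised for that area (assumed precomputed).
    """
    feasible = [a for a, score in scores_by_area.items() if score >= phi0]
    return min(feasible) if feasible else None


# Made-up scores for four candidate areas (illustration only).
scores = {0.05: 0.12, 0.10: 0.47, 0.20: 0.81, 0.40: 0.93}
a_star = smallest_sufficient_area(scores, phi0=0.8)
```

With these illustrative values, `a_star` is `0.20`: the two smaller areas fall below the threshold, and the larger one is not minimal.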

$$ R_{a}(m) = \|\mathrm{vecsort}(m) - r_a\|^{2} $$

$$ m_a = \argmax_{m\in\mathcal{M}} \Phi(m \otimes x) - \lambda R_{a}(m) $$
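
The regulariser $R_a(m)$ can be sketched in NumPy. This is a minimal illustration, not the authors' implementation. It assumes that $\mathrm{vecsort}$ sorts the flattened mask in ascending order and that $r_a$ is a template of $(1-a)|\Omega|$ zeros followed by $a|\Omega|$ ones; neither assumption is stated in the notes above. The helper names `area_template` and `area_regularizer` are hypothetical.

```python
import numpy as np


def area_template(num_pixels, a):
    """Assumed template r_a: zeros followed by round(a * num_pixels) ones."""
    num_ones = int(round(a * num_pixels))
    r = np.zeros(num_pixels)
    r[num_pixels - num_ones:] = 1.0
    return r


def area_regularizer(mask, a):
    """R_a(m) = || vecsort(m) - r_a ||^2, with vecsort taken as ascending sort."""
    sorted_vals = np.sort(mask.ravel())
    r_a = area_template(sorted_vals.size, a)
    return float(np.sum((sorted_vals - r_a) ** 2))


# A binary 4x4 mask covering exactly 25% of the pixels incurs no penalty.
binary_mask = np.zeros((4, 4))
binary_mask[0, :] = 1.0
penalty_binary = area_regularizer(binary_mask, a=0.25)

# A uniformly grey mask is penalised at every pixel.
grey_mask = np.full((4, 4), 0.5)
penalty_grey = area_regularizer(grey_mask, a=0.25)
```

Under these assumptions the penalty is zero only when the mask is binary with exactly the target area, so the term pushes $m$ toward the constraint $\|m\|_1 = a|\Omega|$ without enforcing it exactly.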