The key idea of RISE for measuring the importance of an image region is to obscure, or 'perturb', that region and observe how much this affects the black-box decision.

RISE

Assume that $\mathbf{M}:\Lambda \rightarrow \{0, 1\}$ is a random binary mask with distribution $\mathcal{D}$, where $\Lambda$ is the pixel lattice. They consider the confidence score of the perturbed input as the random variable $f(\mathbf{I} \odot \mathbf{M})$. They define the importance of pixel $\lambda$ as the expected score over all possible masks $\mathbf{M}$, conditioned on the event that pixel $\lambda$ is observed, i.e. $\mathbf{M}(\lambda) = 1$:

$$ S_{\mathbf{I}, f}(\lambda) = \mathbb{E}_{\mathbf{M}}\left[f(\mathbf{I}\odot \mathbf{M}) | \mathbf{M}(\lambda) = 1\right] $$

The intuition behind this is that $f(\mathbf{I} \odot \mathbf{M})$ is high when the pixels preserved by mask $\mathbf{M}$ are important.
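The masked-input scoring step can be sketched as follows. This is a toy illustration, not the paper's implementation: the black-box scorer `f` here is a hypothetical stand-in (a real use would wrap a trained classifier's confidence for one class), and the mask is drawn with independent per-pixel Bernoulli trials.

```python
import numpy as np

# Hypothetical black-box scorer: any function mapping an image to a
# confidence in [0, 1]; a toy stand-in for a real classifier.
def f(image):
    return float(image.mean())

rng = np.random.default_rng(0)
I = rng.random((8, 8))           # toy grayscale "image" on the lattice Lambda
M = rng.random((8, 8)) < 0.5     # binary mask M: Lambda -> {0, 1}, P[M(lambda)=1] = 0.5

score = f(I * M)                 # confidence of the perturbed input, f(I ⊙ M)
```

Pixels where the mask is 0 are zeroed out before scoring; repeating this for many masks yields the samples used by the expectation above.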

This expectation can be reformulated as a sum over masks.

$$ \begin{aligned} S_{\mathbf{I}, f}(\lambda) &= \mathbb{E}_{\mathbf{M}}\left[f(\mathbf{I}\odot \mathbf{M}) | \mathbf{M}(\lambda) = 1\right]\\ &= \sum_{m}f(\mathbf{I}\odot m) P[\mathbf{M} = m | \mathbf{M}(\lambda) = 1]\\ &= \frac{1}{P[\mathbf{M}(\lambda) = 1]} \sum_{m}f(\mathbf{I}\odot m) P[\mathbf{M} = m, \mathbf{M}(\lambda)=1] \end{aligned} $$

where,

$$ \begin{aligned} P[\mathbf{M} = m, \mathbf{M}(\lambda)=1] &= \begin{cases} 0, & \text{if } m(\lambda) =0, \\ P[\mathbf{M} = m], & \text{if } m(\lambda) = 1, \end{cases}\\ & = m(\lambda)P[\mathbf{M} = m] \end{aligned} $$

Substituting this expression for $P[\mathbf{M} = m, \mathbf{M}(\lambda)=1]$ into the sum gives:

$$ S_{\mathbf{I}, f}(\lambda) = \frac{1}{P[\mathbf{M}(\lambda) = 1]} \sum_{m}f(\mathbf{I}\odot m) m(\lambda)P[\mathbf{M} = m] $$

where $P[\mathbf{M}(\lambda) = 1] = \mathbb{E}[\mathbf{M}(\lambda)]$, so that:

$$ S_{\mathbf{I}, f}(\lambda) = \frac{1}{\mathbb{E}[\mathbf{M}(\lambda)]} \sum_{m}f(\mathbf{I}\odot m) m(\lambda)P[\mathbf{M} = m] $$

They use Monte Carlo sampling to empirically estimate the sum in the above equation.

$$ S_{\mathbf{I}, f}(\lambda) \stackrel{MC}{\approx} \frac{1}{\mathbb{E}[\mathbf{M}(\lambda)] \cdot N} \sum_{i=1}^{N}f(\mathbf{I}\odot m_{i}) m_{i}(\lambda) $$

where $N$ is the number of masks $m_i$ sampled from $\mathcal{D}$.
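The Monte Carlo estimator above can be sketched as a short NumPy routine. This is a simplified illustration under stated assumptions: `f` is a hypothetical stand-in scorer, and masks are independent per-pixel Bernoulli draws with keep-probability $p$, so $\mathbb{E}[\mathbf{M}(\lambda)] = p$ for every pixel (the paper instead upsamples smaller random masks, which yields smooth masks but the same estimator).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the black-box score f; replace with a real model.
def f(image):
    return float(image.mean())

def rise_saliency(I, n_masks=500, p=0.5, rng=rng):
    """Estimate S(lambda) ≈ (1 / (E[M(lambda)] · N)) · sum_i f(I ⊙ m_i) m_i(lambda)."""
    acc = np.zeros_like(I, dtype=float)
    for _ in range(n_masks):
        m = (rng.random(I.shape) < p).astype(float)  # m_i ~ per-pixel Bernoulli(p)
        acc += f(I * m) * m                          # accumulate f(I ⊙ m_i) · m_i(lambda)
    return acc / (p * n_masks)                       # divide by E[M(lambda)] · N = p · N

I = rng.random((8, 8))
S = rise_saliency(I)  # saliency map, same shape as the input image
```

Each mask requires one forward pass of the black box, so the cost is $N$ model evaluations regardless of image size.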

References

Petsiuk, V., Das, A., & Saenko, K. (2018). RISE: Randomized Input Sampling for Explanation of Black-box Models. arXiv preprint arXiv:1806.07421.