The key idea of RISE to measure the importance of an image region is to obscure or ‘perturb’ it and observe how much this affects the black box decision.
RISE
$$ S_{\mathbf{I}, f}(\lambda) = \mathbb{E}_{\mathbf{M}}\left[f(\mathbf{I}\odot \mathbf{M}) | \mathbf{M}(\lambda) = 1\right] $$. The intuition behind this is that $f(I \cdot M)$ is high when pixels preserved by mask M are important.
$$ \begin{aligned} S_{\mathbf{I}, f}(\lambda) &= \mathbb{E}_{\mathbf{M}}\left[f(\mathbf{I}\odot \mathbf{M}) | \mathbf{M}(\lambda) = 1\right]\\ &= \sum_{m}f(\mathbf{I}\odot m) P[\mathbf{M} = m | \mathbf{M}(\lambda) = 1]\\ &= \frac{1}{P[\mathbf{M}(\lambda) = 1]} \sum_{m}f(\mathbf{I}\odot m) P[\mathbf{M} = m, \mathbf{M}(\lambda)=1] \end{aligned} $$$$ \begin{aligned} P[\mathbf{M} = m, \mathbf{M}(\lambda)=1] &= \begin{cases} 0, & \text{if } m(\lambda) =0, \\ P[\mathbf{M} = m], & \text{if } m(\lambda) = 1, \end{cases}\\ & = m(\lambda)P[\mathbf{M} = m] \end{aligned} $$$$ S_{\mathbf{I}, f}(\lambda) = \frac{1}{P[\mathbf{M}(\lambda) = 1]} \sum_{m}f(\mathbf{I}\odot m) m(\lambda)P[\mathbf{M} = m] $$where, $P[\mathbf{M}(\lambda) = 1] = \mathbb{E}(\mathbf{M}[\lambda)]$.
$$ S_{\mathbf{I}, f}(\lambda) = \frac{1}{\mathbb{E}(\mathbf{M}[\lambda)]} \sum_{m}f(\mathbf{I}\odot m) m(\lambda)P[\mathbf{M} = m] $$$$ S_{\mathbf{I}, f}(\lambda) \stackrel{MC}{\approx} \frac{1}{\mathbb{E}[\mathbf{M}(\lambda)] \cdot N} \sum_{i=1}^{N}f(\mathbf{I}\odot m_{i}) m_{i}(\lambda) $$where $N$ is the number of samples.
References
Petsiuk, V. (2018). Rise: Randomized Input Sampling for Explanation of black-box models. arXiv preprint arXiv:1806.07421.