where $b_{t,i}$ is usually generated based on the data distribution.
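The loss that follows compares the model output on the original input, $f(x)$, with its output on the perturbed input, $f(x')$. As a rough sketch of how such a term could be evaluated, the NumPy snippet below assumes that $f$ returns a vector of $C$ class probabilities and that the loss follows the usual cross-entropy convention (per-class terms with a leading minus sign). The function name `mask_loss`, the `eps` clipping constant, and the example probabilities are hypothetical and chosen only for illustration.

```python
import numpy as np


def mask_loss(p_original, p_perturbed, eps=1e-12):
    """Cross-entropy between class probabilities f(x) and f(x').

    Assumption: both arguments are length-C probability vectors.
    """
    p = np.asarray(p_original, dtype=float)
    q = np.asarray(p_perturbed, dtype=float)
    # Clip the perturbed probabilities to avoid log(0).
    q = np.clip(q, eps, 1.0)
    # Sum over the C classes, with the conventional minus sign.
    return float(-np.sum(p * np.log(q)))


# Hypothetical class probabilities for C = 3 classes.
f_x = np.array([0.7, 0.2, 0.1])        # prediction on the original input
f_x_close = np.array([0.6, 0.3, 0.1])  # perturbation that barely shifts the prediction
f_x_far = np.array([0.1, 0.2, 0.7])    # perturbation that changes the prediction

loss_close = mask_loss(f_x, f_x_close)
loss_far = mask_loss(f_x, f_x_far)
```

Under these assumptions, `loss_close` is smaller than `loss_far`: the loss grows as the prediction on the perturbed input drifts away from the prediction on the original input.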
$$ L(M) = -\sum_{c=1}^{C} f_c(x) \log f_c(x') $$

References
Crabbé, J., & Van Der Schaar, M. (2021, July). Explaining time series predictions with dynamic masks. In International Conference on Machine Learning (pp. 2166-2177). PMLR.