This paper proposes Integrated Directional Gradients (IDG) for computing feature group attributions. An important step is choosing the family of meaningful feature subsets $M$, which is defined by domain-specific methods, such as the constituency parse tree in NLP.

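For an input $x$ with features $a_i$ and a baseline $b$, the IDG of a feature subset $S$ is the gradient of $f$ along the direction of $S$, integrated over the straight path from $b$ to $x$:

$$ z_i^S = \begin{cases} x_i - b_i & \text{if } a_i \in S \\ 0 & \text{otherwise} \end{cases} $$

$$ \nabla_{S}f(x) = \nabla f(x) \cdot \hat{z}^{S}, \quad \hat{z}^{S} = \frac{z^{S}}{\Vert z^{S} \Vert} $$

$$ \text{IDG}(S) = \int_{\alpha = 0}^1 \nabla_{S}f(b + \alpha (x - b)) \, d\alpha $$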
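To make the definition concrete, here is a minimal sketch that approximates $\text{IDG}(S)$ numerically. It is not the authors' implementation: the midpoint Riemann sum, the finite-difference gradient, the `toy_model` function, and the choice to return 0 when $z^S$ is the zero vector are all assumptions made for illustration.

```python
import math

def numerical_gradient(f, x, eps=1e-6):
    """Central finite-difference gradient of f at x (a list of floats)."""
    grad = []
    for i in range(len(x)):
        hi = list(x)
        lo = list(x)
        hi[i] += eps
        lo[i] -= eps
        grad.append((f(hi) - f(lo)) / (2 * eps))
    return grad

def idg(f, x, b, S, steps=200):
    """Approximate IDG(S) with a midpoint Riemann sum over alpha in [0, 1]."""
    # z^S keeps x_i - b_i for feature indices in S and is zero elsewhere.
    z = [(xi - bi) if i in S else 0.0 for i, (xi, bi) in enumerate(zip(x, b))]
    norm = math.sqrt(sum(zi * zi for zi in z))
    if norm == 0.0:
        return 0.0  # assumption: no direction to differentiate along
    z_hat = [zi / norm for zi in z]
    total = 0.0
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, b)]
        grad = numerical_gradient(f, point)
        # Directional gradient: dot product of the gradient with z_hat.
        total += sum(g * zh for g, zh in zip(grad, z_hat))
    return total / steps

# Hypothetical toy model: features 0 and 1 interact, feature 2 acts alone.
def toy_model(v):
    return v[0] * v[1] + v[2]

x = [1.0, 2.0, 3.0]
b = [0.0, 0.0, 0.0]
idg_pair = idg(toy_model, x, b, S={0, 1})   # analytically 2 / sqrt(5)
idg_single = idg(toy_model, x, b, S={2})    # analytically 1
```

For this toy model the gradient along the path is $(2\alpha, \alpha, 1)$, so the two integrals can be checked by hand against the values in the comments.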

The dividend $d(S)$ of a feature subset $S$ is computed by normalizing the absolute value of $\text{IDG}(S)$ over all meaningful subsets in $M$.

$$ d(S) = \begin{cases} \frac{|\text{IDG}(S)|}{\sum_{T \in M}|\text{IDG}(T)|} & \text{if } S \in M \\ 0 & \text{otherwise} \end{cases} $$
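As a small worked example of this normalization, the sketch below maps a dictionary of IDG scores to dividends. The scores are made-up numbers, not results from the paper, and the function name `dividends` and the handling of an all-zero total are assumptions.

```python
def dividends(idg_scores):
    """Normalize |IDG(S)| over the meaningful subsets (the dictionary keys)."""
    total = sum(abs(score) for score in idg_scores.values())
    if total == 0.0:
        # Assumption: with no attribution mass, every dividend is zero.
        return {S: 0.0 for S in idg_scores}
    return {S: abs(score) / total for S, score in idg_scores.items()}

# Hypothetical IDG scores for a two-feature input; keys are the subsets in M.
idg_scores = {
    frozenset({0}): 0.5,
    frozenset({1}): -1.5,
    frozenset({0, 1}): 2.0,
}
d = dividends(idg_scores)
# d[{0}] = 0.125, d[{1}] = 0.375, d[{0, 1}] = 0.5
```

Because absolute values are used, the sign of an IDG score is discarded and the dividends over $M$ sum to 1.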

Finally, the importance $v(S)$ is defined as the sum of the dividends of all meaningful subsets contained in $S$.

$$ v(S) = \sum_{T \in \{T \mid (T\subseteq S) \wedge (T\in M)\}} d(T) $$
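The importance of a group can then be sketched as a sum over the stored dividends. The dividend values below are illustrative, and the function name `value` is an assumption; subsets outside $M$ are simply absent from the dictionary, which matches their dividend being zero.

```python
def value(S, d):
    """Sum the dividends of all meaningful subsets T contained in S."""
    return sum(d_T for T, d_T in d.items() if T <= S)  # T <= S tests T ⊆ S

# Hypothetical dividends for M = {{0}, {1}, {0, 1}}.
d = {
    frozenset({0}): 0.125,
    frozenset({1}): 0.375,
    frozenset({0, 1}): 0.5,
}
v_single = value(frozenset({0}), d)     # only {0} is contained: 0.125
v_pair = value(frozenset({0, 1}), d)    # all three subsets: 1.0
```

In a parse tree setting, this means a phrase's importance accumulates the dividends of the phrase itself and of every constituent below it.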

I think IDG is suitable for data whose features naturally cluster into multiple groups. It is worth noting that I could not find a comparison between IDG and other attribution methods in the paper. I think this is a novel way of writing such a paper, and one that future work could draw on.

References

Sikdar, Sandipan, Parantapa Bhattacharya, and Kieran Heese. “Integrated directional gradients: Feature interaction attribution for neural NLP models.” In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp. 865-878. 2021.