This paper adopts contrastive learning to generate in-distribution perturbations. Assuming the input is $\mathbf{x}\in \mathbb{R}^{N \times T\times D}$ ($N$ samples, $T$ time steps, $D$ features), the procedure is as follows (a code sketch is given after Step 6):

Step 1 Randomly select a sample $\mathbf{x}_{i}$ as the anchor and generate its counterfactual perturbation $\mathbf{x}^{r}_{i}=\varphi(\mathbf{x}_{i})$.

Step 2 Divide all samples into two clusters: a positive cluster $\Omega^{+}$ and a negative cluster $\Omega^{-}$.

Step 3 Select the $K^{+}$ nearest positive samples from $\Omega^{+}$: $\{\mathbf{x}^{r+}_{i,k}\}^{K^{+}}_{k=1}$.

Step 4 Randomly select $K^{-}$ samples from the negative cluster $\Omega^{-}$: $\{\mathbf{x}^{r-}_{i,k}\}^{K^{-}}_{k=1}$.

Step 5 Compute the mean Manhattan distance between the anchor and the negative samples, $\mathcal{D}_{an} = \frac{1}{K^{-}} \sum_{k=1}^{K^{-}}\vert \mathbf{x}^{r}_{i} - \mathbf{x}^{r-}_{i,k}\vert$, and likewise between the anchor and the positive samples, $\mathcal{D}_{ap} = \frac{1}{K^{+}} \sum_{k=1}^{K^{+}}\vert \mathbf{x}^{r}_{i} - \mathbf{x}^{r+}_{i,k}\vert$.

Step 6 Minimize the triplet-based contrastive loss $\mathcal{L}(\mathbf{x}) = \max(0, \mathcal{D}_{an} - \mathcal{D}_{ap} - 1) + \Vert \mathbf{x}_{i}^{r}\Vert_{1}$. Minimizing the hinge term pulls the perturbed anchor closer to the negative cluster than to the positive one (up to a margin of 1), while the $\ell_{1}$ term encourages the perturbation to stay sparse.
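
To make Steps 1–6 concrete, here is a minimal PyTorch sketch for a single anchor. It is my reconstruction, not the paper's implementation: the function name `triplet_perturbation_loss`, the callable `phi`, the boolean `pos_mask` encoding the $\Omega^{+}$/$\Omega^{-}$ split, the values of $K^{+}$ and $K^{-}$, and the unit margin are all assumptions.

```python
import torch

def triplet_perturbation_loss(x, phi, pos_mask, i, k_pos=4, k_neg=4, margin=1.0):
    """Steps 1-6 for one anchor sample i (all names here are assumptions).

    x        -- (N, T, D) batch of time series
    phi      -- perturbation network mapping x to its counterfactual x^r
    pos_mask -- (N,) boolean mask defining the positive cluster Omega^+
    """
    x_r = phi(x)                 # Step 1: counterfactuals x^r = phi(x)
    anchor = x_r[i]              # the randomly chosen anchor x^r_i

    # Step 2: split the perturbed samples into the two clusters,
    # keeping the anchor out of its own candidate pool.
    idx = torch.arange(x.shape[0])
    pos = x_r[pos_mask & (idx != i)]
    neg = x_r[~pos_mask & (idx != i)]

    # Manhattan (L1) distance from the anchor to every candidate.
    d_pos = (pos - anchor).abs().flatten(1).sum(dim=1)
    d_neg = (neg - anchor).abs().flatten(1).sum(dim=1)

    # Steps 3 and 5: mean distance to the K+ nearest positives -> D_ap.
    d_ap = d_pos.topk(min(k_pos, d_pos.numel()), largest=False).values.mean()

    # Steps 4 and 5: mean distance to K- random negatives -> D_an.
    sel = torch.randperm(d_neg.numel())[:k_neg]
    d_an = d_neg[sel].mean()

    # Step 6: triplet hinge with unit margin plus an L1 sparsity penalty.
    return torch.relu(d_an - d_ap - margin) + anchor.abs().sum()
```

Note that the loss is differentiable with respect to $\varphi$'s parameters through both the anchor $\mathbf{x}^{r}_{i}$ and the perturbed neighbors, since all of them come out of $\varphi$.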

However, based on my reading of the official code, each optimization step does not minimize this loss over all samples: in Step 1 a single sample is drawn from the batch to serve as the anchor, seemingly at random and without any documented selection criterion.
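
For illustration, a hypothetical training loop matching that reading might look like the following. It reuses `triplet_perturbation_loss` from the sketch above; the toy network, random data, placeholder cluster split, and optimizer settings are stand-ins of mine, not the official configuration.

```python
import torch

# Toy stand-ins for the real model and data (assumptions, not the paper's setup).
N, T, D = 32, 50, 3
phi = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(T * D, T * D),
    torch.nn.Unflatten(1, (T, D)),
)
optimizer = torch.optim.Adam(phi.parameters(), lr=1e-3)

for step in range(100):
    x = torch.randn(N, T, D)             # one batch per optimization step
    pos_mask = torch.arange(N) % 2 == 0  # placeholder Omega^+/Omega^- split
    i = torch.randint(N, (1,)).item()    # one random anchor per step, rather
                                         # than a loss averaged over all samples
    loss = triplet_perturbation_loss(x, phi, pos_mask, i)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```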

References

Explaining Time Series via Contrastive and Locally Sparse Perturbations. ICLR 2024.