Main Idea
We devise a stack-wise alignment that minimizes divergences between
marginal feature measures on a per-stack basis. This alignment enables
the model to learn feature invariance at a suitable frequency of loss
propagation. Indeed, instead of aligning every block within each
stack, a single alignment per stack mitigates the gradient
vanishing/exploding issue by propagating the alignment loss sparsely,
while preserving the interpretability of N-BEATS and ample semantic
coverage, as sketched below.
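
For concreteness, a sketch of the stack-wise objective (the symbols
$S$, $\mu_{s}^{\mathrm{src}}$, $\mu_{s}^{\mathrm{tgt}}$, and
$\mathcal{D}$ are illustrative notation, not defined in this section):
the alignment loss contributes one divergence term per stack,
$$\mathcal{L}_{\mathrm{align}} \;=\; \sum_{s=1}^{S} \mathcal{D}\big(\mu_{s}^{\mathrm{src}},\, \mu_{s}^{\mathrm{tgt}}\big),$$
where $\mu_{s}^{\mathrm{src}}$ and $\mu_{s}^{\mathrm{tgt}}$ denote the
marginal feature measures at the output of stack $s$ for the source
and target domains, so the loss is back-propagated once per stack
rather than once per block.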
We adopt the Sinkhorn divergence, an efficient approximation of the
classic optimal transport distances. This choice is motivated by the
substantial theoretical evidence supporting optimal transport
distances. Indeed, in the adversarial framework, optimal transport
distances have been essential both for theoretical analysis and for
the computation of divergences between the push-forward measure
induced by a generator and a target measure.
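
As a reminder of the quantity being minimized (this is the standard
debiased form; the regularization strength $\varepsilon$ and the
notation $\mathrm{OT}_{\varepsilon}$ are not introduced elsewhere in
this section), the Sinkhorn divergence between two measures $\mu$ and
$\nu$ can be written as
$$S_{\varepsilon}(\mu,\nu) \;=\; \mathrm{OT}_{\varepsilon}(\mu,\nu) \;-\; \tfrac{1}{2}\,\mathrm{OT}_{\varepsilon}(\mu,\mu) \;-\; \tfrac{1}{2}\,\mathrm{OT}_{\varepsilon}(\nu,\nu),$$
where $\mathrm{OT}_{\varepsilon}$ denotes the entropy-regularized
optimal transport cost. It interpolates between the exact optimal
transport distance as $\varepsilon \to 0$ and an MMD-like discrepancy
as $\varepsilon \to \infty$, while remaining efficient to compute via
Sinkhorn iterations.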