Hello Kipf, I found a discrepancy between the loss described in the paper and the one implemented in the code.
According to Eq. (5) in the paper, the negative-sample term uses the Euclidean distance between the negative state sample at timestep t and the state at timestep t+1.
However, in the code below, state and neg_state are both at timestep t:
    self.neg_loss = torch.max(
        zeros, self.hinge - self.energy(
            state, action, neg_state, no_trans=True)).mean()
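To make the question concrete, here is a minimal sketch of the two variants as I understand them. It is plain Python rather than PyTorch so it runs without dependencies. The helper names (sq_dist, neg_loss), the toy numbers, and the hinge value of 1.0 are my own assumptions, and the sketch ignores any scaling applied inside self.energy.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def neg_loss(anchors, neg_states, hinge=1.0):
    """Mean of max(0, hinge - d(neg, anchor)) over a batch (hypothetical helper)."""
    terms = [max(0.0, hinge - sq_dist(n, a))
             for n, a in zip(neg_states, anchors)]
    return sum(terms) / len(terms)


# Toy batch of 2-D states; the numbers are made up for illustration.
state = [[0.0, 0.0], [1.0, 0.0]]        # z_t
next_state = [[0.5, 0.0], [1.0, 0.5]]   # z_{t+1}
neg_state = [state[1], state[0]]        # z_t shuffled across the batch

# Variant as I read Eq. (5): negative sample compared with z_{t+1}.
loss_paper = neg_loss(next_state, neg_state)

# Variant as in the snippet above: negative sample compared with z_t.
loss_code = neg_loss(state, neg_state)

print(loss_paper, loss_code)
```

On this toy batch the two variants give different values, which is why I am asking whether the difference matters in practice.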
I noticed that the same question was also asked here.
I would like to know whether this is a bug. Does the discrepancy affect the final performance?