Dear Author,
When constructing the input, does shifting input_ids_target to the right break the causal dependency chain among the input tokens?
Or is the causal relationship already implicitly captured in the hidden states by default?
Thank you.