In machine.py, we do something like this, so that only scores for non-special tokens are bubbled up:
```python
for item in zipped:
    output_ids, scores, sequence_score, attentions = cast(
        Tuple[torch.Tensor, torch.Tensor, Optional[float], Optional[torch.Tensor]], item
    )
    output_tokens: List[str] = []
    output_indices: List[int] = []
    for i, output_id in enumerate(output_ids):
        id = cast(int, output_id.item())
        if id not in all_special_ids:
            output_tokens.append(self.tokenizer.convert_ids_to_tokens(id))
            output_indices.append(i)
    scores = scores[output_indices]
```
In silnlp, we do something similar downstream in hugging_face_config.py:translate().
However, there we take the sequence_scores directly from the model outputs, and those scores appear to include the score of the BOS token, which is close to 0. This presumably biases the sequence score slightly: the shorter the output sequence, the more weight the near-zero BOS score carries, pulling the overall score closer to zero.
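To see the bias concretely, here is a toy calculation (the per-token log-probabilities below are made up for illustration; they are not taken from machine.py or silnlp). A forced BOS token has log-probability near 0, so including it in a length-normalized mean pulls the score toward zero, and more so for shorter sequences:

```python
import torch

# Illustrative per-token log-probabilities for one generated sequence.
# Position 0 is the BOS token, whose log-prob is ~0 because the model
# is forced to emit it.
token_logprobs = torch.tensor([-0.0001, -1.2, -0.8, -1.5])
special_mask = torch.tensor([True, False, False, False])  # BOS is special

with_bos = token_logprobs.mean().item()              # biased toward zero
without_bos = token_logprobs[~special_mask].mean().item()

# with_bos is closer to zero than without_bos, i.e. the sequence looks
# more probable than it should when the BOS score is included.
```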
We should confirm that these special token scores are being included in the sequence score and then update silnlp accordingly if they are (or maybe even consider submitting an issue in transformers).
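If the special-token scores do turn out to be included, one possible fix on the silnlp side is to recompute the sequence score from the per-token scores while masking out special-token positions, rather than trusting sequence_scores. A minimal sketch of that helper (the function name and toy values are hypothetical; in transformers the per-position log-probs can be obtained via GenerationMixin.compute_transition_scores):

```python
import torch
from typing import Sequence, Set

def masked_sequence_score(
    token_ids: Sequence[int],
    token_scores: torch.Tensor,
    special_ids: Set[int],
) -> float:
    """Mean log-prob over non-special tokens only.

    token_scores[i] is assumed to be the log-prob of token_ids[i], as
    returned per position by compute_transition_scores.
    """
    keep = torch.tensor([t not in special_ids for t in token_ids])
    kept = token_scores[keep]
    # Fall back to -inf if every position was a special token.
    return kept.mean().item() if kept.numel() else float("-inf")
```

With token ids [0, 5, 7, 2], scores [-0.0001, -1.2, -0.8, -0.9], and special ids {0, 2} (BOS/EOS), this averages only the two real tokens, giving -1.0 instead of the BOS-inflated mean.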