Thank you for sharing your code.
I have one minor "issue"; the second point is mostly a question and a request.
The issue is a typo in the README code: the A2CU and A3CU examples are missing some commas. For example:
recall_scores, prec_scores, f1_scores = a2cu.score(
references=references,
candidates=candidates,
generation_batch_size=2, # the batch size for ACU generation
matching_batch_size=16, # the batch size for ACU matching ## COMMA was missing
output_path=None, # the path to save the evaluation results ## COMMA was missing
recall_only=False, # whether to only compute the recall score ## COMMA was missing
acu_path=None # the path to save the generated ACUs
)
My question is about the output of this example, which gives precision = recall = F1 = 1/8:
....
Recall score: 0.1250
Precision score: 0.1250
F1 score: 0.1250
Recall: 0.1250, Precision 0.1250, F1: 0.1250
The input in your example is
candidates, references = ["This is a test"], ["This is a test"]
This result surprised me. Wouldn't we expect a score of 1, or something close to it? So I used a debugger to extract the ACUs generated for the reference and found they were:
[['This is a test.',
'The narrator is talking about something.',
'The narrator is talking about something.',
'The narrator is talking about something.',
'The narrator is talking about something.',
'The narrator is talking about something.',
'The narrator is talking about something.',
'The narrator is talking about something.']]
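For what it's worth, here is my back-of-the-envelope check of where 1/8 comes from. It assumes that recall is the fraction of reference ACUs judged to be supported by the candidate. In place of the real model-based matching in A2CU/A3CU, it uses a hypothetical `normalize` helper and a simple substring test. With 8 ACUs, of which only the first corresponds to "This is a test", the result is 1/8:

```python
import string

# Reference ACUs as extracted with the debugger (copied from above).
reference_acus = ["This is a test."] + ["The narrator is talking about something."] * 7
candidate = "This is a test"


def normalize(text: str) -> str:
    """Lowercase and strip punctuation (hypothetical stand-in, not the library's matcher)."""
    return text.lower().translate(str.maketrans("", "", string.punctuation)).strip()


def acu_recall(acus: list[str], summary: str) -> float:
    """Fraction of ACUs counted as supported by the summary.

    Assumption: an ACU is "matched" if its normalized text appears in the
    normalized summary. A2CU/A3CU use a trained model for this step instead.
    """
    matched = sum(1 for acu in acus if normalize(acu) in normalize(summary))
    return matched / len(acus)


print(f"Recall: {acu_recall(reference_acus, candidate):.4f}")  # prints "Recall: 0.1250"
```

So the score itself looks arithmetically consistent with the ACUs. What seems off is the seven repeated "narrator" ACUs.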
Do I have an installation problem, or is this the expected answer?
Could you please add the expected output to both of your examples?