About Instruction-as-Reasoning Ablation Experiment #11

Dear authors,

Thank you for your insightful work! I have some questions regarding the ablation experiments in the paper. **Table 4** presents two ablation experiments, distinguished by whether inference was performed during training. However, why are all the benchmark scores in the last row of both experiments the same?

Does that mean that, for `SFT+RL`, it makes no difference whether or not reasoning is used during the training stages?

Any response would be appreciated!
