About Instruction-as-Reasoning Ablation Experiment #11

@RayneYael

Dear authors,

Thank you for your insightful work! I have a question about the ablation experiments in the paper. Table 4 presents two ablation experiments, distinguished by whether inference was performed during training. However, why are all the benchmark scores in the last row identical across the two experiments?

Does that mean that, for SFT+RL, it makes no difference whether or not reasoning is used during the training stages?

Any response would be appreciated!
