SailorFog-QA Pipeline? #235

@chinhh-ai

Hello authors,

Thank you for releasing the WebAgent / WebSailor project and the SailorFog-QA dataset.
I have been reading the paper and exploring the repository, but I am a bit confused about the data generation pipeline.

I can find the SailorFog-QA dataset (e.g., sailorfog-QA.jsonl), but I cannot locate:

  • the runnable pipeline / scripts to generate SailorFog-QA from raw documents or web data, or
  • an entry point or instructions to reproduce the SailorFog-QA dataset generation process.

Could you please clarify:

  1. Is the SailorFog-QA dataset generated by an internal pipeline that is not fully released?
  2. If the pipeline is available, could you point to the exact directory / scripts / command to run it?
  3. If it is not released, are there any plans to open-source the data generation pipeline or to provide a simplified version?

This would be extremely helpful for researchers who want to reproduce or extend SailorFog-QA for their own datasets.

Thank you very much for your time and for the great work!
