
Question about the implementation and prompt for the 30% retention baseline #3

Description

@ToBeReality

Hi, thanks for your great work and for releasing this project.

I am trying to reproduce the experimental results reported in the paper, especially the baseline setting with a 30% retention rate. I have a few questions regarding the exact implementation details.

The paper/report gives results for the baseline under the 30% retention setting on ScreenSpot-Pro, ScreenSpot-v2, and OSWorld-G. In my reproduction, the OSWorld-G results are generally consistent with the reported numbers. However, I see a large gap on ScreenSpot-Pro and ScreenSpot-v2 compared with the reported numbers.

Could you please clarify the following details?

  1. For the 30% retention baseline, what is the exact implementation used?

    • How are the retained tokens/regions selected?
    • Is the 30% retention applied before or after any filtering, resizing, or preprocessing?
    • Are there any dataset-specific settings for ScreenSpot-Pro, ScreenSpot-v2, or OSWorld-G?
  2. What is the exact prompt template used for this baseline?

    • Is it the same across ScreenSpot-Pro, ScreenSpot-v2, and OSWorld-G?
    • Are there any additional system prompts, grounding instructions, or formatting constraints?
  3. Would it be possible to release the corresponding baseline code and prompt configuration?

    • This would be very helpful for reproducing the reported results and ensuring a fair comparison.

To summarize, in my reproduction OSWorld-G roughly matches the reported results, but ScreenSpot-Pro and ScreenSpot-v2 show a much larger discrepancy. I therefore suspect differences in the preprocessing, the prompt format, or the token-retention implementation.

Thanks again for your work. I would really appreciate any clarification or released configuration files that could help reproduce the 30% retention baseline.

