Question about the implementation and prompt for the 30% retention baseline

Hi, thanks for your great work and for releasing this project.

I am trying to reproduce the experimental results reported in the paper, especially the baseline setting with a **30% retention rate**, and I have a few questions about the exact implementation details.

The paper/report gives baseline results under the 30% retention setting on ScreenSpot-Pro, ScreenSpot-v2, and OSWorld-G. In my reproduction, the **OSWorld-G** results are generally consistent with the reported numbers. On **ScreenSpot-Pro** and **ScreenSpot-v2**, however, my results fall well short of the reported numbers.

Could you please clarify the following details?

1. For the **30% retention baseline**, what is the exact implementation used?

   * How are the retained tokens/regions selected?
   * Is the 30% retention applied before or after any filtering, resizing, or preprocessing?
   * Are there any dataset-specific settings for ScreenSpot-Pro, ScreenSpot-v2, or OSWorld-G?

2. What is the exact **prompt template** used for this baseline?

   * Is it the same across ScreenSpot-Pro, ScreenSpot-v2, and OSWorld-G?
   * Are there any additional system prompts, grounding instructions, or formatting constraints?

3. Would it be possible to release the corresponding baseline code and prompt configuration?

   * This would be very helpful for reproducing the reported results and ensuring a fair comparison.
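To make question 1 concrete: the number of retained tokens depends on whether the 30% ratio is applied before or after resizing, and on how the tokens are picked. The sketch below is hypothetical and not taken from the paper or this repo. The 28-pixel effective patch size, the truncating `int()` rounding, and the two selection strategies (`random`, `uniform`) are my own assumptions, included only to illustrate the kinds of differences I have in mind.

```python
import random


def num_visual_tokens(height: int, width: int, patch: int = 28) -> int:
    """Hypothetical token count: one token per full patch (patch size is an assumption)."""
    return (height // patch) * (width // patch)


def retain_indices(num_tokens: int, ratio: float, strategy: str = "uniform", seed: int = 0) -> list[int]:
    """Return sorted indices of tokens to keep under a given retention ratio.

    Assumptions (not from the paper): k is truncated with int(), at least one
    token is kept, and selection is either random or evenly strided.
    """
    k = max(1, int(num_tokens * ratio))
    if strategy == "random":
        # Seeded random subset, so runs are reproducible.
        rng = random.Random(seed)
        return sorted(rng.sample(range(num_tokens), k))
    if strategy == "uniform":
        # Evenly spaced indices; strictly increasing because num_tokens >= k.
        return [i * num_tokens // k for i in range(k)]
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    # Same screenshot, ratio applied at the original vs. a downscaled resolution.
    full = num_visual_tokens(2560, 1440)   # 91 * 51 patches
    small = num_visual_tokens(1280, 720)   # 45 * 25 patches
    print(full, len(retain_indices(full, 0.3)))
    print(small, len(retain_indices(small, 0.3)))
```

Under these assumptions, the same 2560×1440 screenshot keeps 1392 tokens if the ratio is applied at full resolution but only 337 after downscaling to 1280×720. A difference like this could plausibly matter more on high-resolution benchmarks such as ScreenSpot-Pro.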

Because OSWorld-G roughly matches the reported numbers while ScreenSpot-Pro and ScreenSpot-v2 do not, I suspect the difference lies in the preprocessing, the prompt format, or the token-retention implementation.

Thanks again for your work. I would really appreciate any clarification, or any released configuration files, that could help reproduce the 30% retention baseline.

<img width="764" height="147" alt="Image" src="https://github.com/user-attachments/assets/12b11694-aec5-430c-84b9-4ea188e09ff6" />