Issue Content
Hello, and thank you for releasing this great project and open-sourcing the code and models. I really appreciate the team's effort; it is very helpful for the community.
I encountered an issue while running the SimpleVQA benchmark with WebWatcher and would like to ask for some clarification.
In my setup, I modified the filepath field in simple.jsonl so that it points to local image files instead of the default configuration, in which filepath points to images hosted on Alibaba Cloud OSS.
After doing this, I noticed that during agent inference, the system repeatedly calls VLSearchImage with the local image path, but it fails to find the image and returns "No image found".
For example, I observed logs like the following:
<tool_call>
{"name": "VLSearchImage", "arguments": {"images": ["infer/scripts_eval/images/simplevqa/simplevqa_52.jpg"] }}
</tool_call><|im_end|>
<|im_start|>user
<tool_response>[VLSearchImage] No image found: [], [], []</tool_response>
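To check how many entries in simple.jsonl now point to local files rather than remote URLs, a small script like the one below may help. This is a minimal sketch, not part of the repository: `classify_filepaths` is a hypothetical helper, and it assumes each JSONL record stores its image location in a `filepath` field as either a string or a list of strings.

```python
import json
from urllib.parse import urlparse


def classify_filepaths(lines):
    """Split 'filepath' values from JSONL lines into remote URLs and local paths.

    Assumption: each record has a 'filepath' field holding a string or a list
    of strings (hypothetical helper, not part of the WebWatcher code).
    """
    remote, local = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        paths = record.get("filepath", [])
        if isinstance(paths, str):
            paths = [paths]
        for path in paths:
            # http(s) schemes count as remote; anything else is treated as local.
            if urlparse(path).scheme in ("http", "https"):
                remote.append(path)
            else:
                local.append(path)
    return remote, local
```

Usage: open simple.jsonl, pass the file object to `classify_filepaths`, and compare the lengths of the two returned lists.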
My questions are:
- Is this behavior caused by the fact that the images are not hosted on OSS, and that VLSearchImage expects a remote URL rather than a local file path?
- Does this issue significantly affect the benchmark results? In my experiments, the final score is roughly half of the score reported in the paper, so I suspect the two might be related.
I would really appreciate any guidance on the correct way to configure image paths for SimpleVQA evaluation. If there is a recommended way to use local images instead of OSS URLs, please let me know.
Thank you again for your work and for maintaining this repository.