fix: add retry mechanism for jsonl file writing #1405

Open
shahidchdry wants to merge 1 commit into modelscope:main from shahidchdry:fix-windows-file-locking

@shahidchdry

Implement retry logic for writing to jsonl file in case of PermissionError.

When writing large batches of results to the filesystem on Windows, a PermissionError is raised due to file locking. As a result, the eval freezes and stops receiving responses from the server.

Many log lines like the ones below appear in the terminal:
026-06-09 09:09:15,253 - openai._base_client - INFO: Retrying request to /chat/completions in 0.912047 seconds.00s/it]
2026-06-09 09:09:15,269 - openai._base_client - INFO: Retrying request to /chat/completions in 0.886564 seconds
2026-06-09 09:09:15,285 - openai._base_client - INFO: Retrying request to /chat/completions in 0.923344 seconds
2026-06-09 09:09:15,286 - openai._base_client - INFO: Retrying request to /chat/completions in 0.942365 seconds
2026-06-09 09:09:15,286 - openai._base_client - INFO: Retrying request to /chat/completions in 0.819679 seconds

The following error is written to the outputs log file:
2026-06-09 08:36:49 - evalscope - ERROR: Processing item in subset='default' failed: [Errno 13] Permission denied: './outputs\20260609_082713\reviews\Qwen3.6-27B\gpqa_diamond_default.jsonl'
Traceback:
Traceback (most recent call last):
  File "C:\Users\shahid\AppData\Roaming\Python\Python314\site-packages\evalscope\utils\function_utils.py", line 425, in run_in_threads_with_progress
    on_result(items[item_index], result)
    ~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\shahid\AppData\Roaming\Python\Python314\site-packages\evalscope\evaluator\evaluator.py", line 295, in on_result
    self._persist_result(item, *result)
    ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^
  File "C:\Users\shahid\AppData\Roaming\Python\Python314\site-packages\evalscope\evaluator\evaluator.py", line 366, in _persist_result
    review_result = self.cache_manager.save_review_cache(
        subset=item.subset,
        ...<2 lines>...
        save_metadata=self.benchmark.save_metadata,
    )
  File "C:\Users\shahid\AppData\Roaming\Python\Python314\site-packages\evalscope\api\evaluator\cache.py", line 205, in save_review_cache
    dump_jsonl_data(data_list=review_result_dict, jsonl_file=cache_file, dump_mode=DumpMode.APPEND)
    ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\shahid\AppData\Roaming\Python\Python314\site-packages\evalscope\utils\io_utils.py", line 224, in dump_jsonl_data
    with jsonl.open(jsonl_file, mode=mode) as writer:
         ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\shahid\AppData\Roaming\Python\Python314\site-packages\jsonlines\jsonlines.py", line 643, in open
    fp = builtins.open(file, mode=mode + "t", encoding=encoding)
PermissionError: [Errno 13] Permission denied: './outputs\20260609_082713\reviews\Qwen3.6-27B\gpqa_diamond_default.jsonl'

After rewriting the JSONL dump, this is no longer a problem: the error is still raised, but the retry loop recovers from it on the fly. Random jitter is added to the delay to avoid a thundering-herd problem.

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request introduces retry logic with random backoff to handle PermissionError (file locking) when writing JSONL data in evalscope/utils/io_utils.py. However, the retry loop is nested inside an outer with jsonl.open(...) statement, which defeats the purpose of the retry logic and can cause nested file locking issues on Windows. The reviewer suggests removing the outer with statement and wrapping the file opening and writing directly within the retry loop.

Comment on lines +228 to +238
    max_retries = 50
    for attempt in range(max_retries):
        try:
            with jsonl.open(jsonl_file, mode=mode) as writer:
                writer.write_all(data_list)
            break
        except PermissionError as e:
            if attempt == max_retries - 1:
                logger.error(f'Failed to write to "{jsonl_file}" after "{max_retries}" attempts due to file locking: {e}')
                raise
            time.sleep(0.1 + random.uniform(0.01, 0.1) * attempt)

Contributor

critical

The retry logic is nested inside an outer with jsonl.open(jsonl_file, mode=mode) as writer: statement (line 227). This causes two major issues:

  1. Immediate Failure on Initial Open: If the initial file open on line 227 fails with a PermissionError, the retry loop is never reached, and the error is raised immediately.
  2. Nested File Locking: If the initial open succeeds, the code inside the loop tries to open the same file again on line 231. On Windows, this nested open will fail with a PermissionError due to file locking (sharing violation), making the write fail every time.
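
The first point can be reproduced without touching the filesystem. The sketch below is only an illustration: `FlakyOpener` is a hypothetical stand-in for `jsonl.open` that fails its first few calls, and `nested_write` / `flat_write` are made-up names mirroring the two loop shapes under discussion (sleeps and logging omitted).

```python
import contextlib


class FlakyOpener:
    """Hypothetical stand-in for jsonl.open: the first `failures` opens raise."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0
        self.sink = []  # plays the role of the file contents

    @contextlib.contextmanager
    def open(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise PermissionError(13, 'Permission denied')
        yield self.sink


def nested_write(opener, rows, max_retries=3):
    # Shape described in the review: the retry loop sits inside an outer `with`.
    with opener.open():  # a failure here is outside the try/except below
        for attempt in range(max_retries):
            try:
                with opener.open() as sink:
                    sink.extend(rows)
                break
            except PermissionError:
                if attempt == max_retries - 1:
                    raise


def flat_write(opener, rows, max_retries=3):
    # Suggested shape: the only open happens inside the retry loop.
    for attempt in range(max_retries):
        try:
            with opener.open() as sink:
                sink.extend(rows)
            break
        except PermissionError:
            if attempt == max_retries - 1:
                raise
```

With a single simulated failure, `nested_write` raises on its first open and never retries, while `flat_write` succeeds on the second attempt.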

To fix this, the outer with statement on line 227 should be removed entirely, and the retry loop should wrap the file opening and writing directly at the outer level:

    mode = 'w' if dump_mode == DumpMode.OVERWRITE else 'a'
    max_retries = 50
    for attempt in range(max_retries):
        try:
            with jsonl.open(jsonl_file, mode=mode) as writer:
                writer.write_all(data_list)
            break
        except PermissionError as e:
            if attempt == max_retries - 1:
                logger.error(f'Failed to write to "{jsonl_file}" after "{max_retries}" attempts due to file locking: {e}')
                raise
            time.sleep(0.1 + random.uniform(0.01, 0.1) * attempt)
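
For a sense of how long this schedule can block a single write, the arithmetic can be worked out directly from the constants in the snippet. This is a rough calculation, not a measurement; `total_wait` is a hypothetical helper, and it assumes every attempt fails and that time spent inside `open` itself is negligible.

```python
BASE_DELAY = 0.1                      # fixed part of each sleep, in seconds
JITTER_LOW, JITTER_HIGH = 0.01, 0.1   # bounds passed to random.uniform
MAX_RETRIES = 50


def total_wait(jitter):
    """Total sleep time if every draw of the random factor equals `jitter`.

    The last attempt re-raises instead of sleeping, so only
    attempts 0 .. MAX_RETRIES - 2 contribute a sleep.
    """
    return sum(BASE_DELAY + jitter * attempt for attempt in range(MAX_RETRIES - 1))


shortest = total_wait(JITTER_LOW)
longest = total_wait(JITTER_HIGH)
# By linearity of expectation, the mean uses the midpoint of the uniform range.
expected = total_wait((JITTER_LOW + JITTER_HIGH) / 2)

print(f'shortest={shortest:.2f}s expected={expected:.2f}s longest={longest:.2f}s')
```

Under those assumptions, a write that never succeeds sleeps for between about 16.7 s and 122.5 s (about 69.6 s on average) before the error is finally raised.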

Author

  1. If the initial attempt fails, execution simply enters the PermissionError handler, sleeps, and retries.
  2. break exits the for loop only once a write has succeeded.

@Yunnglin

Collaborator

Thanks for the report!

I traced the call chain from your traceback — on_result runs in the main thread inside run_in_threads_with_progress (function_utils.py:424-425), so dump_jsonl_data writes are serial, not concurrent. The PermissionError is not caused by evalscope's threading model.
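
For readers unfamiliar with this pattern, here is a minimal sketch of "workers compute, the calling thread handles results". It is not evalscope's implementation: `run_in_threads_sketch` is a hypothetical simplification built on the standard library, shown only to illustrate why callbacks invoked this way run one at a time.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_in_threads_sketch(func, items, on_result, max_workers=4):
    """Run func(item) in worker threads; call on_result in the caller's thread."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(func, item): item for item in items}
        # as_completed is iterated by the calling thread, so each on_result
        # call finishes before the next one starts.
        for future in as_completed(futures):
            on_result(futures[future], future.result())


callback_threads = set()
written = []


def on_result(item, result):
    # Record which thread performs the "write" for each finished item.
    callback_threads.add(threading.current_thread().name)
    written.append((item, result))


run_in_threads_sketch(lambda x: x * x, range(8), on_result)
print(callback_threads)
```

Every callback reports the same thread name, so any file write done inside `on_result` in this arrangement cannot overlap with another such write from the same process.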

Also, the current diff has a bug: the original with jsonl.open(...) is kept, and a second one is added inside the retry loop — this opens the file twice, which would itself trigger PermissionError on Windows.

Could you share more about your environment? (antivirus, filesystem, Python version, eval_batch_size) That would help identify the actual root cause.
