fix: add retry mechanism for jsonl file writing #1405
Implement retry logic for writing to jsonl file in case of PermissionError.
Code Review
This pull request introduces retry logic with random backoff to handle PermissionError (file locking) when writing JSONL data in evalscope/utils/io_utils.py. However, the retry loop is nested inside an outer with jsonl.open(...) statement, which defeats the purpose of the retry logic and can cause nested file locking issues on Windows. The reviewer suggests removing the outer with statement and wrapping the file opening and writing directly within the retry loop.
max_retries = 50
for attempt in range(max_retries):
    try:
        with jsonl.open(jsonl_file, mode=mode) as writer:
            writer.write_all(data_list)
        break
    except PermissionError as e:
        if attempt == max_retries - 1:
            logger.error(f'Failed to write to "{jsonl_file}" after "{max_retries}" attempts due to file locking: {e}')
            raise
        time.sleep(0.1 + random.uniform(0.01, 0.1) * attempt)
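For a sense of scale: after every failed attempt except the last, the loop above sleeps 0.1 + random.uniform(0.01, 0.1) * attempt seconds. The sketch below is plain arithmetic on that formula (the function name total_wait_bounds is hypothetical, nothing from evalscope). It computes the shortest and longest total time one write can block before the error is re-raised with max_retries = 50.

```python
def total_wait_bounds(max_retries=50, base=0.1, lo=0.01, hi=0.1):
    """Lower and upper bounds on total sleep time for the retry loop.

    A sleep follows every failed attempt except the last one (which re-raises),
    so attempts 0 .. max_retries - 2 sleep.
    """
    sleeping_attempts = range(max_retries - 1)
    min_total = sum(base + lo * a for a in sleeping_attempts)
    max_total = sum(base + hi * a for a in sleeping_attempts)
    return min_total, max_total


low, high = total_wait_bounds()
print(f"{low:.2f}s to {high:.2f}s")  # 16.66s to 122.50s
```

So, in the worst case, a writer thread could stay blocked for roughly two minutes before the PermissionError finally surfaces.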
The retry logic is nested inside an outer with jsonl.open(jsonl_file, mode=mode) as writer: statement (line 227). This causes two major issues:
- Immediate Failure on Initial Open: If the initial file open on line 227 fails with a PermissionError, the retry loop is never reached, and the error is raised immediately.
- Nested File Locking: If the initial open succeeds, the code inside the loop tries to open the same file again on line 231. On Windows, this nested open can fail with a PermissionError due to file locking (a sharing violation), in which case every write attempt fails.
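The first issue can be reproduced without touching the filesystem. This sketch uses a hypothetical FlakyOpen class as a stand-in for jsonl.open. It raises PermissionError on its first call. nested_write mirrors the structure described above, with the retry loop inside an outer with. flat_write moves the open into the loop. Both function names are illustrative only.

```python
import contextlib


class FlakyOpen:
    """Hypothetical stand-in for jsonl.open: fails on the first `failures` calls."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    @contextlib.contextmanager
    def __call__(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise PermissionError("simulated sharing violation")
        yield []  # a list plays the role of the writer


def nested_write(open_file, max_retries=3):
    # Outer open, as in the reviewed diff: an error here escapes before the loop starts.
    with open_file():
        for attempt in range(max_retries):
            try:
                with open_file() as writer:
                    writer.append("row")
                break
            except PermissionError:
                if attempt == max_retries - 1:
                    raise


def flat_write(open_file, max_retries=3):
    # The open itself sits inside the retry loop, so a failed first open is retried.
    for attempt in range(max_retries):
        try:
            with open_file() as writer:
                writer.append("row")
            break
        except PermissionError:
            if attempt == max_retries - 1:
                raise


nested_open = FlakyOpen(failures=1)
try:
    nested_write(nested_open)
except PermissionError:
    print(f"nested: escaped after {nested_open.calls} open call(s)")  # nested: escaped after 1 open call(s)

flat_open = FlakyOpen(failures=1)
flat_write(flat_open)
print(f"flat: succeeded after {flat_open.calls} open call(s)")  # flat: succeeded after 2 open call(s)
```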
To fix this, the outer with statement on line 227 should be removed entirely, and the retry loop should wrap the file opening and writing directly at the outer level:
mode = 'w' if dump_mode == DumpMode.OVERWRITE else 'a'
max_retries = 50
for attempt in range(max_retries):
    try:
        with jsonl.open(jsonl_file, mode=mode) as writer:
            writer.write_all(data_list)
        break
    except PermissionError as e:
        if attempt == max_retries - 1:
            logger.error(f'Failed to write to "{jsonl_file}" after "{max_retries}" attempts due to file locking: {e}')
            raise
        time.sleep(0.1 + random.uniform(0.01, 0.1) * attempt)
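The same structure can also be written as a self-contained function that runs without evalscope. This is a sketch under some assumptions. It uses the standard library json module in place of jsonlines and writes one json.dumps line per record, which is roughly what write_all does. dump_jsonl_with_retry and its sleep parameter are hypothetical names, not part of evalscope's API.

```python
import json
import os
import random
import tempfile
import time


def dump_jsonl_with_retry(data_list, path, mode="a", max_retries=50, sleep=time.sleep):
    """Write records as JSON lines, retrying the whole open+write on PermissionError.

    Returns the number of attempts that were needed.
    """
    for attempt in range(max_retries):
        try:
            with open(path, mode, encoding="utf-8") as fp:
                for record in data_list:
                    fp.write(json.dumps(record, ensure_ascii=False) + "\n")
            return attempt + 1
        except PermissionError:
            if attempt == max_retries - 1:
                raise  # out of attempts: let the caller see the original error
            # Backoff from the PR: 0.1 s base plus jitter that grows with the attempt number.
            sleep(0.1 + random.uniform(0.01, 0.1) * attempt)


path = os.path.join(tempfile.mkdtemp(), "reviews.jsonl")
dump_jsonl_with_retry([{"id": 1}], path, mode="w")
dump_jsonl_with_retry([{"id": 2}], path, mode="a")
with open(path, encoding="utf-8") as fp:
    print(fp.read().splitlines())  # ['{"id": 1}', '{"id": 2}']
```

Returning from inside the loop takes the place of the break in the suggestion. The behavior is the same, and the caller also learns how many attempts were needed. One caveat applies to both versions: in append mode, if a PermissionError were raised after some lines had already been written, the retry would write those lines again.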
- If the initial attempt fails, control simply goes to the PermissionError block, which sleeps and then reattempts.
- The break exits the for loop only when a write is successful.
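Both points can be checked with a small trace. This sketch uses a hypothetical make_flaky_write helper whose write raises PermissionError on its first two calls. run_with_retry, also illustrative, has the same loop shape as the suggested fix. Its sleep is replaced by a no-op so the trace runs instantly.

```python
def run_with_retry(operation, max_retries=5, sleep=lambda seconds: None):
    """Call `operation`, retrying on PermissionError; return a log of each attempt."""
    events = []
    for attempt in range(max_retries):
        try:
            operation()
            events.append(f"attempt {attempt}: ok")
            break  # only reached if operation() did not raise
        except PermissionError:
            if attempt == max_retries - 1:
                raise  # out of attempts
            events.append(f"attempt {attempt}: PermissionError, retrying")
            sleep(0.1)  # the real code sleeps with jitter here


    return events


def make_flaky_write(failures):
    """Hypothetical write that raises PermissionError `failures` times, then succeeds."""
    remaining = failures

    def flaky_write():
        nonlocal remaining
        if remaining > 0:
            remaining -= 1
            raise PermissionError("file is locked")

    return flaky_write


trace = run_with_retry(make_flaky_write(failures=2))
for line in trace:
    print(line)
# attempt 0: PermissionError, retrying
# attempt 1: PermissionError, retrying
# attempt 2: ok
```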
Thanks for the report! I traced the call chain from your traceback … Also, the current diff has a bug: the original … Could you share more about your environment? (antivirus, filesystem, Python version, …)
When writing large batches of results to the filesystem on Windows, writing to the JSONL file raises a PermissionError due to file locking. Because of this, the evaluation freezes and stops handling responses from the server.
Multiple log lines like the following are printed in the terminal:
2026-06-09 09:09:15,253 - openai._base_client - INFO: Retrying request to /chat/completions in 0.912047 seconds
2026-06-09 09:09:15,269 - openai._base_client - INFO: Retrying request to /chat/completions in 0.886564 seconds
2026-06-09 09:09:15,285 - openai._base_client - INFO: Retrying request to /chat/completions in 0.923344 seconds
2026-06-09 09:09:15,286 - openai._base_client - INFO: Retrying request to /chat/completions in 0.942365 seconds
2026-06-09 09:09:15,286 - openai._base_client - INFO: Retrying request to /chat/completions in 0.819679 seconds
The following error is written to the log file in the outputs directory:
2026-06-09 08:36:49 - evalscope - ERROR: Processing item in subset='default' failed: [Errno 13] Permission denied: './outputs\20260609_082713\reviews\Qwen3.6-27B\gpqa_diamond_default.jsonl'
Traceback:
Traceback (most recent call last):
File "C:\Users\shahid\AppData\Roaming\Python\Python314\site-packages\evalscope\utils\function_utils.py", line 425, in run_in_threads_with_progress
on_result(items[item_index], result)
~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\shahid\AppData\Roaming\Python\Python314\site-packages\evalscope\evaluator\evaluator.py", line 295, in on_result
self._persist_result(item, *result)
~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^
File "C:\Users\shahid\AppData\Roaming\Python\Python314\site-packages\evalscope\evaluator\evaluator.py", line 366, in _persist_result
review_result = self.cache_manager.save_review_cache(
subset=item.subset,
...<2 lines>...
save_metadata=self.benchmark.save_metadata,
)
File "C:\Users\shahid\AppData\Roaming\Python\Python314\site-packages\evalscope\api\evaluator\cache.py", line 205, in save_review_cache
dump_jsonl_data(data_list=review_result_dict, jsonl_file=cache_file, dump_mode=DumpMode.APPEND)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\shahid\AppData\Roaming\Python\Python314\site-packages\evalscope\utils\io_utils.py", line 224, in dump_jsonl_data
with jsonl.open(jsonl_file, mode=mode) as writer:
~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\shahid\AppData\Roaming\Python\Python314\site-packages\jsonlines\jsonlines.py", line 643, in open
fp = builtins.open(file, mode=mode + "t", encoding=encoding)
PermissionError: [Errno 13] Permission denied: './outputs\20260609_082713\reviews\Qwen3.6-27B\gpqa_diamond_default.jsonl'
After rewriting the JSONL dump with retry logic, this is no longer a problem: the PermissionError is still raised, but the code recovers from it on the fly by retrying. Randomness is added to the backoff delay to prevent a thundering herd problem.
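One detail of the jitter is worth checking against the thundering-herd goal. In 0.1 + random.uniform(0.01, 0.1) * attempt, the random term is multiplied by attempt. On the first retry (attempt 0), every worker therefore sleeps exactly 0.1 s and retries at the same time. The spread only appears from the second retry on. The sketch below shows this with a seeded random.Random. The full_jitter_delay variant is an assumption, not something this PR uses: it follows the common "full jitter" scheme, a uniform draw over the whole capped exponential range, and is included only for comparison. Its base and cap values are illustrative.

```python
import random


def pr_delay(attempt, rng):
    # Backoff formula used in this PR.
    return 0.1 + rng.uniform(0.01, 0.1) * attempt


def full_jitter_delay(attempt, rng, base=0.1, cap=2.0):
    # Alternative NOT used in this PR ("full jitter"):
    # uniform over [0, min(cap, base * 2**attempt)], so even the first retry is spread out.
    return rng.uniform(0, min(cap, base * 2 ** attempt))


rng = random.Random(0)  # seeded so the sketch is repeatable
workers = 5

first_retry_pr = {pr_delay(0, rng) for _ in range(workers)}
second_retry_pr = {pr_delay(1, rng) for _ in range(workers)}
first_retry_full = {full_jitter_delay(0, rng) for _ in range(workers)}

print(len(first_retry_pr))  # 1 -> all workers sleep exactly 0.1 s and retry together
print(len(second_retry_pr))
print(len(first_retry_full))
```

If lockstep first retries matter in practice, one option is to apply jitter that does not depend on attempt, so the first retry is already spread out.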