
Reuse OpenAI client + tqdm-safe logging for vLLM/OpenAI-compatible servers#93

Open
VedantMadane wants to merge 8 commits into lmarena:main from VedantMadane:fix/vllm-openai-client-cache

Conversation

@VedantMadane

Addresses vLLM pain points reported in #81 (slow throughput + corrupted/invalid progress bar during concurrent generation).

Changes:

  • Cache OpenAI client per thread/endpoint (avoid per-request client construction overhead)
  • Use tqdm.write for transient error logs to avoid mangling progress output
  • Add optional timeout config for OpenAI-compatible endpoints
  • Add unit test for client caching
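
For reviewers, the caching change can be pictured with a minimal sketch. The function name get_openai_client and the (api_base, api_key, timeout) cache key match the PR summary; everything else here is an illustrative assumption, not the exact diff:

```python
import threading
from typing import Optional

# Per-thread storage so clients are never shared across worker threads.
_thread_local = threading.local()


def get_openai_client(api_dict: Optional[dict] = None):
    """Return a cached OpenAI client for this thread and endpoint."""
    from openai import OpenAI  # imported lazily so tests can mock the module

    api_dict = api_dict or {}
    key = (api_dict.get("api_base"), api_dict.get("api_key"), api_dict.get("timeout"))
    cache = getattr(_thread_local, "openai_clients", None)
    if cache is None:
        cache = _thread_local.openai_clients = {}
    if key not in cache:
        # timeout=None falls back to the SDK default.
        cache[key] = OpenAI(base_url=key[0], api_key=key[1], timeout=key[2])
    return cache[key]
```

Repeated calls with the same endpoint dict return the same client object; changing any key component (e.g. timeout) constructs a new one.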

Notes:

  • This keeps API behavior the same; it only reduces overhead and improves console output stability.

@coderabbitai

coderabbitai Bot commented Jan 19, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.
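
For reference, the dotted setting path above maps onto a nested key in .coderabbit.yaml. A hypothetical fragment (the threshold value is purely illustrative):

```yaml
reviews:
  auto_review:
    # Pause automatic reviews after this many commits have been reviewed.
    auto_pause_after_reviewed_commits: 20
```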

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds a per-thread OpenAI client cache (get_openai_client), updates completion routines to use it with tqdm-safe logging and expanded transient-error handling, documents an optional timeout and parallelism guidance, and adds a unit test validating client caching behavior.

Changes

Cohort / File(s): Summary

Documentation (README.md, config/api_config.yaml)
    Adds guidance for vLLM / OpenAI-compatible local servers: a commented timeout example (1800s) in config/api_config.yaml and a brief note about sustainable parallelism in README.md. No runtime behavior changes.

OpenAI client caching (utils/openai_client.py)
    New get_openai_client(api_dict: Optional[dict] = None) implementing a per-thread cache keyed by (api_base, api_key, timeout), lazily constructing OpenAI clients on cache miss. Review cache key correctness and thread-local lifecycle.

Completion logic & logging (utils/completion.py)
    Replaces direct OpenAI construction with get_openai_client, adds _tqdm_write for tqdm-safe output, broadens transient OpenAI error handling and logging, and preserves existing retry loops. Verify logging behavior and exception handling paths.

Tests (tests/test_openai_client_cache.py)
    New unit test mocking openai to assert clients are cached per thread/endpoint and that changing timeout produces a new client. Review test isolation, thread-local cache clearing, and sys.modules patching.
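
The commented timeout mentioned for config/api_config.yaml could look roughly like this; only the 1800-second figure comes from the summary above, and the surrounding keys are hypothetical:

```yaml
# Hypothetical entry; the real schema of config/api_config.yaml may differ.
my-local-vllm:
  api_type: openai
  api_base: http://localhost:8000/v1
  api_key: EMPTY
  # timeout: 1800  # optional, in seconds; helps slow local vLLM servers
```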

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Title check: ✅ Passed. The title accurately captures the main changes: client reuse and tqdm-safe logging for vLLM/OpenAI servers, which align with the core modifications across utils/openai_client.py, utils/completion.py, and config documentation.
  • Description check: ✅ Passed. The description clearly explains the motivation (#81 vLLM issues), lists all major changes (client caching, tqdm logging, timeout config, unit test), and notes that API behavior is preserved.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 80.00%, which meets the required threshold of 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@tests/test_openai_client_cache.py`:
- Around line 9-29: Reset the thread-local cache used by openai_client at the
start of this test to ensure deterministic behavior: before patching sys.modules
call into openai_client and clear or reinitialize its thread-local storage (e.g.
reset openai_client._thread_local.__dict__ or recreate
openai_client._thread_local) so get_openai_client uses the mocked openai module
and the test does not inherit a client from a prior test; reference the
get_openai_client function and the module-level _thread_local storage when
making the change.
🧹 Nitpick comments (1)
utils/completion.py (1)

151-168: Avoid retrying on unknown exceptions.

Retrying generic exceptions can mask programmer errors and only delays failure. Consider logging and breaking (or re-raising after the last attempt) for the catch‑all branch.

♻️ Proposed adjustment
-        except Exception as e:
-            # Keep the worker alive and allow the main progress loop to continue.
-            _tqdm_write(f"{type(e).__name__}: {e}")
-            time.sleep(API_RETRY_SLEEP)
+        except Exception as e:
+            _tqdm_write(f"{type(e).__name__}: {e}")
+            break

Comment thread tests/test_openai_client_cache.py

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
utils/completion.py (1)

199-207: Bug: e is undefined in KeyError handler; also missing _tqdm_write usage.

Line 206 references e, but the except KeyError: clause on line 205 doesn't capture the exception. This will raise a NameError at runtime. Additionally, this function still uses print() while chat_completion_openai was updated to use _tqdm_write for tqdm-safe logging.

🐛 Proposed fix
         except openai.RateLimitError as e:
-            print(type(e), e)
+            _tqdm_write(f"{type(e).__name__}: {e}")
             time.sleep(API_RETRY_SLEEP)
         except openai.BadRequestError as e:
-            print(messages)
-            print(type(e), e)
-        except KeyError:
-            print(type(e), e)
+            _tqdm_write(f"{type(e).__name__}: {e}")
+            break
+        except KeyError as e:
+            _tqdm_write(f"{type(e).__name__}: {e}")
             break
🧹 Nitpick comments (1)
utils/completion.py (1)

188-188: Nitpick: Unused loop variable should be _.

The loop variable i is never used. For consistency with chat_completion_openai (line 147), use _ to indicate an intentionally unused variable.

♻️ Suggested change
-    for i in range(API_MAX_RETRY):
+    for _ in range(API_MAX_RETRY):

- Fix undefined `e` variable in KeyError handler (chat_completion_openai_thinking)
- Replace print() with _tqdm_write() for tqdm-safe logging
- Change unused loop variable `i` to `_`
- Add docstrings to chat_completion_openai and chat_completion_openai_thinking
@VedantMadane
Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jan 19, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@utils/completion.py`:
- Around line 216-219: The docstring falsely claims "exponential backoff" while
the retry logic uses a constant API_RETRY_SLEEP; update the function's behavior
or the docstring accordingly: either implement true exponential backoff by
multiplying API_RETRY_SLEEP by a factor per retry (e.g., sleep = API_RETRY_SLEEP
* (2 ** retry_count) and use jitter) in the retry loop that handles rate limits,
or change the docstring in the Send a chat completion request... block to state
that retries use a fixed delay API_RETRY_SLEEP; reference the retry loop and the
API_RETRY_SLEEP constant to locate where to change behavior or text.
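
The first option the prompt describes (exponential backoff with jitter) could be sketched as follows; API_RETRY_SLEEP and the retry-loop shape come from the review text, while the cap, default values, and helper function are illustrative assumptions:

```python
import random

API_MAX_RETRY = 5       # assumed value, for illustration
API_RETRY_SLEEP = 2.0   # base delay in seconds, assumed


def backoff_delays(base: float = API_RETRY_SLEEP,
                   retries: int = API_MAX_RETRY,
                   cap: float = 60.0):
    """Yield one "full jitter" delay per retry attempt.

    Each delay is drawn uniformly from [0, base * 2**attempt], capped at `cap`,
    so retries spread out over time instead of hammering the server in lockstep.
    """
    for attempt in range(retries):
        yield random.uniform(0.0, min(cap, base * (2 ** attempt)))


# Usage inside the retry loop:
#   for delay in backoff_delays():
#       try:
#           return client.chat.completions.create(...)
#       except openai.RateLimitError:
#           time.sleep(delay)
```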
🧹 Nitpick comments (2)
utils/completion.py (2)

28-41: Narrow the fallback exception in _tqdm_write to avoid hiding unexpected failures.

Catching a blanket Exception (Line 40) triggers Ruff BLE001 and can mask unrelated errors. Consider limiting to I/O‑related exceptions (and optionally flushing the fallback print), or explicitly suppress the lint if intentional.

♻️ Suggested tweak (adjust exception list per tqdm behavior)
 def _tqdm_write(msg: str) -> None:
@@
     try:
         tqdm.write(msg)
-    except Exception:
-        print(msg)
+    except (BrokenPipeError, OSError, ValueError):
+        print(msg, flush=True)

192-209: Consider breaking on unknown exceptions instead of retrying.

The exception classes openai.APITimeoutError, openai.APIConnectionError, and openai.InternalServerError are well-supported in modern OpenAI SDK versions and pose no compatibility risk. However, the catch-all except Exception handler (line 206) currently retries unknown failures indefinitely, which could mask programming bugs or unexpected conditions. Consider breaking on unknown exceptions to fail fast instead of silently retrying:

Optional refactoring: fail-fast on unknown exceptions
         except Exception as e:
             # Keep the worker alive and allow the main progress loop to continue.
             _tqdm_write(f"{type(e).__name__}: {e}")
-            time.sleep(API_RETRY_SLEEP)
+            break

Comment thread utils/completion.py Outdated
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
@VedantMadane
Author

Hi @lmarena, could you please trigger the CI checks for this PR? Thank you!

…string

- Use __dict__.pop('openai_clients', None) for deterministic test isolation
- Update docstring: 'fixed backoff' -> 'fixed delay' to match implementation

Made-with: Cursor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
utils/completion.py (1)

246-254: Consider adding transient error handling for consistency with chat_completion_openai.

The chat_completion_openai function now handles APITimeoutError, APIConnectionError, and InternalServerError with retries (lines 199-202), but this function only handles RateLimitError, BadRequestError, and KeyError. For consistency and resilience when using OpenAI-compatible local servers, consider adding the same transient error handling here.

♻️ Suggested addition
         except openai.BadRequestError as e:
             _tqdm_write(f"{type(e).__name__}: {e}")
             break
+        except (openai.APITimeoutError, openai.APIConnectionError, openai.InternalServerError) as e:
+            _tqdm_write(f"{type(e).__name__}: {e}")
+            time.sleep(API_RETRY_SLEEP)
         except KeyError as e:
             _tqdm_write(f"{type(e).__name__}: {e}")
             break
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@utils/completion.py` around lines 246 - 254, The try/except block in
completion_openai only handles RateLimitError, BadRequestError, and KeyError;
add transient retry handlers to match chat_completion_openai by catching
openai.APITimeoutError, openai.APIConnectionError, and
openai.InternalServerError (use the same retry logic: log with _tqdm_write,
sleep API_RETRY_SLEEP and continue retrying) so transient network/server issues
are retried consistently with chat_completion_openai.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@utils/completion.py`:
- Around line 246-254: The try/except block in completion_openai only handles
RateLimitError, BadRequestError, and KeyError; add transient retry handlers to
match chat_completion_openai by catching openai.APITimeoutError,
openai.APIConnectionError, and openai.InternalServerError (use the same retry
logic: log with _tqdm_write, sleep API_RETRY_SLEEP and continue retrying) so
transient network/server issues are retried consistently with
chat_completion_openai.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a324f24c-8be3-4e0c-a7b7-193ccf5ce9f0

📥 Commits

Reviewing files that changed from the base of the PR and between b53d353 and fdff334.

📒 Files selected for processing (2)
  • tests/test_openai_client_cache.py
  • utils/completion.py
