
Reuse OpenAI client + tqdm-safe logging for vLLM/OpenAI-compatible servers#93

Open
VedantMadane wants to merge 8 commits into lmarena:main from VedantMadane:fix/vllm-openai-client-cache

Conversation

@VedantMadane

Addresses vLLM pain points reported in #81 (slow throughput + corrupted/invalid progress bar during concurrent generation).

Changes:

  • Cache OpenAI client per thread/endpoint (avoid per-request client construction overhead)
  • Use tqdm.write for transient error logs to avoid mangling progress output
  • Add optional timeout config for OpenAI-compatible endpoints
  • Add unit test for client caching
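
For reviewers, the caching change can be pictured with a minimal sketch. The function name get_openai_client and the (api_base, api_key, timeout) cache key match the PR summary; everything else here is an illustrative assumption, not the exact diff:

```python
import threading
from typing import Optional

# Per-thread storage so clients are never shared across worker threads.
_thread_local = threading.local()


def get_openai_client(api_dict: Optional[dict] = None):
    """Return a cached OpenAI client for this thread and endpoint."""
    from openai import OpenAI  # imported lazily so tests can mock the module

    api_dict = api_dict or {}
    key = (api_dict.get("api_base"), api_dict.get("api_key"), api_dict.get("timeout"))
    cache = getattr(_thread_local, "openai_clients", None)
    if cache is None:
        cache = _thread_local.openai_clients = {}
    if key not in cache:
        # timeout=None falls back to the SDK default.
        cache[key] = OpenAI(base_url=key[0], api_key=key[1], timeout=key[2])
    return cache[key]
```

Repeated calls with the same endpoint dict return the same client object; changing any key component (e.g. timeout) constructs a new one.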

Notes:

  • This keeps API behavior the same; it only reduces overhead and improves console output stability.

@coderabbitai

coderabbitai Bot commented Jan 19, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.
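
For reference, the dotted setting path above maps onto a nested key in .coderabbit.yaml. A hypothetical fragment (the threshold value is purely illustrative):

```yaml
reviews:
  auto_review:
    # Pause automatic reviews after this many commits have been reviewed.
    auto_pause_after_reviewed_commits: 20
```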

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds a per-thread OpenAI client cache (get_openai_client), updates completion routines to use it with tqdm-safe logging and expanded transient-error handling, documents an optional timeout and parallelism guidance, and adds a unit test validating client caching behavior.

Changes

Cohort / File(s): Summary

Documentation (README.md, config/api_config.yaml)
    Adds guidance for vLLM / OpenAI-compatible local servers: a commented timeout example (1800s) in config/api_config.yaml and a brief note about sustainable parallelism in README.md. No runtime behavior changes.

OpenAI client caching (utils/openai_client.py)
    New get_openai_client(api_dict: Optional[dict] = None) implementing a per-thread cache keyed by (api_base, api_key, timeout), lazily constructing OpenAI clients on cache miss. Review cache key correctness and thread-local lifecycle.

Completion logic & logging (utils/completion.py)
    Replaces direct OpenAI construction with get_openai_client, adds _tqdm_write for tqdm-safe output, broadens transient OpenAI error handling and logging, and preserves existing retry loops. Verify logging behavior and exception handling paths.

Tests (tests/test_openai_client_cache.py)
    New unit test mocking openai to assert clients are cached per thread/endpoint and that changing timeout produces a new client. Review test isolation, thread-local cache clearing, and sys.modules patching.
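
The commented timeout mentioned for config/api_config.yaml could look roughly like this; only the 1800-second figure comes from the summary above, and the surrounding keys are hypothetical:

```yaml
# Hypothetical entry; the real schema of config/api_config.yaml may differ.
my-local-vllm:
  api_type: openai
  api_base: http://localhost:8000/v1
  api_key: EMPTY
  # timeout: 1800  # optional, in seconds; helps slow local vLLM servers
```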

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Title check: ✅ Passed. The title accurately captures the main changes: client reuse and tqdm-safe logging for vLLM/OpenAI servers, which align with the core modifications across utils/openai_client.py, utils/completion.py, and config documentation.
  • Description check: ✅ Passed. The description clearly explains the motivation (#81 vLLM issues), lists all major changes (client caching, tqdm logging, timeout config, unit test), and notes that API behavior is preserved.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 80.00%, which meets the required threshold of 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@tests/test_openai_client_cache.py`:
- Around line 9-29: Reset the thread-local cache used by openai_client at the
start of this test to ensure deterministic behavior: before patching sys.modules
call into openai_client and clear or reinitialize its thread-local storage (e.g.
reset openai_client._thread_local.__dict__ or recreate
openai_client._thread_local) so get_openai_client uses the mocked openai module
and the test does not inherit a client from a prior test; reference the
get_openai_client function and the module-level _thread_local storage when
making the change.
🧹 Nitpick comments (1)
utils/completion.py (1)

151-168: Avoid retrying on unknown exceptions.

Retrying generic exceptions can mask programmer errors and only delays failure. Consider logging and breaking (or re-raising after the last attempt) for the catch‑all branch.

♻️ Proposed adjustment
-        except Exception as e:
-            # Keep the worker alive and allow the main progress loop to continue.
-            _tqdm_write(f"{type(e).__name__}: {e}")
-            time.sleep(API_RETRY_SLEEP)
+        except Exception as e:
+            _tqdm_write(f"{type(e).__name__}: {e}")
+            break

Comment thread tests/test_openai_client_cache.py

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
utils/completion.py (1)

199-207: Bug: e is undefined in KeyError handler; also missing _tqdm_write usage.

Line 206 references e, but the except KeyError: clause on line 205 doesn't capture the exception. This will raise a NameError at runtime. Additionally, this function still uses print() while chat_completion_openai was updated to use _tqdm_write for tqdm-safe logging.

🐛 Proposed fix
         except openai.RateLimitError as e:
-            print(type(e), e)
+            _tqdm_write(f"{type(e).__name__}: {e}")
             time.sleep(API_RETRY_SLEEP)
         except openai.BadRequestError as e:
-            print(messages)
-            print(type(e), e)
-        except KeyError:
-            print(type(e), e)
+            _tqdm_write(f"{type(e).__name__}: {e}")
+            break
+        except KeyError as e:
+            _tqdm_write(f"{type(e).__name__}: {e}")
             break
🧹 Nitpick comments (1)
utils/completion.py (1)

188-188: Nitpick: Unused loop variable should be _.

The loop variable i is never used. For consistency with chat_completion_openai (line 147), use _ to indicate an intentionally unused variable.

♻️ Suggested change
-    for i in range(API_MAX_RETRY):
+    for _ in range(API_MAX_RETRY):

- Fix undefined `e` variable in KeyError handler (chat_completion_openai_thinking)
- Replace print() with _tqdm_write() for tqdm-safe logging
- Change unused loop variable `i` to `_`
- Add docstrings to chat_completion_openai and chat_completion_openai_thinking
@VedantMadane
Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jan 19, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@utils/completion.py`:
- Around line 216-219: The docstring falsely claims "exponential backoff" while
the retry logic uses a constant API_RETRY_SLEEP; update the function's behavior
or the docstring accordingly: either implement true exponential backoff by
multiplying API_RETRY_SLEEP by a factor per retry (e.g., sleep = API_RETRY_SLEEP
* (2 ** retry_count) and use jitter) in the retry loop that handles rate limits,
or change the docstring in the Send a chat completion request... block to state
that retries use a fixed delay API_RETRY_SLEEP; reference the retry loop and the
API_RETRY_SLEEP constant to locate where to change behavior or text.
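
The first option the prompt describes (exponential backoff with jitter) could be sketched as follows; API_RETRY_SLEEP and the retry-loop shape come from the review text, while the cap, default values, and helper function are illustrative assumptions:

```python
import random

API_MAX_RETRY = 5       # assumed value, for illustration
API_RETRY_SLEEP = 2.0   # base delay in seconds, assumed


def backoff_delays(base: float = API_RETRY_SLEEP,
                   retries: int = API_MAX_RETRY,
                   cap: float = 60.0):
    """Yield one "full jitter" delay per retry attempt.

    Each delay is drawn uniformly from [0, base * 2**attempt], capped at `cap`,
    so retries spread out over time instead of hammering the server in lockstep.
    """
    for attempt in range(retries):
        yield random.uniform(0.0, min(cap, base * (2 ** attempt)))


# Usage inside the retry loop:
#   for delay in backoff_delays():
#       try:
#           return client.chat.completions.create(...)
#       except openai.RateLimitError:
#           time.sleep(delay)
```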
🧹 Nitpick comments (2)
utils/completion.py (2)

28-41: Narrow the fallback exception in _tqdm_write to avoid hiding unexpected failures.

Catching a blanket Exception (Line 40) triggers Ruff BLE001 and can mask unrelated errors. Consider limiting to I/O‑related exceptions (and optionally flushing the fallback print), or explicitly suppress the lint if intentional.

♻️ Suggested tweak (adjust exception list per tqdm behavior)
 def _tqdm_write(msg: str) -> None:
@@
     try:
         tqdm.write(msg)
-    except Exception:
-        print(msg)
+    except (BrokenPipeError, OSError, ValueError):
+        print(msg, flush=True)

192-209: Consider breaking on unknown exceptions instead of retrying.

The exception classes openai.APITimeoutError, openai.APIConnectionError, and openai.InternalServerError are well-supported in modern OpenAI SDK versions and pose no compatibility risk. However, the catch-all except Exception handler (line 206) currently retries unknown failures indefinitely, which could mask programming bugs or unexpected conditions. Consider breaking on unknown exceptions to fail fast instead of silently retrying:

Optional refactoring: fail-fast on unknown exceptions
         except Exception as e:
             # Keep the worker alive and allow the main progress loop to continue.
             _tqdm_write(f"{type(e).__name__}: {e}")
-            time.sleep(API_RETRY_SLEEP)
+            break

Comment thread utils/completion.py Outdated
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
@VedantMadane
Author

Hi @lmarena, could you please trigger the CI checks for this PR? Thank you!

…string

- Use __dict__.pop('openai_clients', None) for deterministic test isolation
- Update docstring: 'fixed backoff' -> 'fixed delay' to match implementation

Made-with: Cursor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
utils/completion.py (1)

246-254: Consider adding transient error handling for consistency with chat_completion_openai.

The chat_completion_openai function now handles APITimeoutError, APIConnectionError, and InternalServerError with retries (lines 199-202), but this function only handles RateLimitError, BadRequestError, and KeyError. For consistency and resilience when using OpenAI-compatible local servers, consider adding the same transient error handling here.

♻️ Suggested addition
         except openai.BadRequestError as e:
             _tqdm_write(f"{type(e).__name__}: {e}")
             break
+        except (openai.APITimeoutError, openai.APIConnectionError, openai.InternalServerError) as e:
+            _tqdm_write(f"{type(e).__name__}: {e}")
+            time.sleep(API_RETRY_SLEEP)
         except KeyError as e:
             _tqdm_write(f"{type(e).__name__}: {e}")
             break
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@utils/completion.py` around lines 246 - 254, The try/except block in
completion_openai only handles RateLimitError, BadRequestError, and KeyError;
add transient retry handlers to match chat_completion_openai by catching
openai.APITimeoutError, openai.APIConnectionError, and
openai.InternalServerError (use the same retry logic: log with _tqdm_write,
sleep API_RETRY_SLEEP and continue retrying) so transient network/server issues
are retried consistently with chat_completion_openai.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@utils/completion.py`:
- Around line 246-254: The try/except block in completion_openai only handles
RateLimitError, BadRequestError, and KeyError; add transient retry handlers to
match chat_completion_openai by catching openai.APITimeoutError,
openai.APIConnectionError, and openai.InternalServerError (use the same retry
logic: log with _tqdm_write, sleep API_RETRY_SLEEP and continue retrying) so
transient network/server issues are retried consistently with
chat_completion_openai.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a324f24c-8be3-4e0c-a7b7-193ccf5ce9f0

📥 Commits

Reviewing files that changed from the base of the PR and between b53d353 and fdff334.

📒 Files selected for processing (2)
  • tests/test_openai_client_cache.py
  • utils/completion.py
