Reuse OpenAI client + tqdm-safe logging for vLLM/OpenAI-compatible servers #93

VedantMadane wants to merge 8 commits into lmarena:main
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
Walkthrough: Adds a per-thread OpenAI client cache.

Changes
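The per-thread client cache described in the walkthrough can be sketched roughly as follows. This is a minimal illustration of the pattern, not the PR's actual code: the names `get_openai_client` and `openai_clients` follow the review comments, and the injected `factory` parameter stands in for the real `openai.OpenAI` constructor.

```python
import threading

# Each thread keeps its own dict of clients, keyed by (base_url, api_key),
# so concurrent workers reuse connections instead of rebuilding a client
# per request. `factory` stands in for the real openai.OpenAI constructor.
_thread_local = threading.local()

def get_openai_client(base_url, api_key, factory):
    cache = getattr(_thread_local, "openai_clients", None)
    if cache is None:
        cache = _thread_local.openai_clients = {}
    key = (base_url, api_key)
    if key not in cache:
        cache[key] = factory(base_url=base_url, api_key=api_key)
    return cache[key]
```

Because the cache lives in `threading.local`, each worker thread builds its own client on first use and the dict never needs a lock.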
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

Pre-merge checks: ✅ 3 passed
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@tests/test_openai_client_cache.py`:
- Around line 9-29: Reset the thread-local cache used by openai_client at the
start of this test to ensure deterministic behavior: before patching sys.modules
call into openai_client and clear or reinitialize its thread-local storage (e.g.
reset openai_client._thread_local.__dict__ or recreate
openai_client._thread_local) so get_openai_client uses the mocked openai module
and the test does not inherit a client from a prior test; reference the
get_openai_client function and the module-level _thread_local storage when
making the change.
🧹 Nitpick comments (1)
utils/completion.py (1)
151-168: Avoid retrying on unknown exceptions. Retrying generic exceptions can mask programmer errors and only delays failure. Consider logging and breaking (or re-raising after the last attempt) for the catch-all branch.
♻️ Proposed adjustment
```diff
-        except Exception as e:
-            # Keep the worker alive and allow the main progress loop to continue.
-            _tqdm_write(f"{type(e).__name__}: {e}")
-            time.sleep(API_RETRY_SLEEP)
+        except Exception as e:
+            _tqdm_write(f"{type(e).__name__}: {e}")
+            break
```
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
utils/completion.py (1)
199-207: Bug: `e` is undefined in the KeyError handler; also missing `_tqdm_write` usage. Line 206 references `e`, but the `except KeyError:` clause on line 205 doesn't capture the exception. This will raise a `NameError` at runtime. Additionally, this function still uses `print()` while `chat_completion_openai` was updated to use `_tqdm_write` for tqdm-safe logging.

🐛 Proposed fix
```diff
         except openai.RateLimitError as e:
-            print(type(e), e)
+            _tqdm_write(f"{type(e).__name__}: {e}")
             time.sleep(API_RETRY_SLEEP)
         except openai.BadRequestError as e:
-            print(messages)
-            print(type(e), e)
-        except KeyError:
-            print(type(e), e)
+            _tqdm_write(f"{type(e).__name__}: {e}")
+            break
+        except KeyError as e:
+            _tqdm_write(f"{type(e).__name__}: {e}")
             break
```
🧹 Nitpick comments (1)
utils/completion.py (1)
188-188: Nitpick: Unused loop variable should be `_`. The loop variable `i` is never used. For consistency with `chat_completion_openai` (line 147), use `_` to indicate an intentionally unused variable.

♻️ Suggested change
```diff
-    for i in range(API_MAX_RETRY):
+    for _ in range(API_MAX_RETRY):
```
- Fix undefined `e` variable in KeyError handler (chat_completion_openai_thinking)
- Replace print() with _tqdm_write() for tqdm-safe logging
- Change unused loop variable `i` to `_`
- Add docstrings to chat_completion_openai and chat_completion_openai_thinking
@coderabbitai review
✅ Actions performed: Review triggered.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@utils/completion.py`:
- Around line 216-219: The docstring falsely claims "exponential backoff" while
the retry logic uses a constant API_RETRY_SLEEP; update the function's behavior
or the docstring accordingly: either implement true exponential backoff by
multiplying API_RETRY_SLEEP by a factor per retry (e.g., sleep = API_RETRY_SLEEP
* (2 ** retry_count) and use jitter) in the retry loop that handles rate limits,
or change the docstring in the Send a chat completion request... block to state
that retries use a fixed delay API_RETRY_SLEEP; reference the retry loop and the
API_RETRY_SLEEP constant to locate where to change behavior or text.
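For reference, the exponential-backoff-with-jitter variant the prompt describes might look like the sketch below. The constant values are assumptions for illustration, not the module's actual settings, and `backoff_delay` is a hypothetical helper name.

```python
import random
import time

API_RETRY_SLEEP = 2   # assumed base delay in seconds
API_MAX_RETRY = 5     # assumed retry budget

def backoff_delay(retry_count, base=API_RETRY_SLEEP, cap=60.0):
    # Exponential backoff with full jitter: double the base delay per
    # retry, cap it, then draw uniformly from [0, delay] so concurrent
    # workers don't retry in lockstep against the same server.
    return random.uniform(0.0, min(cap, base * (2 ** retry_count)))

def sleep_before_retry(retry_count):
    time.sleep(backoff_delay(retry_count))
```

Full jitter (uniform over the whole window) is a common choice because it spreads retries out more than adding a small fixed jitter on top of the doubled delay.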
🧹 Nitpick comments (2)
utils/completion.py (2)
28-41: Narrow the fallback exception in `_tqdm_write` to avoid hiding unexpected failures. Catching a blanket `Exception` (line 40) triggers Ruff BLE001 and can mask unrelated errors. Consider limiting to I/O-related exceptions (and optionally flushing the fallback print), or explicitly suppress the lint if intentional.

♻️ Suggested tweak (adjust exception list per tqdm behavior)
```diff
 def _tqdm_write(msg: str) -> None:
@@
     try:
         tqdm.write(msg)
-    except Exception:
-        print(msg)
+    except (BrokenPipeError, OSError, ValueError):
+        print(msg, flush=True)
```
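A self-contained version of such a helper, falling back to plain `print` when tqdm is unavailable or its write fails, could look like this sketch (not the module's exact code):

```python
import sys

def tqdm_safe_write(msg: str) -> None:
    # Route output through tqdm.write so log lines don't corrupt an
    # active progress bar; fall back to plain print when tqdm is missing
    # or writing through it fails.
    try:
        from tqdm import tqdm
        tqdm.write(msg)
    except (ImportError, OSError, ValueError):
        print(msg, file=sys.stdout, flush=True)
```

`tqdm.write` temporarily clears any active bars, writes the line, and redraws them, which is what keeps concurrent worker logging from mangling the progress display.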
192-209: Consider breaking on unknown exceptions instead of retrying. The exception classes `openai.APITimeoutError`, `openai.APIConnectionError`, and `openai.InternalServerError` are well supported in modern OpenAI SDK versions and pose no compatibility risk. However, the catch-all `except Exception` handler (line 206) currently retries unknown failures indefinitely, which could mask programming bugs or unexpected conditions. Consider breaking on unknown exceptions to fail fast instead of silently retrying:

Optional refactoring: fail-fast on unknown exceptions
```diff
         except Exception as e:
             # Keep the worker alive and allow the main progress loop to continue.
             _tqdm_write(f"{type(e).__name__}: {e}")
-            time.sleep(API_RETRY_SLEEP)
+            break
```
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Hi @lmarena, could you please trigger the CI checks for this PR? Thank you!
…string
- Use __dict__.pop('openai_clients', None) for deterministic test isolation
- Update docstring: 'fixed backoff' -> 'fixed delay' to match implementation
Made-with: Cursor
🧹 Nitpick comments (1)
utils/completion.py (1)
246-254: Consider adding transient error handling for consistency with `chat_completion_openai`. The `chat_completion_openai` function now handles `APITimeoutError`, `APIConnectionError`, and `InternalServerError` with retries (lines 199-202), but this function only handles `RateLimitError`, `BadRequestError`, and `KeyError`. For consistency and resilience when using OpenAI-compatible local servers, consider adding the same transient error handling here.

♻️ Suggested addition
```diff
         except openai.BadRequestError as e:
             _tqdm_write(f"{type(e).__name__}: {e}")
             break
+        except (openai.APITimeoutError, openai.APIConnectionError, openai.InternalServerError) as e:
+            _tqdm_write(f"{type(e).__name__}: {e}")
+            time.sleep(API_RETRY_SLEEP)
         except KeyError as e:
             _tqdm_write(f"{type(e).__name__}: {e}")
             break
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@utils/completion.py` around lines 246 - 254, The try/except block in completion_openai only handles RateLimitError, BadRequestError, and KeyError; add transient retry handlers to match chat_completion_openai by catching openai.APITimeoutError, openai.APIConnectionError, and openai.InternalServerError (use the same retry logic: log with _tqdm_write, sleep API_RETRY_SLEEP and continue retrying) so transient network/server issues are retried consistently with chat_completion_openai.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@utils/completion.py`:
- Around line 246-254: The try/except block in completion_openai only handles
RateLimitError, BadRequestError, and KeyError; add transient retry handlers to
match chat_completion_openai by catching openai.APITimeoutError,
openai.APIConnectionError, and openai.InternalServerError (use the same retry
logic: log with _tqdm_write, sleep API_RETRY_SLEEP and continue retrying) so
transient network/server issues are retried consistently with
chat_completion_openai.
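The retry policy these comments converge on (retry transient errors, break immediately on fatal ones) can be sketched generically. `TransientError` and `FatalError` below are placeholders standing in for the openai exception classes named above; this illustrates the control flow only, not the repository's actual function.

```python
import time

class TransientError(Exception):
    """Stands in for RateLimitError / APITimeoutError / APIConnectionError / InternalServerError."""

class FatalError(Exception):
    """Stands in for BadRequestError / KeyError, where retrying cannot help."""

def call_with_retries(fn, max_retry=5, sleep_s=0.0):
    output = None
    for _ in range(max_retry):
        try:
            return fn()
        except TransientError as e:
            print(f"{type(e).__name__}: {e}")  # would be _tqdm_write in the PR
            time.sleep(sleep_s)               # constant delay; see backoff discussion above
        except FatalError as e:
            print(f"{type(e).__name__}: {e}")
            break
    return output
```

The key property is that a malformed request fails after one attempt, while a rate limit or timeout consumes the retry budget before giving up.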
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: a324f24c-8be3-4e0c-a7b7-193ccf5ce9f0
📒 Files selected for processing (2)
tests/test_openai_client_cache.py
utils/completion.py
Addresses vLLM pain points reported in #81 (slow throughput + corrupted/invalid progress bar during concurrent generation).
Changes:
Notes: