
fix: Forward HTTP headers for generate requests #8795

Draft
mudit-eng wants to merge 1 commit into main from cursor/fix-generate-header-forwarding-d7c2
mudit-eng commented May 20, 2026

What does the PR do?

Forwards matching HTTP headers as inference request parameters in the /generate and /generate_stream HTTP paths by calling the same ForwardHeaders helper used by /infer.

Adds a generate-endpoint regression test that uses an ensemble routing to the Python backend and verifies that a forwarded HTTP header is visible through request.parameters().
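
For reviewers who want a mental model of the behavior described above, here is a minimal Python sketch of the idea: headers whose names match a configured pattern are copied into the request parameters, and the model then reads them back. This is not the ForwardHeaders implementation in src/http_server.cc. The function name `forward_headers`, the `FakeRequest` stand-in, the header name `X-Tenant-Id`, and the use of a partial regex match are all assumptions made for illustration, as is modeling request.parameters() as returning a JSON string.

```python
import json
import re


def forward_headers(headers, pattern, parameters=None):
    """Copy headers whose names match `pattern` into a parameters dict.

    Hypothetical helper: partial-match semantics are an assumption here,
    not a statement about how the server matches header names.
    """
    params = dict(parameters or {})
    if not pattern:
        # No pattern configured: nothing is forwarded.
        return params
    matcher = re.compile(pattern)
    for name, value in headers.items():
        if matcher.search(name):
            params[name] = value
    return params


class FakeRequest:
    """Stand-in for a backend request object (illustration only)."""

    def __init__(self, params):
        self._params = params

    def parameters(self):
        # Modeled as a JSON string (assumption for this sketch).
        return json.dumps(self._params)


headers = {"X-Tenant-Id": "acme", "Content-Type": "application/json"}
request = FakeRequest(forward_headers(headers, r"(?i)^x-tenant-.*"))

# Model side: decode the parameters and look up the forwarded header.
forwarded = json.loads(request.parameters())
```

In this sketch only `X-Tenant-Id` ends up in `forwarded`, because `Content-Type` does not match the pattern.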

Checklist

  • PR title reflects the change and is of format <commit_type>: <Title>
  • Changes are described in the pull request.
  • Related issues are referenced.
  • Populated github labels field
  • Added test plan and verified test passes where the local runtime supports it.
  • Verified that the PR passes existing CI.
  • Verified copyright is correct on all changed files.
  • Added succinct git squash message before merging ref.
  • All template sections are filled out.
  • Optional: Additional screenshots for behavior/output changes with before/after.

Commit Type:

  • build
  • ci
  • docs
  • feat
  • fix
  • perf
  • refactor
  • revert
  • style
  • test

Related PRs:

None.

Where should the reviewer start?

src/http_server.cc, then qa/L0_http/generate_endpoint_test.py.

Test plan:

  • git diff --check HEAD~1..HEAD

  • python3 -m py_compile qa/L0_http/generate_endpoint_test.py qa/python_models/generate_models/mock_llm/1/model.py

  • qa/L0_http/test.sh generate endpoint section (not run locally: /opt/tritonserver/bin/tritonserver is not present in this environment)

  • CI Pipeline ID:
    N/A

Caveats:

Local end-to-end QA could not be run because this environment does not include the Triton server runtime binary.

Background

Linear issue TGH-116 reported that --http-header-forward-pattern worked for /infer but not /generate when routing from an ensemble to the Python backend.
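
A rough sketch of the kind of request that exercises this path, assuming a server started with --http-header-forward-pattern set to a pattern that matches the custom header. The base URL, the model name `ensemble_model`, the `text_input` payload field, and the `X-Tenant-Id` header are hypothetical placeholders, not values taken from the issue. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_generate_request(base_url, model_name, payload, extra_headers):
    """Build (but do not send) a POST to a model's generate endpoint."""
    url = f"{base_url}/v2/models/{model_name}/generate"
    headers = {"Content-Type": "application/json", **extra_headers}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# All values below are illustrative placeholders.
req = build_generate_request(
    "http://localhost:8000",
    "ensemble_model",
    {"text_input": "hello"},
    {"X-Tenant-Id": "acme"},
)

# urllib normalizes header-name capitalization; HTTP header names are
# case-insensitive, so this does not change what the server receives.
tenant = req.get_header("X-tenant-id")
```

Sending the same request to the /infer path would be the comparison case from the issue: the header was reported as forwarded there but not on /generate.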

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • Resolves Linear TGH-116

Linear Issue: TGH-116


Co-authored-by: Mudit Aggarwal <mudita@nvidia.com>
