feat: middleware JSON→form-data conversion for /v1/videos + async end…#5890
Merged
…point test
- Middleware auto-converts JSON to multipart/form-data for FORM_DATA_ROUTES (SageMaker async only supports JSON payloads, but /v1/videos needs form-data)
- 3 new unit tests: conversion, passthrough for non-video, passthrough for form-data
- Video async endpoint test: Wan2.1-T2V on g5.12xlarge (TP=2), validates job id
jinyan-li1
approved these changes
Apr 8, 2026
feat: SageMaker middleware JSON→form-data conversion for video generation
Summary
Adds automatic JSON-to-multipart/form-data conversion in the SageMaker routing middleware for endpoints that require form data (e.g., /v1/videos). This enables video
generation models (Wan2.1-T2V) to work through SageMaker async inference, where payloads are always sent as JSON.
Also adds an async SageMaker endpoint integration test for video generation.
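For context, a client-side async invocation might look roughly like the sketch below. It relies on the `invoke_endpoint_async` call of the boto3 `sagemaker-runtime` client, which takes an S3 input location rather than an inline body. The helper name `submit_async`, the endpoint name, and the S3 URI are illustrative assumptions and do not come from this PR.

```python
def submit_async(client, endpoint_name: str, input_s3_uri: str) -> str:
    """Submit a JSON payload already uploaded to S3; return the S3 output location.

    `client` is expected to be boto3.client("sagemaker-runtime"); it is passed in
    so this sketch has no hard dependency on boto3.
    """
    response = client.invoke_endpoint_async(
        EndpointName=endpoint_name,
        InputLocation=input_s3_uri,      # S3 object holding the JSON request body
        ContentType="application/json",  # the payload is sent as JSON
    )
    # SageMaker writes the container's response to this S3 URI when the job finishes.
    return response["OutputLocation"]
```

With the current `/v1/videos` route, the object written to that output location would be the job-ID JSON, not the video itself.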
Problem
SageMaker async inference sends payloads as JSON to the container's /invocations endpoint. However, vllm-omni's /v1/videos API only accepts multipart/form-data. Without
conversion, the video endpoint returns 400 Bad Request: Field required because it can't parse the JSON body.
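A minimal sketch of the kind of conversion involved, not the code in scripts/vllm/omni_sagemaker_serve.py: it decides whether a request needs converting and, if so, re-encodes the top-level JSON keys as multipart/form-data text fields. The function names, the exact-match route check, and the choice to re-serialize non-string values with `json.dumps` are assumptions; only the `FORM_DATA_ROUTES` name and the two routes listed come from this PR.

```python
import json
import uuid
from typing import Optional, Tuple

# Routes named in this PR; the real constant may hold more entries.
FORM_DATA_ROUTES = ("/v1/videos", "/v1/videos/sync")


def should_convert(path: str, content_type: str) -> bool:
    """True only for JSON requests aimed at a form-data route."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return path.rstrip("/") in FORM_DATA_ROUTES and media_type == "application/json"


def json_to_multipart(body: bytes, boundary: Optional[str] = None) -> Tuple[bytes, str]:
    """Return (payload, Content-Type header value) for a JSON object body."""
    fields = json.loads(body)
    if not isinstance(fields, dict):
        raise ValueError("expected a JSON object at the top level")
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        # Strings pass through unchanged; other JSON values are re-serialized.
        text = value if isinstance(value, str) else json.dumps(value)
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n'
            "\r\n"
            f"{text}\r\n"
        )
    parts.append(f"--{boundary}--\r\n")  # closing delimiter
    return "".join(parts).encode("utf-8"), f"multipart/form-data; boundary={boundary}"
```

The two `False` cases of `should_convert` correspond to the passthrough behaviours the unit tests cover: a non-video route, and a request that is already form-data. The sketch does not escape quotes in field names, which a production encoder would need to handle.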
Changes
Middleware (scripts/vllm/omni_sagemaker_serve.py)
Unit Tests (test/vllm-omni/sagemaker/test_sagemaker_middleware.py)
Endpoint Test (test/vllm-omni/sagemaker/test_sm_omni_endpoint.py)
Current Limitations: SageMaker Video Model Support
Async job ID only — vllm-omni v0.18.0's /v1/videos endpoint is async by design. It returns a job ID immediately (status: "queued") and generates the video in the background.
SageMaker async inference writes this JSON response to S3, not the actual MP4 file. To retrieve the video, you would need to poll GET /v1/videos/{id} and download from
GET /v1/videos/{id}/content, which isn't possible through SageMaker's request/response model.
No sync video endpoint in v0.18.0 — POST /v1/videos/sync (which blocks and returns raw MP4 bytes) is not available in vllm-omni v0.18.0 (returns 405). It has been added in
newer commits (c9dbc09). Once we upgrade vllm-omni, switching the route to /v1/videos/sync will return actual MP4 bytes through SageMaker async; the middleware already includes /v1/videos/sync in
FORM_DATA_ROUTES.
GPU requirements — Wan2.1-T2V-1.3B requires ~24GB VRAM. On ml.g5.12xlarge (4× A10G 24GB each), tensor_parallel_size=2 is needed. On instances with a single ≥48GB GPU (e.g.,
ml.g6e.xlarge with L40S), TP=1 works.
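The retrieval flow described under the first limitation (poll the job, then download its content) can be sketched for a client that talks to the container directly, which is exactly the access SageMaker's request/response model does not provide. The two paths come from the description above; the terminal status names `"completed"` and `"failed"`, the helper names, and the polling parameters are assumptions, since only `"queued"` is confirmed here.

```python
import json
import time
import urllib.request
from typing import Callable


def http_get(url: str) -> bytes:
    """Plain GET returning the raw response body."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def wait_for_video(
    base_url: str,
    video_id: str,
    fetch: Callable[[str], bytes] = http_get,
    interval: float = 5.0,
    max_polls: int = 120,
) -> bytes:
    """Poll GET /v1/videos/{id} until the job finishes, then download the content."""
    status_url = f"{base_url}/v1/videos/{video_id}"
    for _ in range(max_polls):
        job = json.loads(fetch(status_url))
        status = job.get("status")
        if status == "completed":  # assumed terminal status name
            return fetch(f"{status_url}/content")
        if status == "failed":  # assumed terminal status name
            raise RuntimeError(f"video job {video_id} failed: {job}")
        time.sleep(interval)  # still queued or running; wait and retry
    raise TimeoutError(f"video job {video_id} did not finish after {max_polls} polls")
```

The `fetch` parameter is injectable so the loop can be exercised without a running server.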
Testing
Toggle if you are merging into master Branch
By default, docker image builds and tests are disabled. Two ways to run builds and tests:
How to use the helper utility for updating dlc_developer_config.toml
Assuming your remote is called origin (you can find out more with git remote -v)...
    python src/prepare_dlc_dev_environment.py -b </path/to/buildspec.yml> -cp origin
    python src/prepare_dlc_dev_environment.py -b </path/to/buildspec.yml> -t sanity_tests -cp origin
    python src/prepare_dlc_dev_environment.py -rcp origin
NOTE: If you are creating a PR for a new framework version, please ensure success of the local, standard, rc, and efa sagemaker tests by updating the dlc_developer_config.toml file:
    sagemaker_remote_tests = true
    sagemaker_efa_tests = true
    sagemaker_rc_tests = true
    sagemaker_local_tests = true
How to use PR description
Use the code block below to uncomment commands and run the PR CodeBuild jobs. There are two commands available:
    # /buildspec <buildspec_path>
    # /buildspec pytorch/training/buildspec.yml
    # /tests <test_list>
    # /tests sanity security ec2
Available test types: sanity, security, ec2, ecs, eks, sagemaker, sagemaker-local.
Toggle if you are merging into main Branch
PR Checklist
pre-commit run --all-files locally before creating this PR. (Read DEVELOPMENT.md for details.)