
feat: middleware JSON→form-data conversion for /v1/videos + async end… #5890

Merged: Yadan-Wei merged 11 commits into main from omni-video on Apr 8, 2026.


Yadan-Wei (Contributor) commented Apr 7, 2026

feat: SageMaker middleware JSON→form-data conversion for video generation

Summary

Adds automatic JSON-to-multipart/form-data conversion in the SageMaker routing middleware for endpoints that require form data (e.g., /v1/videos). This enables video generation models (Wan2.1-T2V) to work through SageMaker async inference, where payloads are always sent as JSON.

Also adds an async SageMaker endpoint integration test for video generation.

Problem

SageMaker async inference sends payloads as JSON to the container's /invocations endpoint. However, vllm-omni's /v1/videos API only accepts multipart/form-data. Without conversion, the video endpoint returns 400 Bad Request ("Field required") because it can't parse the JSON body.

Changes

Middleware (scripts/vllm/omni_sagemaker_serve.py)

  • Added FORM_DATA_ROUTES set: {"/v1/videos", "/v1/videos/sync"}
  • When a request targets a form-data route with Content-Type: application/json, the middleware:
    1. Reads the JSON body
    2. Builds a multipart/form-data body with a random boundary
    3. Replaces the Content-Type header and request body
  • Non-JSON requests (e.g., direct curl with -F) pass through unchanged
  • Non-video routes (TTS, image, chat) pass through unchanged
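The conversion steps above can be sketched in plain Python. This is a minimal sketch, not the code in omni_sagemaker_serve.py: the helper names needs_form_data and json_to_multipart are hypothetical, and it assumes the JSON body is a flat object. Re-serializing nested values as JSON strings and skipping null fields are also assumptions.

```python
from __future__ import annotations

import json
import uuid

# Assumed to match the set described in this PR; illustrative only.
FORM_DATA_ROUTES = {"/v1/videos", "/v1/videos/sync"}


def needs_form_data(path: str, content_type: str | None) -> bool:
    """Return True only for JSON requests that target a form-data route."""
    if path not in FORM_DATA_ROUTES or not content_type:
        return False
    # Compare the media type only, ignoring parameters like "; charset=utf-8".
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "application/json"


def json_to_multipart(body: bytes) -> tuple[bytes, str]:
    """Encode a flat JSON object as a multipart/form-data body.

    Returns (new_body, new_content_type). Lists/dicts are re-serialized as
    JSON strings and None values are skipped; both choices are assumptions.
    """
    fields = json.loads(body)
    boundary = uuid.uuid4().hex  # random boundary per request
    parts = []
    for name, value in fields.items():
        if value is None:
            continue
        if isinstance(value, bool):
            value = "true" if value else "false"
        elif isinstance(value, (dict, list)):
            value = json.dumps(value)
        # Each field becomes one part: delimiter, disposition header, blank line, value.
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
    parts.append(f"--{boundary}--\r\n")  # closing delimiter
    return "".join(parts).encode("utf-8"), f"multipart/form-data; boundary={boundary}"
```

A middleware built this way would call needs_form_data first, and only on a match replace the request's Content-Type header with the returned value and the body with the returned bytes. Requests already sent as form data (e.g., curl -F) and JSON requests to other routes fall through untouched, which matches the two passthrough bullets.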

Unit Tests (test/vllm-omni/sagemaker/test_sagemaker_middleware.py)

  • test_json_to_formdata_for_video_route — verifies JSON→form-data conversion on /v1/videos
  • test_json_passthrough_for_non_video_route — verifies JSON is NOT converted on /v1/audio/speech
  • test_formdata_passthrough_for_video_route — verifies form-data input passes through unchanged
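A test like test_json_to_formdata_for_video_route needs to check that the converted body still carries the original fields. One option, sketched here under stated assumptions, is to decode the multipart body with the standard library's email parser. parse_form_data is a hypothetical helper, not part of the PR's test file, and it only handles simple text fields (no file uploads).

```python
from __future__ import annotations

from email.parser import BytesParser
from email.policy import HTTP


def parse_form_data(body: bytes, content_type: str) -> dict[str, str]:
    """Decode a multipart/form-data body into {field_name: text_value}."""
    # Prepend the Content-Type header so the MIME parser knows the boundary.
    raw = f"Content-Type: {content_type}\r\n\r\n".encode("utf-8") + body
    message = BytesParser(policy=HTTP).parsebytes(raw)
    fields = {}
    for part in message.iter_parts():
        # The field name lives in the Content-Disposition "name" parameter.
        name = part.get_param("name", header="content-disposition")
        fields[name] = part.get_content()
    return fields
```

The parser strips the line break that precedes each boundary, so the returned values match what the client originally sent.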

Endpoint Test (test/vllm-omni/sagemaker/test_sm_omni_endpoint.py)

  • test_vllm_omni_video_async_endpoint: deploys Wan2.1-T2V-1.3B on ml.g5.12xlarge (TP=2) as an async endpoint
  • Sends a JSON payload with CustomAttributes: route=/v1/videos
  • Validates that the response contains a video job id (proves that middleware routing and JSON→form-data conversion work end-to-end)
  • Loads the model from an S3 tarball (s3://dlc-cicd-models/omni-models/wan2.1-t2v-1.3b.tar.gz)
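The client side of this flow could look roughly like the following sketch. It assumes boto3 "s3" and "sagemaker-runtime" clients are passed in, which also lets the function run against fakes. The payload field prompt, the bucket/key, and the endpoint name are hypothetical. The endpoint test itself may be structured differently.

```python
import json


def submit_video_job(s3, sm_runtime, endpoint_name, bucket, key, prompt):
    """Upload a JSON request to S3 and invoke an async SageMaker endpoint.

    `s3` and `sm_runtime` are boto3 clients for "s3" and "sagemaker-runtime".
    The "prompt" field name is an assumption about the /v1/videos schema.
    """
    payload = json.dumps({"prompt": prompt}).encode("utf-8")
    s3.put_object(Bucket=bucket, Key=key, Body=payload, ContentType="application/json")
    response = sm_runtime.invoke_endpoint_async(
        EndpointName=endpoint_name,
        InputLocation=f"s3://{bucket}/{key}",
        ContentType="application/json",
        # The container middleware reads the target route from this value.
        CustomAttributes="route=/v1/videos",
    )
    # The object written here holds the video job JSON, not the MP4 itself.
    return response["OutputLocation"]
```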

Current Limitations: SageMaker Video Model Support

  1. Async job ID only: vllm-omni v0.18.0's /v1/videos endpoint is async by design. It returns a job ID immediately (status: "queued") and generates the video in the background. SageMaker async inference writes this JSON response to S3, not the actual MP4 file. Retrieving the video requires polling GET /v1/videos/{id} and then downloading from GET /v1/videos/{id}/content, which isn't possible through SageMaker's request/response model.

  2. No sync video endpoint in v0.18.0: POST /v1/videos/sync (which blocks and returns raw MP4 bytes) is not available in vllm-omni v0.18.0 (it returns 405 Method Not Allowed). It has been added in newer commits (c9dbc09). Once vllm-omni is upgraded, switching the route to /v1/videos/sync will return actual MP4 bytes through SageMaker async; the middleware already includes /v1/videos/sync in FORM_DATA_ROUTES.

  3. GPU requirements: Wan2.1-T2V-1.3B requires ~24GB of VRAM. On ml.g5.12xlarge (4× A10G, 24GB each), tensor_parallel_size=2 is needed. On instances with a single GPU of at least 48GB (e.g., ml.g6e.xlarge with an L40S), TP=1 works.
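Outside SageMaker, for example when calling the container directly, the two-step retrieval from limitation 1 could be sketched as below. Only the status "queued" and the two GET paths come from this PR. The function name, the injected get callable, and the terminal status values "completed" and "failed" are assumptions, not confirmed vllm-omni behavior.

```python
import json
import time
from typing import Callable


def wait_for_video(get: Callable[[str], bytes], base_url: str, video_id: str,
                   poll_seconds: float = 5.0, timeout: float = 600.0) -> bytes:
    """Poll GET /v1/videos/{id} until done, then fetch /v1/videos/{id}/content.

    `get` performs an HTTP GET and returns the response body as bytes.
    The status names "completed" and "failed" are assumed, not confirmed.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = json.loads(get(f"{base_url}/v1/videos/{video_id}"))["status"]
        if status == "completed":
            return get(f"{base_url}/v1/videos/{video_id}/content")  # MP4 bytes
        if status == "failed":
            raise RuntimeError(f"video job {video_id} failed")
        if time.monotonic() > deadline:
            raise TimeoutError(f"video job {video_id} still {status!r}")
        time.sleep(poll_seconds)
```

With the standard library, get could be `lambda url: urllib.request.urlopen(url).read()`. This loop is exactly the step that SageMaker async inference cannot perform on the caller's behalf, which is why a blocking /v1/videos/sync route is the cleaner fix.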

Testing

  • 15 unit tests passing (12 existing + 3 new)
  • Pre-commit clean

Toggle if you are merging into master branch

By default, docker image builds and tests are disabled. Two ways to run builds and tests:

  1. Using dlc_developer_config.toml
  2. Using this PR description (currently only supported for PyTorch, TensorFlow, vllm, and base images)
How to use the helper utility for updating dlc_developer_config.toml

Assuming your remote is called origin (you can find out more with git remote -v)...

  • Run the default builds and tests for a particular buildspec (this also commits and pushes the changes to the remote). Example:

python src/prepare_dlc_dev_environment.py -b </path/to/buildspec.yml> -cp origin

  • Enable specific tests for a buildspec or set of buildspecs (this also commits and pushes the changes to the remote). Example:

python src/prepare_dlc_dev_environment.py -b </path/to/buildspec.yml> -t sanity_tests -cp origin

  • Restore the TOML file when ready to merge:

python src/prepare_dlc_dev_environment.py -rcp origin

NOTE: If you are creating a PR for a new framework version, please make sure the local, standard, rc, and efa SageMaker tests pass by setting the following in the dlc_developer_config.toml file:

  • sagemaker_remote_tests = true
  • sagemaker_efa_tests = true
  • sagemaker_rc_tests = true
  • sagemaker_local_tests = true
How to use PR description

Use the code block below to uncomment commands and run the PR CodeBuild jobs. There are two commands available:
  • # /buildspec <buildspec_path>
    • e.g.: # /buildspec pytorch/training/buildspec.yml
    • If this line is commented out, dlc_developer_config.toml will be used.
  • # /tests <test_list>
    • e.g.: # /tests sanity security ec2
    • If this line is commented out, it will run the default set of tests (same as the defaults in dlc_developer_config.toml): sanity, security, ec2, ecs, eks, sagemaker, sagemaker-local.
# /buildspec <buildspec_path>
# /tests <test_list>
Toggle if you are merging into main branch

PR Checklist

  • [ ] I ran pre-commit run --all-files locally before creating this PR. (Read DEVELOPMENT.md for details.)

…point test

- Middleware auto-converts JSON to multipart/form-data for FORM_DATA_ROUTES
  (SageMaker async only supports JSON payloads, but /v1/videos needs form-data)
- 3 new unit tests: conversion, passthrough for non-video, passthrough for form-data
- Video async endpoint test: Wan2.1-T2V on g5.12xlarge (TP=2), validates job id
aws-deep-learning-containers-ci bot added the authorized and Size:XL labels Apr 7, 2026
Yadan-Wei enabled auto-merge (squash) April 8, 2026 01:17
Yadan-Wei merged commit d1d9b7e into main Apr 8, 2026
95 of 97 checks passed