fix: apply inference restriction to SageMaker MME invoke path by LinZiyuu · Pull Request #8806 · triton-inference-server/server

LinZiyuu · 2026-05-30T20:07:24Z

This PR applies the --http-restricted-api=inference restriction to the SageMaker multi-model (MME) custom-invoke route, which is currently the only inference path that does not honor it.

Previously, POST /models/{model}/invoke was dispatched to SageMakerMMEHandleInfer() with no restriction check. When an operator configured an inference API restriction, the standard HTTP /v2/.../infer, the gRPC infer, and the SageMaker /invocations routes all enforced the required header (the last via the inherited HandleInfer), but the MME invoke route ran inference without it. This is the same alternate-channel gap that #8686 closed for the MME model-repository operations (load/unload/list/get); the invoke route was missed.

This change adds the existing RETURN_AND_RESPOND_IF_RESTRICTED(req, RestrictedCategory::INFERENCE) guard at the top of the /invoke branch, mirroring the base HandleInfer/HandleGenerate handlers and the sibling MME repository branches. There is no behavior change for deployments that do not set an inference restriction.
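The effect of the guard can be illustrated with a small Python model. This is only a sketch of the control flow described above, not Triton's C++ code: `ToyServer`, `Restriction`, `check`, and the `guard_invoke` switch are hypothetical names introduced for illustration, and the header match is simplified to an exact dictionary lookup.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Restriction:
    """Hypothetical stand-in for one restricted API category."""
    header: str
    value: str


@dataclass
class ToyServer:
    # Maps a category name such as "inference" to its required header.
    restrictions: Dict[str, Restriction] = field(default_factory=dict)
    # False models the behavior before this PR, True the behavior after it.
    guard_invoke: bool = True

    def check(self, category: str, headers: Dict[str, str]) -> Optional[Tuple[int, str]]:
        """Return a 403 response if the category is restricted and the header is wrong."""
        rule = self.restrictions.get(category)
        if rule is None:
            return None  # no restriction configured: nothing to enforce
        if headers.get(rule.header) == rule.value:
            return None  # simplified exact match on header name and value
        return 403, f"This API is restricted, expecting header '{rule.header}'"

    def handle(self, method: str, path: str, headers: Dict[str, str]) -> Tuple[int, str]:
        parts = path.strip("/").split("/")
        is_invoke = (
            method == "POST"
            and len(parts) == 3
            and parts[0] == "models"
            and parts[2] == "invoke"
        )
        if is_invoke:
            if self.guard_invoke:
                # The check runs first, before any inference work is done.
                denied = self.check("inference", headers)
                if denied is not None:
                    return denied
            return 200, f"inference result for {parts[1]}"
        return 404, "not found"
```

With `guard_invoke=False` the model answers 200 to a request that carries no header, which is the gap; with `guard_invoke=True` the same request gets the 403. A server constructed with no restrictions returns 200 either way, matching the "no behavior change" claim.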

Reproduced on nvcr.io/nvidia/tritonserver:26.04-py3 (CPU) with --allow-sagemaker --http-restricted-api=inference:triton-secret=s3cr3t and an MME model loaded. The same inference request, sent with no header, returns 403 "This API is restricted, expecting header 'triton-secret'" on POST /v2/models/<m>/infer, but before this change returns 200 with a full inference result on POST :8080/models/<m>/invoke. After this change the MME invoke route returns the same 403.
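A reproduction along these lines could be scripted with the Python standard library. This is a sketch under assumptions, not the script used for the PR: the helper names are hypothetical, port 8000 for the standard HTTP endpoint is assumed (only :8080 appears above), and the JSON content type and request body are placeholders that must match what the loaded model accepts.

```python
import json
import urllib.error
import urllib.request
from typing import Dict, Optional


def build_request(base_url: str, path: str, payload: dict,
                  secret: Optional[str] = None,
                  header: str = "triton-secret") -> urllib.request.Request:
    """Build a POST request, attaching the restriction header only if a secret is given."""
    headers = {"Content-Type": "application/json"}  # assumed content type
    if secret is not None:
        headers[header] = secret
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(base_url + path, data=body,
                                  headers=headers, method="POST")


def status_of(request: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status, including 4xx/5xx codes."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def probe(model: str, payload: dict,
          http_base: str = "http://localhost:8000",       # assumed HTTP port
          sagemaker_base: str = "http://localhost:8080") -> Dict[str, int]:
    """Send the same header-less request to both routes and report each status."""
    routes = {
        "infer": (http_base, f"/v2/models/{model}/infer"),
        "invoke": (sagemaker_base, f"/models/{model}/invoke"),
    }
    return {name: status_of(build_request(base, path, payload))
            for name, (base, path) in routes.items()}
```

Against a server started with the flags above, `probe` would be expected to report 403 for `infer` and, per this PR, 200 for `invoke` before the fix and 403 after it.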

Relates to the SageMaker HTTP-restriction hardening in #8686.

The SageMaker multi-model custom-invoke route (POST /models/{model}/invoke) is dispatched to SageMakerMMEHandleInfer() without checking the inference API restriction. When --http-restricted-api=inference is configured, the standard HTTP infer, the gRPC infer, and the SageMaker /invocations routes all enforce the required header (the last via the inherited HandleInfer), but the MME invoke route runs inference without it. This mirrors the MME model-repository gap closed in triton-inference-server#8686, which did not cover the invoke route.

Add the existing RETURN_AND_RESPOND_IF_RESTRICTED(req, RestrictedCategory::INFERENCE) guard at the top of the /invoke branch, matching the base HandleInfer/HandleGenerate handlers and the sibling MME repository branches. No behavior change when no inference restriction is set.

Signed-off-by: LinZiyuu <linziyu0205@163.com>

LinZiyuu force-pushed the fix/sagemaker-mme-invoke-inference-restriction branch from fe4bb70 to 5113a24 on May 30, 2026 20:12

LinZiyuu wants to merge 1 commit into triton-inference-server:main from LinZiyuu:fix/sagemaker-mme-invoke-inference-restriction
