fix: apply inference restriction to SageMaker MME invoke path #8806

Open
LinZiyuu wants to merge 1 commit into triton-inference-server:main from LinZiyuu:fix/sagemaker-mme-invoke-inference-restriction

@LinZiyuu

This PR applies the --http-restricted-api=inference restriction to the SageMaker multi-model (MME) custom-invoke route, which is currently the only inference path that does not honor it.

Previously, POST /models/{model}/invoke was dispatched to SageMakerMMEHandleInfer() with no restriction check. When an operator configured an inference API restriction, the standard HTTP /v2/.../infer, the gRPC infer, and the SageMaker /invocations routes all enforced the required header (the last via the inherited HandleInfer), but the MME invoke route ran inference without it. This is the same alternate-channel gap that #8686 closed for the MME model-repository operations (load/unload/list/get); the invoke route was missed.

This change adds the existing RETURN_AND_RESPOND_IF_RESTRICTED(req, RestrictedCategory::INFERENCE) guard at the top of the /invoke branch, mirroring the base HandleInfer/HandleGenerate handlers and the sibling MME repository branches. There is no behavior change for deployments that do not set an inference restriction.
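To make the shape of the gap concrete, here is a simplified Python model of the dispatch logic. It is not Triton's C++ code: `is_restricted`, `handle_mme_invoke`, and the `guarded` flag are hypothetical names used only for illustration, and header matching is reduced to an exact dictionary lookup.

```python
# Simplified model of the restriction guard; all names here are hypothetical.

def is_restricted(headers, restriction):
    """Return True when a restriction is configured and the request lacks
    the expected header value. `restriction` is None or (name, value)."""
    if restriction is None:
        return False
    name, value = restriction
    return headers.get(name) != value


def handle_mme_invoke(headers, restriction, guarded):
    """Model of the /models/{model}/invoke branch.

    guarded=False models the behavior before this PR (no check);
    guarded=True models the branch with the guard placed at the top.
    """
    if guarded and is_restricted(headers, restriction):
        return 403  # rejected before any inference work starts
    return 200  # stands in for running inference


restriction = ("triton-secret", "s3cr3t")

# Before the fix: a request without the header still runs inference.
before = handle_mme_invoke({}, restriction, guarded=False)

# After the fix: the same request is rejected, and a request carrying
# the header is still served.
after = handle_mme_invoke({}, restriction, guarded=True)
with_header = handle_mme_invoke({"triton-secret": "s3cr3t"}, restriction, guarded=True)

# With no restriction configured, the guard changes nothing.
unrestricted = handle_mme_invoke({}, None, guarded=True)
```

Placing the check first in the branch means a rejected request never reaches the inference handler, which is the same ordering the description attributes to the base handlers.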

Reproduced on nvcr.io/nvidia/tritonserver:26.04-py3 (CPU) with --allow-sagemaker --http-restricted-api=inference:triton-secret=s3cr3t and an MME model loaded. Before this change, the same inference request with no header returns 403 "This API is restricted, expecting header 'triton-secret'" on POST /v2/models/<m>/infer, but returns 200 with a full inference result on POST :8080/models/<m>/invoke. After this change, the MME invoke route returns the same 403.
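A minimal sketch of how the comparison above could be scripted with the Python standard library. The helper names are hypothetical, the request body is left to the caller because it depends on the model, and the base URLs are assumptions: port 8080 for the SageMaker route comes from the description, while port 8000 for the standard HTTP route is assumed to be the server's default.

```python
import urllib.error
import urllib.request


def build_probes(model,
                 http_base="http://localhost:8000",        # assumed default HTTP port
                 sagemaker_base="http://localhost:8080"):  # port from the description
    """Return the two inference URLs compared in the reproduction."""
    return {
        "v2_infer": f"{http_base}/v2/models/{model}/infer",
        "mme_invoke": f"{sagemaker_base}/models/{model}/invoke",
    }


def post_status(url, body, headers=None):
    """POST `body` (bytes) to `url` and return the HTTP status code.

    urllib raises HTTPError for 4xx/5xx responses, so the code is read
    from the exception in that case.
    """
    req = urllib.request.Request(url, data=body, method="POST",
                                 headers=headers or {})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def restriction_bypassed(statuses):
    """True when the standard route rejects the header-less request with
    403 but the MME invoke route does not."""
    return statuses["v2_infer"] == 403 and statuses["mme_invoke"] != 403
```

Against a running server, one would call `post_status(url, body)` for each URL from `build_probes("<m>")` without the `triton-secret` header and pass the resulting statuses to `restriction_bypassed`. Per the description, that should report a bypass before this change and none after it.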

Relates to the SageMaker HTTP-restriction hardening in #8686.

The SageMaker multi-model custom-invoke route (POST /models/{model}/invoke)
is dispatched to SageMakerMMEHandleInfer() without checking the inference
API restriction. When --http-restricted-api=inference is configured, the
standard HTTP infer, the gRPC infer, and the SageMaker /invocations routes
all enforce the required header (the last via the inherited HandleInfer),
but the MME invoke route runs inference without it. This mirrors the MME
model-repository gap closed in triton-inference-server#8686, which did not cover the invoke route.

Add the existing RETURN_AND_RESPOND_IF_RESTRICTED(req,
RestrictedCategory::INFERENCE) guard at the top of the /invoke branch,
matching the base HandleInfer/HandleGenerate handlers and the sibling MME
repository branches. No behavior change when no inference restriction is set.

Signed-off-by: LinZiyuu <linziyu0205@163.com>
LinZiyuu force-pushed the fix/sagemaker-mme-invoke-inference-restriction branch from fe4bb70 to 5113a24 on May 30, 2026 20:12