fix: apply inference restriction to SageMaker MME invoke path #8806
Open
LinZiyuu wants to merge 1 commit into
The SageMaker multi-model custom-invoke route (POST /models/{model}/invoke)
is dispatched to SageMakerMMEHandleInfer() without checking the inference
API restriction. When --http-restricted-api=inference is configured, the
standard HTTP infer, the gRPC infer, and the SageMaker /invocations routes
all enforce the required header (the last via the inherited HandleInfer),
but the MME invoke route runs inference without it. This mirrors the MME
model-repository gap closed in triton-inference-server#8686, which did not cover the invoke route.
Add the existing RETURN_AND_RESPOND_IF_RESTRICTED(req,
RestrictedCategory::INFERENCE) guard at the top of the /invoke branch,
matching the base HandleInfer/HandleGenerate handlers and the sibling MME
repository branches. No behavior change when no inference restriction is set.
Signed-off-by: LinZiyuu <linziyu0205@163.com>
This PR applies the `--http-restricted-api=inference` restriction to the SageMaker multi-model (MME) custom-invoke route, which is currently the only inference path that does not honor it.

Previously, `POST /models/{model}/invoke` was dispatched to `SageMakerMMEHandleInfer()` with no restriction check. When an operator configured an `inference` API restriction, the standard HTTP `/v2/.../infer`, the gRPC infer, and the SageMaker `/invocations` routes all enforced the required header (the last via the inherited `HandleInfer`), but the MME invoke route ran inference without it. This is the same alternate-channel gap that #8686 closed for the MME model-repository operations (load/unload/list/get); the invoke route was missed.

This change adds the existing `RETURN_AND_RESPOND_IF_RESTRICTED(req, RestrictedCategory::INFERENCE)` guard at the top of the `/invoke` branch, mirroring the base `HandleInfer`/`HandleGenerate` handlers and the sibling MME repository branches. There is no behavior change for deployments that do not set an inference restriction.

Reproduced on `nvcr.io/nvidia/tritonserver:26.04-py3` (CPU): with `--allow-sagemaker --http-restricted-api=inference:triton-secret=s3cr3t` and an MME model loaded, the same inference request with no header returns `403 "This API is restricted, expecting header 'triton-secret'"` on `POST /v2/models/<m>/infer`, but returned `200` with a full inference result on `POST :8080/models/<m>/invoke`. After this change the MME invoke route returns the same `403`.

Relates to the SageMaker HTTP-restriction hardening in #8686.
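
For reviewers who want the control flow in isolation, the following is a minimal, self-contained sketch of the return-early pattern described above. It is not Triton's code: `SketchServer`, `Request`, `Response`, `Restriction`, `CheckRestricted`, `IsInvokeRoute`, and the `SKETCH_RETURN_IF_RESTRICTED` macro are hypothetical stand-ins, and the real `RETURN_AND_RESPOND_IF_RESTRICTED` macro may be implemented differently. Only the header name, secret value, route shape, and error text are taken from the reproduction above.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-ins for illustration; these are not Triton's real types.
enum class RestrictedCategory { INFERENCE, MODEL_REPOSITORY };

struct Request {
  std::string path;
  std::map<std::string, std::string> headers;  // exact-case lookup, for brevity
};

struct Response {
  int status;
  std::string body;
};

struct Restriction {
  std::string header;  // header the caller must send
  std::string value;   // value that header must carry
};

// Hypothetical analogue of a return-early guard: if the category is restricted
// and the request does not carry the expected header, return the 403 at once.
#define SKETCH_RETURN_IF_RESTRICTED(req, category)          \
  do {                                                      \
    if (auto denied = CheckRestricted((req), (category))) { \
      return *denied;                                       \
    }                                                       \
  } while (false)

class SketchServer {
 public:
  void Restrict(RestrictedCategory category, Restriction restriction) {
    restrictions_[category] = std::move(restriction);
  }

  Response Handle(const Request& req) const {
    if (IsInvokeRoute(req.path)) {
      // The guard sits at the top of the branch, before any inference work.
      SKETCH_RETURN_IF_RESTRICTED(req, RestrictedCategory::INFERENCE);
      return {200, "inference result"};
    }
    return {404, "not found"};
  }

 private:
  // Returns a 403 response when the request must be rejected, else nullopt.
  std::optional<Response> CheckRestricted(
      const Request& req, RestrictedCategory category) const {
    auto it = restrictions_.find(category);
    if (it == restrictions_.end()) {
      return std::nullopt;  // no restriction configured: behavior unchanged
    }
    auto h = req.headers.find(it->second.header);
    if (h != req.headers.end() && h->second == it->second.value) {
      return std::nullopt;  // expected header present with the right value
    }
    return Response{403, "This API is restricted, expecting header '" +
                             it->second.header + "'"};
  }

  // Matches paths of the form /models/<name>/invoke.
  static bool IsInvokeRoute(const std::string& path) {
    const std::string prefix = "/models/";
    const std::string suffix = "/invoke";
    return path.size() > prefix.size() + suffix.size() &&
           path.compare(0, prefix.size(), prefix) == 0 &&
           path.compare(path.size() - suffix.size(), suffix.size(), suffix) == 0;
  }

  std::map<RestrictedCategory, Restriction> restrictions_;
};
```

In this sketch, a `SketchServer` with no restriction answers `/models/m/invoke` with 200, which corresponds to the "no behavior change" case. After `Restrict(RestrictedCategory::INFERENCE, {"triton-secret", "s3cr3t"})`, the same request is rejected with 403 unless it carries the matching header.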