Commit 69217d9

chore(versions): align tutorials examples with 26.06 release train (#151) (#152)
* fix(deps): bump transformers/torch/ray to mitigate CVEs (TRI-1421)

  Bumps in tutorials/ examples to patched versions:

* style: black reformat for Quick_Deploy/PyTorch/export.py

  Collapses an over-wrapped multi-line expression onto a single line to satisfy black. Picked up by pre-commit run --all-files; not otherwise related to TRI-1421.

* chore(versions): bump tutorials examples to 26.06 release

  Aligns container references and pinned client versions with the 26.06 release (release_version 2.70.0, triton_container_version 26.06, upstream_container_version 26.05 per server/build.py):

  - nvcr.io/nvidia/tritonserver XX.XX-* -> 26.06-* (covers -py3, -py3-sdk, -vllm-python-py3, -trtllm-python-py3 across Conceptual_Guide, Quick_Deploy, Popular_Models_Guide, Feature_Guide, Deployment/Kubernetes, Triton_Inference_Server_Python_API)
  - nvcr.io/nvidia/pytorch 23.05-py3 -> 26.05-py3
  - nvcr.io/nvidia/tensorflow 24.04-tf2-py3 -> 26.05-tf2-py3
  - tritonclient 2.47.0 -> 2.70.0 (kafka-io example)
  - Internal image tag triton-python-api:r24.08 -> r26.06 in Triton_Inference_Server_Python_API/README.md

  Placeholder references (yy.mm, xx.yy, <yy.mm>) intentionally left unchanged. Examples that span multiple release trains may need follow-up code adjustments to keep working against the new SDK.

* fix(deps): correct tritonclient pin to 2.69.0 (latest on PyPI)

  The 26.06 server build reports version 2.70.0 but the tritonclient wheel for 2.70.0 has not been published to PyPI yet (latest published is 2.69.0). Pinning the kafka-io example to 2.69.0 so pip install -r requirements.txt resolves today. Bump to 2.70.0 once the wheel ships.

* fix(deps): bump sentencepiece pin to 0.2.1 for Python 3.12 wheel

  The 26.06-py3 tritonserver image ships Python 3.12. sentencepiece 0.1.99 has no cp312 wheel and pip falls back to building from source, which fails inside the container. 0.2.1 has a cp312 wheel and installs cleanly. Verified by building the Dockerfile against the staged tritonserver:26.06-py3 image.

* fix(versions): bump 24.08 -> 26.06 in Python_API build.sh / Dockerfile

  Caught by the staged-image test pass: build.sh hardcoded BASE_IMAGE_TAG_IDENTITY/DIFFUSION, the default TAG slug, and the StableDiffusion sub-build references at 24.08. docker/Dockerfile ARG BASE_IMAGE_TAG also defaulted to 24.08-py3. Aligned all to 26.06. The earlier sweep missed these because the regex looked for nvcr.io / tritonserver: prefixes; the literals lived as bare shell-variable values.
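The gap described in the last bullet can be illustrated with a minimal Python sketch. It is not the sweep actually used for this commit: the two patterns, the `retag` helper, and the assumption that bare tags are assigned to variables whose names contain `TAG` are all hypothetical. It only shows why a prefix-anchored pattern skips a value such as `BASE_IMAGE_TAG=24.08-py3`, and why placeholders like `yy.mm` survive untouched.

```python
import re

# Hypothetical patterns for illustration only; not taken from the repository.
# Numeric YY.MM tag that follows an NGC tritonserver image reference.
IMAGE_TAG = re.compile(r"(nvcr\.io/nvidia/tritonserver:)\d{2}\.\d{2}(?=-|\b)")
# Numeric YY.MM tag assigned directly to a variable whose name contains "TAG"
# (assumed naming convention, e.g. BASE_IMAGE_TAG=24.08-py3).
BARE_TAG = re.compile(r"(\b\w*TAG\w*=\"?)\d{2}\.\d{2}(?=-|\b)")


def retag(text: str, new_tag: str) -> str:
    """Rewrite numeric release tags to new_tag.

    Placeholders such as yy.mm or xx.yy contain no digits, so neither
    pattern matches them and they are returned unchanged.
    """
    text = IMAGE_TAG.sub(lambda m: m.group(1) + new_tag, text)
    return BARE_TAG.sub(lambda m: m.group(1) + new_tag, text)


if __name__ == "__main__":
    samples = [
        "nvcr.io/nvidia/tritonserver:22.11-py3-sdk",
        "ARG BASE_IMAGE_TAG=24.08-py3",
        "nvcr.io/nvidia/tritonserver:yy.mm-py3",
    ]
    for line in samples:
        print(retag(line, "26.06"))
```

Running only `IMAGE_TAG` reproduces the miss: the `ARG BASE_IMAGE_TAG=24.08-py3` line has no `nvcr.io` prefix, so it needs the second pattern. Other image families (pytorch, tensorflow) use a different target tag in this commit and would need their own patterns.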
1 parent 4fe2b90 commit 69217d9

23 files changed

Lines changed: 49 additions & 55 deletions

Conceptual_Guide/Part_4-inference_acceleration/README.md

Lines changed: 2 additions & 2 deletions
@@ -135,10 +135,10 @@ Before proceeding, please set up a model repository for the Text Recognition mod
 
 ```
 # Server Container
-docker run --gpus=all -it --shm-size=256m --rm -p8000:8000 -p8001:8001 -p8002:8002 -v$(pwd):/workspace/ -v/$(pwd)/model_repository:/models nvcr.io/nvidia/tritonserver:22.11-py3 bash
+docker run --gpus=all -it --shm-size=256m --rm -p8000:8000 -p8001:8001 -p8002:8002 -v$(pwd):/workspace/ -v/$(pwd)/model_repository:/models nvcr.io/nvidia/tritonserver:26.06-py3 bash
 
 # Client Container (on a different terminal)
-docker run -it --net=host -v ${PWD}:/workspace/ nvcr.io/nvidia/tritonserver:22.11-py3-sdk bash
+docker run -it --net=host -v ${PWD}:/workspace/ nvcr.io/nvidia/tritonserver:26.06-py3-sdk bash
 ```
 
 Since this is a model we converted to ONNX, and TensorRT acceleration examples are linked throughout the explanation, we will explore the ONNX pathway. There are three cases to consider with ONNX backend:

Conceptual_Guide/Part_5-Model_Ensembles/README.md

Lines changed: 1 addition & 1 deletion
@@ -315,7 +315,7 @@ We'll again be launching Triton using docker containers. This time, we'll start
 docker run --gpus=all -it --shm-size=1G --rm \
 -p8000:8000 -p8001:8001 -p8002:8002 \
 -v ${PWD}:/workspace/ -v ${PWD}/model_repository:/models \
-nvcr.io/nvidia/tritonserver:22.12-py3
+nvcr.io/nvidia/tritonserver:26.06-py3
 ```
 
 We'll need to install a couple of dependencies for our Python backend scripts.

Deployment/Kubernetes/EKS_Multinode_Triton_TRTLLM/multinode_helm_chart/containers/triton_trt_llm.containerfile

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-ARG BASE_CONTAINER_IMAGE=nvcr.io/nvidia/tritonserver:24.07-trtllm-python-py3
+ARG BASE_CONTAINER_IMAGE=nvcr.io/nvidia/tritonserver:26.06-trtllm-python-py3
 FROM ${BASE_CONTAINER_IMAGE}
 
 ENV EFA_INSTALLER_VERSION=1.33.0

Deployment/Kubernetes/EKS_Multinode_Triton_TRTLLM/multinode_helm_chart/gen_ai_perf.yaml

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ metadata:
 spec:
 containers:
 - name: triton
-image: nvcr.io/nvidia/tritonserver:24.07-py3-sdk
+image: nvcr.io/nvidia/tritonserver:26.06-py3-sdk
 command: ["sleep", "infinity"]
 volumeMounts:
 - mountPath: /var/run/models

Deployment/Kubernetes/EKS_Multinode_Triton_TRTLLM/multinode_helm_chart/setup_ssh_efs.yaml

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ metadata:
 spec:
 containers:
 - name: triton
-image: nvcr.io/nvidia/tritonserver:24.07-trtllm-python-py3
+image: nvcr.io/nvidia/tritonserver:26.06-trtllm-python-py3
 command: ["sleep", "infinity"]
 resources:
 limits:

Deployment/Kubernetes/TensorRT-LLM_Autoscaling_and_Load_Balancing/containers/triton_trt-llm.containerfile

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-ARG BASE_CONTAINER_IMAGE=nvcr.io/nvidia/tritonserver:24.08-trtllm-python-py3
+ARG BASE_CONTAINER_IMAGE=nvcr.io/nvidia/tritonserver:26.06-trtllm-python-py3
 ARG ENGINE_DEST_PATH=/var/run/engines
 ARG HF_HOME=/var/run/cache

Deployment/Kubernetes/TensorRT-LLM_Autoscaling_and_Load_Balancing/setup_ssh-nfs.yaml

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ metadata:
 spec:
 containers:
 - name: triton
-image: nvcr.io/nvidia/tritonserver:24.08-trtllm-python-py3
+image: nvcr.io/nvidia/tritonserver:26.06-trtllm-python-py3
 command: ["sleep", "infinity"]
 resources:
 limits:

Deployment/Kubernetes/TensorRT-LLM_Multi-Node_Distributed_Models/containers/triton_trt-llm.containerfile

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-ARG BASE_CONTAINER_IMAGE=nvcr.io/nvidia/tritonserver:24.05-trtllm-python-py3
+ARG BASE_CONTAINER_IMAGE=nvcr.io/nvidia/tritonserver:26.06-trtllm-python-py3
 ARG ENGINE_DEST_PATH=/var/run/models/engine
 ARG HF_HOME=/var/run/hugging_face
 ARG MODEL_DEST_PATH=/var/run/models/model

Feature_Guide/Speculative_Decoding/vLLM/README.md

Lines changed: 1 addition & 1 deletion
@@ -198,7 +198,7 @@ To run Draft Model-Based Speculative Decoding with Triton Inference Server, it i
 docker run --gpus all -it --net=host --rm -p 8001:8001 --shm-size=1G \
 --ulimit memlock=-1 --ulimit stack=67108864 \
 -v </path/to/model_repository>:/model_repository \
-nvcr.io/nvidia/tritonserver:25.02-vllm-python-py3 \
+nvcr.io/nvidia/tritonserver:26.06-vllm-python-py3 \
 tritonserver --model-repository /model_repository \
 --model-control-mode explicit --load-model opt_model
 ```

HuggingFace/README.md

Lines changed: 1 addition & 1 deletion
@@ -115,7 +115,7 @@ Before the specifics around deploying the models can be discussed, the first ste
 
 ```
 # Pull the PyTorch Container from NGC
-docker run -it --gpus=all -v ${PWD}:/workspace nvcr.io/nvidia/pytorch:23.05-py3
+docker run -it --gpus=all -v ${PWD}:/workspace nvcr.io/nvidia/pytorch:26.05-py3
 
 # Install dependencies
 pip install transformers
