Mistral_7B_Vertex AI Inference_with_dynamic_LORA_adopters #4441
Open
shamika wants to merge 11 commits into GoogleCloudPlatform:main from shamika:main
Changes from 6 commits
Commits (11)
ca2a055 VLLM deployment with LORA adopters added (shamika)
c989979 VLLM deployment with LORA adopters added (shamika)
e1cc2b4 VLLM deployment with LORA adopters added (shamika)
12cd1c7 Service Axxount configuration added. (shamika)
d54a597 Service Account config added (shamika)
a88234d Added Code Owner for new notebooks (shamika)
4ea02d3 Update notebooks/official/prediction/vertexai_serving_vllm/custom_con… (shamika)
f8b0784 Review comments fixed (shamika)
9545b14 Linting Errors Fixed (shamika)
2b22077 Update VLLM_DOCKER_URI to new inference image (shamika)
f7841e3 Fix VLLM_DOCKER_URI assignment in notebook (shamika)
33 changes: 33 additions & 0 deletions
notebooks/official/prediction/vertexai_serving_vllm/custom_container/Dockerfile
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,33 @@ | ||
| # Copyright 2025 Google LLC | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # https://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
|
|
||
| ARG BASE_IMAGE | ||
| FROM ${BASE_IMAGE} | ||
|
|
||
| ENV DEBIAN_FRONTEND=noninteractive | ||
| # Install gcloud SDK | ||
| RUN apt-get update && \ | ||
| apt-get install -y apt-utils git apt-transport-https gnupg ca-certificates curl \ | ||
| && echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list \ | ||
| && curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | gpg --dearmor -o /usr/share/keyrings/cloud.google.gpg \ | ||
| && apt-get update -y && apt-get install google-cloud-cli -y \ | ||
| && rm -rf /var/lib/apt/lists/* | ||
|
|
||
| WORKDIR /workspace/vllm | ||
|
|
||
| # Copy entrypoint.sh to the container | ||
| COPY ./entrypoint.sh /workspace/vllm/vertexai/entrypoint.sh | ||
| RUN chmod +x /workspace/vllm/vertexai/entrypoint.sh | ||
|
|
||
| ENTRYPOINT ["/workspace/vllm/vertexai/entrypoint.sh"] |
63 changes: 63 additions & 0 deletions
notebooks/official/prediction/vertexai_serving_vllm/custom_container/ReadMe.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,63 @@ | ||
| Architecture Overview | ||
|
|
||
| The custom container consists of three main components: | ||
|
|
||
| 1. Dockerfile (Dockerfile) | ||
|
|
||
| Builds a custom container image that: | ||
| - Takes a base image (vLLM) via build argument | ||
| - Installs Google Cloud SDK to enable GCS (Google Cloud Storage) access | ||
| - Copies and configures the entrypoint script | ||
| - Sets the entrypoint to /workspace/vllm/vertexai/entrypoint.sh | ||
|
|
||
| 2. Entrypoint Script (entrypoint.sh) | ||
|
|
||
| This is the key component that provides smart model loading: | ||
|
|
||
| What it does: | ||
| - Intercepts all command-line arguments passed to the container | ||
| - Detects whether a --model= argument points to a GCS path (starts with gs://) | ||
| - If it is a GCS path: | ||
|   - Downloads the model from GCS to /tmp/model_dir using gcloud storage cp | ||
|   - Rewrites the --model= argument to point to the local directory | ||
| - If it is a local path, passes it through unchanged | ||
| - Handles --lora-modules= arguments the same way: adapters at gs:// paths are downloaded to /tmp/lora_adapters/NAME and the argument is rewritten to NAME=LOCAL_PATH | ||
| - Executes the original command with the updated arguments | ||
|
|
||
| Key logic for the --model= argument (excerpt from the main loop in entrypoint.sh): | ||
| for a in "$@"; do | ||
| if [[ $a == $model_arg* ]]; then # Detects --model= | ||
| model_path=${a#*=} | ||
| if [[ $model_path == $gcs_protocol* ]]; then # Is it gs://? | ||
| download_model_from_gcs $model_path | ||
| updated_args+=("--model=${LOCAL_MODEL_DIR}") # Use local path | ||
| else | ||
| updated_args+=("--model=${model_path}") | ||
| fi | ||
| else | ||
| updated_args+=("$a") | ||
| fi | ||
| done | ||
|
|
||
| 3. Cloud Build Configuration (cloudbuild.yaml:15-37) | ||
|
|
||
| Automates the container build and push process: | ||
|
|
||
| Build steps: | ||
| 1. If _DEVICE_TYPE=cpu: Clones vLLM repo and builds the CPU base image | ||
| 2. Builds the custom container with the entrypoint on top of the base image | ||
| 3. Pushes to Artifact Registry at $LOCATION-docker.pkg.dev/$PROJECT_ID/${_REPOSITORY}/vllm-${_DEVICE_TYPE} | ||
|
|
||
| Configurable substitutions: | ||
| - _DEVICE_TYPE: gpu (default) or cpu | ||
| - _BASE_IMAGE: vllm/vllm-openai (default) | ||
| - _REPOSITORY: my-docker-repo (default) | ||
|
|
||
| How It Works End-to-End | ||
|
|
||
| 1. Build: Cloud Build creates a container with gcloud SDK + entrypoint script | ||
| 2. Deploy: Container deployed to Vertex AI | ||
| 3. Runtime: Vertex AI starts the container with arguments such as --model=gs://my-bucket/my-model | ||
| 4. Entrypoint intercepts: Downloads the model from GCS to local disk | ||
| 5. vLLM starts: With the model loaded from local path instead of GCS | ||
|
|
||
| This design allows you to store models in GCS and have them automatically downloaded at container startup, rather than baking them into the container image. | ||
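A minimal sketch of the argument rewrite described in the ReadMe, for readers who want to exercise the logic without gcloud. The download step is replaced by a hypothetical stub (`fake_download`), and `rewrite_args` is an illustrative name that does not appear in this PR; only the `--model=` handling and the `/tmp/model_dir` default come from the files above.

```bash
#!/usr/bin/env bash
# Sketch only: mirrors the --model= rewrite, with the download stubbed out.

LOCAL_MODEL_DIR=${LOCAL_MODEL_DIR:-/tmp/model_dir}

# Hypothetical stand-in for the gcloud-based download step.
fake_download() {
  echo "would download $1 to $LOCAL_MODEL_DIR" >&2
}

# Prints the rewritten arguments, one per line.
rewrite_args() {
  for a in "$@"; do
    case $a in
      --model=gs://*)
        # GCS path: fetch it, then point vLLM at the local copy.
        fake_download "${a#--model=}"
        printf '%s\n' "--model=${LOCAL_MODEL_DIR}"
        ;;
      *)
        # Everything else, including local --model= paths, passes through.
        printf '%s\n' "$a"
        ;;
    esac
  done
}

# Example invocation; the server command and port are placeholders.
rewrite_args python -m vllm.entrypoints.openai.api_server \
  --model=gs://my-bucket/my-model --port=8080
```

Printing one argument per line keeps the sketch easy to inspect; the real script collects the arguments into an array and passes them to `exec`.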
37 changes: 37 additions & 0 deletions
notebooks/official/prediction/vertexai_serving_vllm/custom_container/cloudbuild.yaml
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,37 @@ | ||
| # Copyright 2025 Google LLC | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # https://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
|
|
||
| steps: | ||
| - name: 'gcr.io/cloud-builders/docker' | ||
| automapSubstitutions: true | ||
| script: | | ||
| #!/usr/bin/env bash | ||
| set -euo pipefail | ||
| device_type_param=${_DEVICE_TYPE} | ||
| device_type=${device_type_param,,} | ||
| base_image=${_BASE_IMAGE} | ||
| image_name="vllm-${device_type}" | ||
| if [[ $device_type == "cpu" ]]; then | ||
| echo "Quietly building open source vLLM CPU container image" | ||
| git clone https://github.com/vllm-project/vllm.git | ||
| cd vllm && DOCKER_BUILDKIT=1 docker build -t $base_image -f docker/Dockerfile.cpu . -q | ||
| cd .. | ||
| fi | ||
| echo "Quietly building container image for: $device_type" | ||
| docker build -t $LOCATION-docker.pkg.dev/$PROJECT_ID/${_REPOSITORY}/$image_name --build-arg BASE_IMAGE=$base_image . -q | ||
| docker push $LOCATION-docker.pkg.dev/$PROJECT_ID/${_REPOSITORY}/$image_name | ||
| substitutions: | ||
| _DEVICE_TYPE: gpu | ||
| _BASE_IMAGE: vllm/vllm-openai | ||
| _REPOSITORY: my-docker-repo | ||
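A small sketch of how the pushed image URI is composed from the substitutions, which may help when filling in the Vertex AI deployment step. The `image_uri` helper is hypothetical and not part of this PR. Lowercasing the device type is an assumption of this sketch, made because Docker repository names must be lowercase.

```bash
#!/usr/bin/env bash
# Sketch only: composes LOCATION-docker.pkg.dev/PROJECT/REPOSITORY/vllm-DEVICE.

image_uri() {
  location=$1
  project=$2
  repository=$3
  # Normalize the device type so "GPU" and "gpu" yield the same image name.
  device_type=$(printf '%s' "$4" | tr '[:upper:]' '[:lower:]')
  printf '%s\n' "${location}-docker.pkg.dev/${project}/${repository}/vllm-${device_type}"
}

# Placeholder project and repository names.
image_uri us-central1 my-project my-docker-repo GPU
```

The build itself would typically be submitted with something like `gcloud builds submit --config=cloudbuild.yaml --substitutions=_DEVICE_TYPE=cpu`, run from the custom_container directory; treat the exact invocation as an assumption, since the PR does not show it.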
91 changes: 91 additions & 0 deletions
notebooks/official/prediction/vertexai_serving_vllm/custom_container/entrypoint.sh
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,91 @@ | ||
| #!/bin/bash | ||
|
|
||
| # Copyright 2025 Google LLC | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # https://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
|
|
||
| set -euo pipefail | ||
|
|
||
| readonly LOCAL_MODEL_DIR=${LOCAL_MODEL_DIR:-"/tmp/model_dir"} | ||
| readonly LOCAL_LORA_DIR=${LOCAL_LORA_DIR:-"/tmp/lora_adapters"} | ||
|
|
||
| download_model_from_gcs() { | ||
| local gcs_uri=$1 | ||
| mkdir -p "$LOCAL_MODEL_DIR" | ||
| echo "Downloading model from $gcs_uri to local directory..." | ||
| if gcloud storage cp -r "$gcs_uri/*" "$LOCAL_MODEL_DIR"; then | ||
| echo "Model downloaded successfully to ${LOCAL_MODEL_DIR}." | ||
| else | ||
| echo "Failed to download model from Cloud Storage: $gcs_uri." >&2 | ||
| exit 1 | ||
| fi | ||
| } | ||
|
|
||
|
|
||
| download_lora_adapters_from_gcs() { | ||
| local gcs_uri=$1 | ||
| local local_dir=$2 | ||
| mkdir -p "$local_dir" | ||
| echo "Downloading LoRA adapters from $gcs_uri to $local_dir..." | ||
| if gcloud storage cp -r "$gcs_uri/*" "$local_dir"; then | ||
| echo "LoRA adapters downloaded successfully to ${local_dir}." | ||
| else | ||
| echo "Failed to download LoRA adapters from Cloud Storage: $gcs_uri." >&2 | ||
| exit 1 | ||
| fi | ||
| } | ||
|
|
||
| updated_args=() | ||
| model_arg="--model=" | ||
| lora_modules_arg="--lora-modules=" | ||
| gcs_protocol="gs://" | ||
| lora_counter=0 | ||
|
|
||
| for a in "$@"; do | ||
| if [[ $a == $model_arg* ]]; then | ||
| model_path=${a#*=} | ||
| echo "Found model: $model_path" | ||
| if [[ $model_path == $gcs_protocol* ]]; then | ||
| download_model_from_gcs "$model_path" | ||
| updated_args+=("--model=${LOCAL_MODEL_DIR}") | ||
| else | ||
| updated_args+=("--model=${model_path}") | ||
| fi | ||
| elif [[ $a == $lora_modules_arg* ]]; then | ||
| lora_spec=${a#*=} | ||
| echo "Found LoRA module: $lora_spec" | ||
|
|
||
| # LoRA modules can be in format "name=path" or just "path" | ||
| if [[ $lora_spec == *"="* ]]; then | ||
| lora_name=${lora_spec%%=*} | ||
| lora_path=${lora_spec#*=} | ||
| else | ||
| lora_name="lora_$lora_counter" | ||
| lora_path=$lora_spec | ||
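| # Plain assignment: ((lora_counter++)) returns status 1 when the counter is 0, which aborts the script under set -e | ||
| lora_counter=$((lora_counter + 1)) | ||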
| fi | ||
|
|
||
| if [[ $lora_path == $gcs_protocol* ]]; then | ||
| local_lora_path="${LOCAL_LORA_DIR}/${lora_name}" | ||
| download_lora_adapters_from_gcs "$lora_path" "$local_lora_path" | ||
| updated_args+=("--lora-modules=${lora_name}=${local_lora_path}") | ||
| else | ||
| updated_args+=("$a") | ||
| fi | ||
| else | ||
| updated_args+=("$a") | ||
| fi | ||
| done | ||
|
|
||
| echo "Launch command: " "${updated_args[@]}" | ||
| exec "${updated_args[@]}" | ||
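To make the `--lora-modules=` handling easier to review, here is a sketch of the spec parsing in isolation. `parse_lora_spec` and the fallback names are illustrative and not part of this PR; the `name=path` versus bare `path` convention is taken from the loop above.

```bash
#!/usr/bin/env bash
# Sketch only: splits a --lora-modules value into "name path".

parse_lora_spec() {
  spec=$1
  fallback_name=$2
  case $spec in
    *=*)
      # "name=path": the name is everything before the first "=".
      name=${spec%%=*}
      path=${spec#*=}
      ;;
    *)
      # Bare path: fall back to a generated name such as lora_0.
      name=$fallback_name
      path=$spec
      ;;
  esac
  printf '%s %s\n' "$name" "$path"
}

# Placeholder bucket and adapter names.
parse_lora_spec "sql-adapter=gs://my-bucket/adapters/sql" "lora_0"
parse_lora_spec "gs://my-bucket/adapters/chat" "lora_1"
```

One consequence of splitting on the first `=` is that a bare path that itself contains `=` would be read as `name=path`, so named specs are the safer form.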
142 changes: 142 additions & 0 deletions
notebooks/official/prediction/vertexai_serving_vllm/custom_container/local_build.sh
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,142 @@ | ||
| #!/usr/bin/env bash | ||
|
|
||
| # Copyright 2025 Google LLC | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # https://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
|
|
||
| set -euo pipefail | ||
|
|
||
| # Default configurations (same as cloudbuild.yaml substitutions) | ||
| DEVICE_TYPE=${DEVICE_TYPE:-gpu} | ||
| BASE_IMAGE=${BASE_IMAGE:-vllm/vllm-openai} | ||
| REPOSITORY=${REPOSITORY:-my-docker-repo} | ||
| PROJECT_ID=${PROJECT_ID:-} | ||
| LOCATION=${LOCATION:-us-central1} | ||
|
|
||
| # Parse command line arguments | ||
| while [[ $# -gt 0 ]]; do | ||
| case $1 in | ||
| --device-type) | ||
| DEVICE_TYPE="$2" | ||
| shift 2 | ||
| ;; | ||
| --base-image) | ||
| BASE_IMAGE="$2" | ||
| shift 2 | ||
| ;; | ||
| --repository) | ||
| REPOSITORY="$2" | ||
| shift 2 | ||
| ;; | ||
| --project-id) | ||
| PROJECT_ID="$2" | ||
| shift 2 | ||
| ;; | ||
| --location) | ||
| LOCATION="$2" | ||
| shift 2 | ||
| ;; | ||
| --help) | ||
| echo "Usage: $0 [OPTIONS]" | ||
| echo "" | ||
| echo "Options:" | ||
| echo " --device-type Device type: gpu or cpu (default: gpu)" | ||
| echo " --base-image Base Docker image (default: vllm/vllm-openai)" | ||
| echo " --repository Artifact Registry repository (default: my-docker-repo)" | ||
| echo " --project-id GCP Project ID (required for push)" | ||
| echo " --location GCP location (default: us-central1)" | ||
| echo " --help Show this help message" | ||
| echo "" | ||
| echo "Examples:" | ||
| echo " # Build GPU container locally" | ||
| echo " $0 --device-type gpu" | ||
| echo "" | ||
| echo " # Build CPU container locally" | ||
| echo " $0 --device-type cpu" | ||
| echo "" | ||
| echo " # Build and push to Artifact Registry" | ||
| echo " $0 --device-type gpu --project-id my-project --location us-central1" | ||
| exit 0 | ||
| ;; | ||
| *) | ||
| echo "Unknown option: $1" | ||
| echo "Use --help for usage information" | ||
| exit 1 | ||
| ;; | ||
| esac | ||
| done | ||
|
|
||
| # Convert device type to lowercase | ||
| device_type=${DEVICE_TYPE,,} | ||
| image_name="vllm-${device_type}" | ||
|
|
||
| echo "=========================================" | ||
| echo "Local Docker Build Configuration" | ||
| echo "=========================================" | ||
| echo "Device Type: $device_type" | ||
| echo "Base Image: $BASE_IMAGE" | ||
| echo "Image Name: $image_name" | ||
| if [[ -n "$PROJECT_ID" ]]; then | ||
| echo "Target: $LOCATION-docker.pkg.dev/$PROJECT_ID/$REPOSITORY/$image_name" | ||
| else | ||
| echo "Target: $image_name (local only)" | ||
| fi | ||
| echo "=========================================" | ||
| echo "" | ||
|
|
||
| # Handle CPU build - clone and build vLLM base image if needed | ||
| if [[ $device_type == "cpu" ]]; then | ||
| echo "Building open source vLLM CPU container image..." | ||
| if [[ ! -d "vllm" ]]; then | ||
| echo "Cloning vLLM repository..." | ||
| git clone https://github.com/vllm-project/vllm.git | ||
| else | ||
| echo "vLLM directory already exists, using existing clone" | ||
| fi | ||
|
|
||
| cd vllm | ||
| echo "Building vLLM CPU base image..." | ||
| DOCKER_BUILDKIT=1 docker build -t "$BASE_IMAGE" -f docker/Dockerfile.cpu . | ||
| cd .. | ||
| echo "vLLM CPU base image built successfully" | ||
| echo "" | ||
| fi | ||
|
|
||
| # Build the custom container | ||
| echo "Building custom container image for: $device_type" | ||
| docker build -t "$image_name" --build-arg BASE_IMAGE="$BASE_IMAGE" . | ||
|
|
||
| echo "" | ||
| echo "Build completed successfully!" | ||
| echo "Local image tag: $image_name" | ||
|
|
||
| # Optionally push to Artifact Registry | ||
| if [[ -n "$PROJECT_ID" ]]; then | ||
| remote_tag="$LOCATION-docker.pkg.dev/$PROJECT_ID/$REPOSITORY/$image_name" | ||
| echo "" | ||
| echo "Tagging image for Artifact Registry..." | ||
| docker tag "$image_name" "$remote_tag" | ||
|
|
||
| echo "Pushing to Artifact Registry: $remote_tag" | ||
| docker push "$remote_tag" | ||
| echo "" | ||
| echo "Push completed successfully!" | ||
| echo "Remote image: $remote_tag" | ||
| else | ||
| echo "" | ||
| echo "Skipping push (no PROJECT_ID provided)" | ||
| echo "To push to Artifact Registry, run with --project-id option" | ||
| fi | ||
|
|
||
| echo "" | ||
| echo "Done!" | ||
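The script accepts any value for `--device-type` and treats everything other than `cpu` as a GPU build. A possible guard is sketched below, assuming only `gpu` and `cpu` are meant to be supported, as the help text suggests. `normalize_device_type` is a hypothetical helper and not part of this PR.

```bash
#!/usr/bin/env bash
# Sketch only: lowercases the device type and rejects unsupported values.

normalize_device_type() {
  value=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case $value in
    gpu|cpu)
      printf '%s\n' "$value"
      ;;
    *)
      echo "Unsupported device type: $1 (expected gpu or cpu)" >&2
      return 1
      ;;
  esac
}

normalize_device_type GPU
normalize_device_type tpu || echo "rejected"
```

Calling the helper right after argument parsing would surface a typo such as `--device-type gpuu` before any image is built.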