vLLM model plugin for Google's T5Gemma2 encoder-decoder model family.
Supports google/t5gemma-2-270m-270m, google/t5gemma-2-1b-1b, and google/t5gemma-2-4b-4b.
pip install -e .Requires:
- vLLM >= 0.19.0
- transformers (latest, with T5Gemma2 support)
# Serve
vllm serve google/t5gemma-2-270m-270m --max-model-len 4096
# Generate
curl http://localhost:8000/v1/completions \
-H "Content-Type: application/json" \
-d '{
"model": "google/t5gemma-2-270m-270m",
"prompt": "Translate English to French: Hello, how are you?",
"max_tokens": 50,
"temperature": 0.0
}'python verify_plugin.py- vLLM PR #32617 — original implementation by @akh64bit
- vllm-project/bart-plugin — reference plugin architecture
- T5Gemma2 on HuggingFace