Dataroom: GCP TPU for jina-embeddings-v5 Inference

Feasibility and cost-efficiency analysis of serving the jina-embeddings-v5 family (v5-text and v5-omni) on Google Cloud TPU instead of the current NVIDIA L4 / Cloud Run deployment. Grounded in real, dated GCP and Jina pricing.

Bottom line: Stay on L4 / Cloud Run. As of 2026-05-05 the feasibility wall fell for v5-text-small (decoder embedding serving shipped upstream, tpu-inference#2420), so the blocker there is now economics, not capability - and the economics still favor L4. v5-text-nano and v5-omni remain feasibility-blocked. See Executive Summary.

Captured 2026-05-28. All prices on-demand USD unless stated; re-verify before any commitment.

#	Section	What it answers
-	Executive Summary	The whole story in one page
01	Models	What v5-text/v5-omni are; current L4 serving
02	Hardware	L4 vs TPU specs; throughput ceiling if software-unblocked
03	Pricing	Real L4 and TPU prices, cost ratios
04	Feasibility	Can it run on TPU at all? (the blocker)
05	Cost Analysis	TPS, token value, margin, break-even
06	Recommendations	The decision and revisit triggers
-	Sources	Raw text captures of every cited fact
-	Figures	Fact-based plots (reproducible via matplotlib)
-	data/	Structured JSON: pricing, models

Reading paths

Decision-maker: Executive Summary -> Decision.
Engineer: Feasibility -> vLLM TPU blocker -> Throughput ceiling: if the gate were removed -> Alternative paths -> Benchmark plan.
Finance / FinOps: Pricing -> Why isn't TPU cost-competitive? -> Break-even model -> Token economics -> cost-model.csv.

Key findings at a glance

Decoder embedding serving on TPU shipped 2026-05-05 (tpu-inference#2420, merged, addresses #899): StepPool plus a vLLM-Pooler-on-JAX-Qwen3 hybrid, verified for Qwen3-Embedding-8B on v6e/v7x. This unblocks v5-text-small at the capability level; what remains is integrating the v5 weights and benchmarking. v5-text-nano (EuroBERT encoder-only, vllm#20869) and v5-omni (multimodal) stay blocked because #2420 added decoder pooling only.
TPU is not on Cloud Run, so a migration loses min=0 scale-to-zero. For the low-utilization per-task topology this dominates the economics. (Unchanged.)
With real prices, v5e must deliver >=1.7x an L4's tokens/s (v6e >=3.8x) just to reach tokens-per-dollar parity. This is now measurable for v5-text-small but has not yet been benchmarked.
Even if the software gate were removed, on compute-per-dollar (the metric that governs embedding economics) v5e is at parity with L4 (0.96x) and only v6e exceeds it (1.99x ceiling, ~1.3-1.5x realistic), and then only on a continuously saturated lane. See the throughput ceiling analysis.
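
The parity thresholds and the scale-to-zero point above can be sketched in a few lines of Python. This is an illustrative sketch, not the dataroom's cost model: the price ratios 1.7 and 3.8 are the figures quoted above, while the function names, the 10% utilization value, and the simplification that a scale-to-zero L4 is billed only while busy (ignoring cold starts and idle instance time) are assumptions made for illustration.

```python
def tokens_per_dollar(tokens_per_second, price_per_hour, utilization=1.0):
    """Useful tokens served per dollar for an instance billed by the hour.

    utilization is the fraction of billed time spent serving requests.
    """
    if price_per_hour <= 0:
        raise ValueError("price_per_hour must be positive")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return tokens_per_second * utilization * 3600.0 / price_per_hour


def breakeven_speedup(price_ratio, tpu_utilization=1.0):
    """Throughput multiple a TPU needs over an L4 for tokens-per-dollar parity.

    price_ratio is TPU $/h divided by L4 $/h. The L4 is assumed (simplification)
    to scale to zero, so it pays only for busy time; the TPU is always on.
    """
    if not 0 < tpu_utilization <= 1:
        raise ValueError("tpu_utilization must be in (0, 1]")
    return price_ratio / tpu_utilization


# Price ratios quoted in the findings above (TPU price / L4 price).
PRICE_RATIO_VS_L4 = {"v5e": 1.7, "v6e": 3.8}

for chip, ratio in PRICE_RATIO_VS_L4.items():
    saturated = breakeven_speedup(ratio)
    # 10% utilization is a hypothetical value, not a measurement.
    mostly_idle = breakeven_speedup(ratio, tpu_utilization=0.10)
    print(f"{chip}: {saturated:.1f}x when saturated, "
          f"{mostly_idle:.1f}x at 10% utilization")
```

Under these assumptions the required speedup grows in inverse proportion to utilization, which is why losing scale-to-zero dominates the economics for a low-utilization lane. Measured tokens/s from the benchmark plan would replace any placeholder throughput before drawing conclusions.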

More plots (break-even throughput, margin-vs-throughput curves) in figures/ and inline in the cost analysis.

Scope and honesty notes

The cost side uses real, sourced GCP/Jina prices.
The throughput (tokens/s) side is not invented. Where a number is needed to show the shape of the margin curve it is an explicit placeholder in cost-model.csv and must be replaced by measurement (benchmark plan).
Internal architecture facts come from the sefo_gcp embeddings_v5 executor and the website model pages; external facts are captured verbatim in sources/.
