Releases · defilantech/infercost

InferCost v0.1.0

The first release of InferCost: Kubernetes-native cost intelligence for on-premises AI inference.

CostProfile CRD: Declare GPU hardware economics (purchase price, amortization, electricity rate, PUE)
UsageReport CRD: Auto-populated cost reports with per-model/namespace breakdown
Cost engine: Scrapes DCGM for GPU power draw and llama.cpp/vLLM for token counts
12 Prometheus metrics: Cost-per-token, hourly cost, cloud comparison, GPU power, savings
Cloud comparison: Verified pricing for 9 models across OpenAI, Anthropic, and Google
CLI: infercost status and infercost compare with monthly projections
REST API: /api/v1/costs/current, /api/v1/models, /api/v1/compare, /api/v1/status
Grafana dashboard: 13 panels shipped as JSON
Helm chart: Single-command installation with configurable DCGM endpoint and API server
Honest results: Shows when cloud is cheaper than on-prem

Homebrew (macOS):

brew install defilantech/tap/infercost

Helm:

helm install infercost infercost/infercost \
  --set dcgm.endpoint=http://nvidia-dcgm-exporter:9400/metrics

Binary: Download from the assets below.