Add regression checks for AI Credits math in Go and JavaScript #38638
@@ -132,4 +132,71 @@ describe("model_costs.cjs", () => {

```js
    });
    expect(aic).toBeGreaterThan(0);
  });

  it("matches live Copilot billing arithmetic for gpt-5.4 dated model", async () => {
    // Live sample from:
    // https://github.com/github/gh-aw/actions/runs/27355745225/job/80833046748
    // usage artifact: agent/token_usage.jsonl
    writeModelsFixture({
      "github-copilot": {
        models: {
          "gpt-5.4": {
            cost: {
              input: "0.0000025",
              output: "0.000015",
              cache_read: "0.00000025",
            },
          },
        },
      },
    });

    const { computeInferenceAIC } = await import("./model_costs.cjs");
    const samples = [
      { inputTokens: 20314, outputTokens: 970, wantAIC: 6.5335 },
      { inputTokens: 24162, outputTokens: 1534, wantAIC: 8.3415 },
      { inputTokens: 54255, outputTokens: 4411, wantAIC: 20.18025 },
      { inputTokens: 65750, outputTokens: 224, wantAIC: 16.7735 },
    ];

    for (const sample of samples) {
```
Contributor

[/tdd] A plain

💡 Use it.each() for per-sample test IDs

```js
const samples = [
  { inputTokens: 20314, outputTokens: 970, wantAIC: 6.5335 },
  { inputTokens: 24162, outputTokens: 1534, wantAIC: 8.3415 },
  { inputTokens: 54255, outputTokens: 4411, wantAIC: 20.18025 },
  { inputTokens: 65750, outputTokens: 224, wantAIC: 16.7735 },
];
it.each(samples)(
  "gpt-5.4 live: $inputTokens in / $outputTokens out → $wantAIC AIC",
  async ({ inputTokens, outputTokens, wantAIC }) => {
    // Assumes the gpt-5.4 models fixture has already been written
    // (e.g. by writeModelsFixture in a beforeEach); this snippet does not write it.
    const { computeInferenceAIC } = await import("./model_costs.cjs");
    const got = computeInferenceAIC({
      provider: "copilot",
      model: "gpt-5.4-2026-03-05",
      inputTokens,
      outputTokens,
      cacheReadTokens: 0,
      cacheWriteTokens: 0,
    });
    expect(got).toBeCloseTo(wantAIC, 9);
  },
);
```

Each sample becomes a named subtest, surfacing exactly which input data failed.
```js
      const got = computeInferenceAIC({
        provider: "copilot",
        model: "gpt-5.4-2026-03-05",
        inputTokens: sample.inputTokens,
        outputTokens: sample.outputTokens,
        cacheReadTokens: 0,
        cacheWriteTokens: 0,
      });
      expect(got).toBeCloseTo(sample.wantAIC, 9);
```
Contributor

[/tdd] Precision asymmetry with the Go counterpart:

💡 Align tolerances

Choose one tolerance and use it consistently:

Since the values are exact floating-point arithmetic with the given prices,

Contributor

Inconsistent

💡 Details

The two existing

The new test uses
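To make the tolerance asymmetry concrete, here is a small Go sketch that restates both rules as plain predicates. `closeToDigits` and `inDelta` are hypothetical helpers, not part of either test file. They encode the documented Jest/Vitest `toBeCloseTo(expected, numDigits)` rule (pass when the difference is strictly below 10^-numDigits / 2, so 9 digits means < 5e-10) and testify's `assert.InDelta` rule (pass when the difference is at most `delta`, here 1e-9). A drift of 7e-10 would therefore pass the Go test but fail the JS test.

```go
package main

import (
	"fmt"
	"math"
)

// closeToDigits restates the documented Jest/Vitest toBeCloseTo rule:
// pass when |expected - received| < 10^(-numDigits) / 2 (strict).
func closeToDigits(expected, received float64, numDigits int) bool {
	return math.Abs(expected-received) < math.Pow(10, -float64(numDigits))/2
}

// inDelta restates testify's assert.InDelta rule:
// pass when |expected - actual| <= delta (inclusive).
func inDelta(expected, actual, delta float64) bool {
	return math.Abs(expected-actual) <= delta
}

func main() {
	const want = 6.5335
	got := want + 7e-10 // a drift that falls between the two tolerances

	fmt.Println(closeToDigits(want, got, 9)) // false: 7e-10 is not < 5e-10
	fmt.Println(inDelta(want, got, 1e-9))    // true: 7e-10 <= 1e-9
}
```

Either direction works for alignment: `toBeCloseTo(x, 8)` in JS gives a strict 5e-9 bound, or `1e-9 / 2` in Go approximates the JS bound.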
```js
    }
  });

  it("includes cache read pricing for copilot gpt-5.4 dated model", async () => {
    writeModelsFixture({
      "github-copilot": {
        models: {
          "gpt-5.4": {
            cost: {
              input: "0.0000025",
              output: "0.000015",
              cache_read: "0.00000025",
            },
          },
        },
      },
    });

    const { computeInferenceAIC } = await import("./model_costs.cjs");
    // expected USD: 1000*0.0000025 + 200*0.000015 + 400*0.00000025 = 0.0056 → 0.56 AIC
    const got = computeInferenceAIC({
      provider: "copilot",
      model: "gpt-5.4-2026-03-05",
      inputTokens: 1000,
      outputTokens: 200,
      cacheReadTokens: 400,
      cacheWriteTokens: 0,
    });
    expect(got).toBeCloseTo(0.56, 9);
  });
});
```
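The expected values can be checked by hand. The cache-read comment (0.0056 USD → 0.56 AIC) implies a factor of 100 AIC per USD. That factor is inferred from the tests, not quoted from the implementation. With the fixture prices $p_{\text{in}} = 2.5\times10^{-6}$, $p_{\text{out}} = 1.5\times10^{-5}$ and $p_{\text{cr}} = 2.5\times10^{-7}$ USD/token, and cache-read tokens billed on top of input tokens as in the cache test:

$$
\begin{aligned}
\text{AIC} &= 100\,\bigl(n_{\text{in}}\,p_{\text{in}} + n_{\text{out}}\,p_{\text{out}} + n_{\text{cr}}\,p_{\text{cr}}\bigr)\\
\text{sample 1: } 100\,\bigl(20314 \cdot 2.5\times10^{-6} + 970 \cdot 1.5\times10^{-5}\bigr) &= 100\,(0.050785 + 0.01455) = 6.5335\\
\text{cache test: } 100\,\bigl(1000 \cdot 2.5\times10^{-6} + 200 \cdot 1.5\times10^{-5} + 400 \cdot 2.5\times10^{-7}\bigr) &= 100\,(0.0025 + 0.003 + 0.0001) = 0.56
\end{aligned}
$$

The other three live samples follow the same pattern: 8.3415, 20.18025 and 16.7735 AIC.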
@@ -61,3 +61,35 @@ func TestComputeModelInferenceAICCopilotAlias(t *testing.T) {

```go
	assert.Greater(t, aicViaCopilot, 0.0, "copilot provider alias should produce non-zero AIC")
	assert.InDelta(t, aicViaGitHubCopilot, aicViaCopilot, 1e-9, "copilot and github-copilot should yield identical AIC")
}

func TestComputeModelInferenceAICLiveCopilotData(t *testing.T) {
	// Live sample from:
	// https://github.com/github/gh-aw/actions/runs/27355745225/job/80833046748
	// usage artifact: agent/token_usage.jsonl
	tests := []struct {
		name         string
		inputTokens  int
		outputTokens int
		wantAIC      float64
	}{
		{name: "request_1", inputTokens: 20314, outputTokens: 970, wantAIC: 6.5335},
```
Contributor

[/tdd] Test case names

💡 Suggested naming

Encode the distinguishing characteristic in the name. For example:

```go
{name: "gpt54_20k_input_1k_output", inputTokens: 20314, outputTokens: 970, wantAIC: 6.5335},
{name: "gpt54_24k_input_1.5k_output", inputTokens: 24162, outputTokens: 1534, wantAIC: 8.3415},
{name: "gpt54_54k_input_4k_output", inputTokens: 54255, outputTokens: 4411, wantAIC: 20.18025},
{name: "gpt54_66k_input_low_output", inputTokens: 65750, outputTokens: 224, wantAIC: 16.7735},
```

This lets a failing test message immediately surface the input range, making it faster to correlate back to the live artifact.
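One way to keep such names from drifting out of sync with the data is to derive them from the token counts. The sketch below assumes a hypothetical `liveCase` type (the test's anonymous struct without the `name` field) and a hypothetical `caseName` helper. It is an illustration of the naming suggestion, not code from the PR.

```go
package main

import "fmt"

// liveCase is a hypothetical copy of the anonymous struct used in
// TestComputeModelInferenceAICLiveCopilotData, minus the name field.
type liveCase struct {
	inputTokens  int
	outputTokens int
	wantAIC      float64
}

// caseName derives a subtest name from the token counts, so a failing
// subtest names its inputs and the name cannot disagree with the data.
func caseName(tc liveCase) string {
	return fmt.Sprintf("gpt54_in%d_out%d", tc.inputTokens, tc.outputTokens)
}

func main() {
	cases := []liveCase{
		{inputTokens: 20314, outputTokens: 970, wantAIC: 6.5335},
		{inputTokens: 24162, outputTokens: 1534, wantAIC: 8.3415},
	}
	for _, tc := range cases {
		fmt.Println(caseName(tc)) // gpt54_in20314_out970, then gpt54_in24162_out1534
	}
}
```

In the real test this would mean dropping the `name` field and calling `t.Run` with the derived name inside the loop.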
```go
		{name: "request_2", inputTokens: 24162, outputTokens: 1534, wantAIC: 8.3415},
		{name: "request_3", inputTokens: 54255, outputTokens: 4411, wantAIC: 20.18025},
		{name: "request_4", inputTokens: 65750, outputTokens: 224, wantAIC: 16.7735},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			got := computeModelInferenceAIC("copilot", "gpt-5.4-2026-03-05", tt.inputTokens, tt.outputTokens, 0, 0, 0)
```
Contributor

[/tdd] Go test relies on the embedded production

💡 Suggestion: use a fixture like the JS test

The JS test uses

This also means any future price update to
Contributor

Go test is not fixture-isolated: the

💡 Why this matters and how to fix it

The JS counterpart writes its own fixture with

When

Consider using an environment-variable override analogous to

```go
// write a temp models.json with the known gpt-5.4 prices
dir := t.TempDir()
modelsPath := filepath.Join(dir, "models.json")
// ... write fixture ...
t.Setenv("GH_AW_MODELS_JSON_PATH", modelsPath)
```

If the Go implementation has no equivalent override, the test comment should at least say which prices it depends on and point to
Contributor

Implicit prefix-match resolution is not documented and silently breakable:

💡 Details

Minimally, add a comment stating which pricing record is expected (
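To show why an implicit prefix match is "silently breakable", here is a hypothetical longest-prefix resolver. `resolvePricingKey` is a stand-in for the behavior the comment describes, and the real resolver may use different rules. The dated model `gpt-5.4-2026-03-05` resolves to `gpt-5.4`. If that record disappeared, the same call would quietly fall back to a shorter key with different prices, and only an exact-value assertion would catch it.

```go
package main

import (
	"fmt"
	"strings"
)

// resolvePricingKey returns the longest key in known that is a prefix of
// model. It is a hypothetical stand-in for the implicit prefix match the
// review comment describes; the real resolver may use different rules.
func resolvePricingKey(model string, known []string) (string, bool) {
	best := ""
	for _, k := range known {
		if strings.HasPrefix(model, k) && len(k) > len(best) {
			best = k
		}
	}
	return best, best != ""
}

func main() {
	key, ok := resolvePricingKey("gpt-5.4-2026-03-05", []string{"gpt-5", "gpt-5.4"})
	fmt.Println(key, ok) // gpt-5.4 true

	// If the gpt-5.4 record were removed, the dated model would silently
	// resolve to the shorter gpt-5 key instead of failing.
	key, ok = resolvePricingKey("gpt-5.4-2026-03-05", []string{"gpt-5"})
	fmt.Println(key, ok) // gpt-5 true
}
```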
```go
			assert.InDelta(t, tt.wantAIC, got, 1e-9)
```
Contributor

[/tdd] 💡 Add a failure message

```go
assert.InDelta(t, tt.wantAIC, got, 1e-9,
	"AIC mismatch for %s (input=%d, output=%d)", tt.name, tt.inputTokens, tt.outputTokens)
```
```go
		})
	}
}

func TestComputeModelInferenceAICCopilotCacheRead(t *testing.T) {
	// gpt-5.4 pricing in data/models.json:
	// input=0.0000025, output=0.000015, cache_read=0.00000025 (USD/token)
	// expected USD: 1000*0.0000025 + 200*0.000015 + 400*0.00000025 = 0.0056 → 0.56 AIC
	got := computeModelInferenceAIC("copilot", "gpt-5.4-2026-03-05", 1000, 200, 400, 0, 0)
	assert.InDelta(t, 0.56, got, 1e-9)
}
```
[/tdd] The fixture includes `cache_read: "0.00000025"`, but every live sample passes `cacheReadTokens: 0`, so the live-data test never exercises the cache-hit cost path. This is a coverage gap in a test explicitly designed as an arithmetic regression.

💡 Add a cache-read sample

Derive a wantAIC that includes cache read tokens (e.g., from the live artifact if a cache hit is present, or synthetically with a known token count):

Without this, a bug that zeroes out cache-read billing would pass all four existing assertions.