Commit 17c2b8d

arygupt and claude committed
fix(db): exclude eval_samples/server_logs from dump to fit GitHub 2 GiB cap
The weekly public DB dump is published as a GitHub release asset, which is hard-capped at 2 GiB. eval_samples (~1.7 GB compressed) + server_logs (~345 MB) make up ~99% of the archive, pushing it past the cap: every dump since 2026-05-18 has failed with `size must be less than 2147483648`.

Exclude both tables from the dump by default (set DUMP_INCLUDE_ALL=1 for a full backup). This drops the zip from ~2.07 GB to ~0.36 GB and unblocks the weekly release. The analytically useful tables are unaffected (benchmark_results is only ~20 MB). load-dump already skips missing table files, so restores round-trip cleanly.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
1 parent 0c91e4b commit 17c2b8d
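
The failure described in the commit message could be caught before upload with a size check. The following is a minimal sketch, not code from this commit: `GITHUB_ASSET_LIMIT_BYTES`, `fitsReleaseAssetCap`, and `assertArchiveFits` are hypothetical names, and the strict `<` comparison is inferred from the quoted error text `size must be less than 2147483648`.

```typescript
import { statSync } from 'node:fs';

// 2 GiB in bytes (2_147_483_648), the limit quoted in the upload error.
const GITHUB_ASSET_LIMIT_BYTES = 2 * 1024 ** 3;

/** True when an archive of `sizeBytes` is strictly below the asset cap. */
function fitsReleaseAssetCap(sizeBytes: number): boolean {
  return sizeBytes < GITHUB_ASSET_LIMIT_BYTES;
}

/** Hypothetical pre-upload guard: fail locally instead of at upload time. */
function assertArchiveFits(archivePath: string): void {
  const { size } = statSync(archivePath);
  if (!fitsReleaseAssetCap(size)) {
    throw new Error(
      `${archivePath} is ${size} bytes; release assets must be less than ${GITHUB_ASSET_LIMIT_BYTES}`,
    );
  }
}
```

Calling `assertArchiveFits` on the zip just before the release step would surface an oversized archive as a local error rather than a failed upload.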

1 file changed

Lines changed: 23 additions & 1 deletion

packages/db/src/dump-db.ts

@@ -9,7 +9,7 @@
 import { createWriteStream, mkdirSync } from 'node:fs';
 import { resolve } from 'node:path';
 
-import { TABLE_INSERT_ORDER } from '@semianalysisai/inferencex-constants';
+import { TABLE_INSERT_ORDER, TABLE_NAMES } from '@semianalysisai/inferencex-constants';
 
 import { hasNoSslFlag } from './cli-utils';
 import { createAdminSql } from './etl/db-utils';
@@ -18,6 +18,24 @@ const sql = createAdminSql({ noSsl: hasNoSslFlag(), readonly: true, max: 1 });
 
 const CURSOR_BATCH = 100;
 
+/**
+ * Tables excluded from the dump by default.
+ *
+ * The weekly public dump is published as a GitHub release asset, which is
+ * hard-capped at 2 GiB (2_147_483_648 bytes). These two tables dominate the
+ * archive — eval_samples (~1.7 GB compressed) + server_logs (~345 MB) are
+ * ~99% of the zip — while the analytically useful tables are tiny
+ * (benchmark_results is only ~20 MB). Including them pushed the archive past
+ * the cap, so every dump since 2026-05-18 failed with
+ * `size must be less than 2147483648`. Excluding them drops the zip from
+ * ~2.07 GB to ~0.36 GB and unblocks the weekly release.
+ *
+ * Set DUMP_INCLUDE_ALL=1 for a complete backup (e.g. when writing somewhere
+ * without the 2 GiB asset limit).
+ */
+const DEFAULT_SKIP = new Set<string>([TABLE_NAMES.evalSamples, TABLE_NAMES.serverLogs]);
+const SKIP = process.env.DUMP_INCLUDE_ALL === '1' ? new Set<string>() : DEFAULT_SKIP;
+
 /** Stream a table to a JSON file using a cursor, writing row-by-row. */
 async function streamTable(table: string, outPath: string): Promise<number> {
   const out = createWriteStream(outPath);
@@ -54,6 +72,10 @@ async function dump(): Promise<void> {
   console.log(` Output: ${outDir}\n`);
 
   for (const table of TABLE_INSERT_ORDER) {
+    if (SKIP.has(table)) {
+      console.log(` ${table}... skipped (excluded from dump; set DUMP_INCLUDE_ALL=1 to include)`);
+      continue;
+    }
     process.stdout.write(` ${table}...`);
     const outPath = resolve(outDir, `${table}.json`);
     const count = await streamTable(table, outPath);
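
The skip logic in this diff, and the restore-side behavior the commit message relies on, can be illustrated in isolation. This is a sketch under stated assumptions rather than the repository's code: the table-name strings stand in for `TABLE_NAMES.evalSamples` and `TABLE_NAMES.serverLogs` (their real values are not shown in the diff), and `resolveSkipSet`, `tablesToDump`, and `tablesToLoad` are hypothetical helpers. In particular, `tablesToLoad` only approximates what "load-dump already skips missing table files" might look like.

```typescript
import { existsSync } from 'node:fs';
import { resolve } from 'node:path';

type Env = Record<string, string | undefined>;

// Assumed values; the real ones come from TABLE_NAMES in the constants package.
const DEFAULT_SKIP: ReadonlySet<string> = new Set(['eval_samples', 'server_logs']);

/** Only the exact string '1' disables the default exclusions, as in the diff. */
function resolveSkipSet(env: Env): ReadonlySet<string> {
  return env['DUMP_INCLUDE_ALL'] === '1' ? new Set<string>() : DEFAULT_SKIP;
}

/** Dump side: keep insert order, drop excluded tables. */
function tablesToDump(order: readonly string[], env: Env): string[] {
  const skip = resolveSkipSet(env);
  return order.filter((table) => !skip.has(table));
}

/** Restore side (hypothetical): load only tables whose `<table>.json` exists. */
function tablesToLoad(order: readonly string[], dumpDir: string): string[] {
  return order.filter((table) => existsSync(resolve(dumpDir, `${table}.json`)));
}
```

Because the comparison is against the literal `'1'`, values such as `DUMP_INCLUDE_ALL=true` leave the default exclusions in place. Passing `process.env` as the `env` argument reproduces the behavior of the module-level `SKIP` constant.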
