A Go port of Donald Knuth's Stanford GraphBase (SGB) — a collection of data sets and subroutines for graph algorithms and combinatorics.
The Stanford GraphBase is a library of programs for generating and examining a wide variety of graphs and data structures. This repository is a faithful, idiomatic Go port of the SGB C/CWEB sources.
The GB_FLIP random number generator is ported bit-for-bit, so every generator produces exactly the same graphs as the original C library for the same parameters and seeds — same vertices, same edges, same ID strings.
| Name | Description |
|---|---|
anna.dat |
Anna Karenina (used by gb_books) |
david.dat |
David Copperfield (used by gb_books) |
econ.dat |
US economic input and output (used by gb_econ) |
games.dat |
College football scores, 1990 (used by gb_games) |
homer.dat |
The Iliad (used by gb_books) |
huck.dat |
Huckleberry Finn (used by gb_books) |
jean.dat |
Les Miserables (used by gb_books) |
lisa.dat |
Mona Lisa pixels (used by gb_lisa) |
miles.dat |
Mileage between North American cities (used by gb_miles) |
roget.dat |
Cross references in Roget's Thesaurus (used by gb_roget) |
words.dat |
Five-letter words of English (used by gb_words) |
| Package | SGB Module | Description |
|---|---|---|
gbflip |
GB_FLIP | Subtractive RNG (period |
gbgraph |
GB_GRAPH | Core data structures: Vertex, Arc, Graph
|
gbgbio |
GB_IO | File I/O with checksum validation for .dat files |
gbsort |
GB_SORT | Radix sort utility for linked lists |
| Package | SGB Module | Description |
|---|---|---|
gbbasic |
GB_BASIC | Graph generators and transformers |
gbbooks |
GB_BOOKS | Character-encounter graphs from classic literature |
gbecon |
GB_ECON | U.S. input/output economic flow graph (1985 data) |
gbgames |
GB_GAMES | 1990 college football season game graph |
gbgates |
GB_GATES | Boolean circuit graphs |
gblisa |
GB_LISA | Pixel-intensity graph from the Mona Lisa image |
gbmiles |
GB_MILES | Highway mileage graph between 128 North American cities |
gbplane |
GB_PLANE | Planar graphs via Delaunay triangulation |
gbraman |
GB_RAMAN | Ramanujan expander graphs |
gbrand |
GB_RAND | Random graph generator |
gbroget |
GB_ROGET | Directed graph from Roget's 1879 Thesaurus |
gbwords |
GB_WORDS | Five-letter word graph (one-letter-difference edges) |
| Package | SGB Module | Description |
|---|---|---|
gbdijk |
GB_DIJK | Dijkstra shortest-path with pluggable priority queue |
gbsave |
GB_SAVE | Serialize/deserialize graphs to/from .gb files |
test_sample |
Test routine for GraphBase installation |
Generators read their data files relative to gbio.DataDirectory, so set it
before the first call. The example below builds the five-letter word graph
and finds a shortest word ladder with Dijkstra's algorithm:
package main
import (
"fmt"
"os"
"github.com/sjnam/go-sgb/gb-dijk"
"github.com/sjnam/go-sgb/gb-io"
"github.com/sjnam/go-sgb/gb-words"
)
func main() {
// Tell the generators where the .dat files live.
gbio.DataDirectory = "data/"
// Build the five-letter word graph (all 5757 qualifying words).
g, index, err := gbwords.Words(0, nil, 0, 0)
if err != nil {
panic(err)
}
fmt.Printf("%s: %d vertices, %d edges\n", g.ID, g.N, g.M/2)
// Find the shortest word ladder from "words" to "graph".
start := index.FindWord("words", nil)
goal := index.FindWord("graph", nil)
if d := gbdijk.Dijkstra(start, goal, g, nil, nil, nil); d >= 0 {
gbdijk.PrintDijkstraResult(os.Stdout, goal)
}
}Output:
words(5757,0,0,0): 5757 vertices, 14135 edges
0 words
1 wolds
2 golds
3 goads
4 grads
5 grade
6 grape
7 graph
The demos/ directory contains Go ports of Knuth's demo programs.
Solves the assignment problem on a matrix of Mona Lisa pixel brightnesses from
gblisa: it picks at most one entry per row and column to maximize their sum,
using the Hungarian algorithm, and reports the mem count. With -P it writes an
encapsulated PostScript file lisa.eps visualizing the chosen pixels.
$ go run ./demos/assign_lisa/ -s -p
Assignment problem for lisa(16,32,255,94,110,97,129,0,100000)
171 172 169 177 186 191 169 99 78 ...
⋮
The following entries produce an optimum assignment:
[0,26]
⋮
Solved in 13529 mems.
Parameters are name=value (m, n, d, m0, m1, n0, n1, d0, d1).
Flags: -s (16×32 smile), -e (20×50 eyes), -c (complement/minimize),
-h (square-matrix heuristic), -v/-V (verbose), -p (print matrix and
solution), -P (write lisa.eps), -DDIR (data directory).
Computes biconnected components of character-encounter graphs from classic literature using the Hopcroft–Tarjan algorithm.
go run ./demos/book_components/ [-tTITLE] [-nN] [-xN] [-fN] [-lN] [-iN] [-oN] [-sN] [-v] [-gFILE] [-dDIR]
TITLE is one of anna (default), david, jean, huck, or homer.
Permutes the sectors of the gbecon input/output matrix into a near-triangular
order (primary-material producers first, final-product industries last) by
minimizing the below-diagonal "feed-forward" sum with local search. The default
is Gleason's cautious descent; -g uses greedy steepest descent. Both reproduce
Knuth's published figures exactly.
$ go run ./demos/econ_order/ -g
Ordering the sectors of econ(79,2,0,0), using seed 0:
(Steepest descent method)
(The amount of feed-forward must be at least 321656.)
Local minimum feed-forward is 457408, found after 93 steps.
The corresponding economic order is:
⋮
Options: -nN (sectors, default 79), -rN (N random restarts), -sN (econ
seed), -tN (permutation seed), -g (greedy), -v/-V (verbose), -DDIR
(data directory).
Finds long chains of college football scores to "prove" one team outranks
another. With no argument a simple greedy algorithm is used; football N
keeps the N best partial chains per stratum of a stratified heuristic and
tends to find much longer chains — Stanford over Harvard by 781 points with
the greedy algorithm, 1895 with width 10, 2185 with width 4000.
$ go run ./demos/football/ [searchwidth]
Starting team: Stanford
Other team: Harvard
Sep 22: Stanford Cardinal 37, Oregon State Beavers 3 (+34)
Oct 13: Oregon State Beavers 35, Arizona Wildcats 21 (+48)
⋮
Nov 17: Yale Bulldogs 34, Harvard Crimson 19 (+781)
Computes the girth (shortest cycle) and diameter of the Ramanujan graphs built
by gbraman. It prompts for two distinct primes p and q, prints the
theoretical bounds the theory predicts, then finds the exact values by
breadth-first search. For example, raman(2,43) has 79464 vertices, girth 20,
and diameter 22.
$ go run ./demos/girth/
Choose a branching factor, p: 2
OK, now choose the cube root of graph size, q: 43
The graph has 79464 vertices, each of degree 3, and it is bipartite.
Any such graph must have diameter >= 15 and girth <= 30;
theoretical considerations tell us that this one's diameter is <= 34.
Starting at any given vertex, there are
3 vertices at distance 1,
⋮
0 vertices at distance 23.
So the diameter is 22, and the girth is 20.
Enter p = 2 only with q = 17 or 43; an empty line or EOF exits.
Finds shortest word ladders between five-letter English words using Dijkstra's algorithm.
$ go run ./demos/ladders/
Starting word: chaos
Goal word: order
0 chaos
1 choos
2 chops
⋮
11 odder
12 order
Options: -v (trace the search), -a (alphabetic distance), -f
(frequency-based distance), -h (A*-style lower-bound heuristic),
-e (echo input), -nN (N most common words), -rN (N random words),
-sN (random seed), -dDIR (data directory).
Finds the minimum spanning tree of a highway-mileage graph with four classic algorithms and reports how many "mems" (memory references) each one consumes — a machine-independent measure of efficiency. The four are Kruskal's algorithm (radix sort + union/find), Jarník–Prim with a binary heap, Jarník–Prim with a Fibonacci heap, and Cheriton–Tarjan–Karp with binomial queues. The mem counts reproduce Knuth's published figures exactly.
$ go run ./demos/miles_span/
The graph miles(100,0,0,0,0,10,0) has 405 edges,
and its minimum spanning tree has length 14467.
The Kruskal/radix-sort algorithm takes 8379 mems;
the Jarnik/Prim/binary-heap algorithm takes 7972 mems;
the Jarnik/Prim/Fibonacci-heap algorithm takes 11736 mems;
the Cheriton/Tarjan/Karp algorithm takes 17770 mems.
Options: -nN (N cities, max 128), -NN/-WN/-PN (north/west/population
weights), -dN (max degree), -rN (investigate N graphs with consecutive
seeds), -sN (random seed), -v (report each tree edge), -gFILE (restore an
external graph), -DDIR (data directory).
Multiplies numbers the slow way — by simulating, one gate at a time, the
multiplication circuit that gbgates builds with prod. Given multiply m n
it prompts for two numbers and multiplies them; given a third seed argument it
specializes the circuit to multiply any m-bit number by one fixed random n-bit
constant. An empty line ends the session.
$ go run ./demos/multiply/ 16 16
Here I am, ready to multiply 16-bit numbers by 16-bit numbers.
(I'm simulating a logic circuit with 1781 gates, depth 25.)
Number, please? 65535
Another? 65535
65535x65535=4294836225.
Builds the graph of queen moves on a 3×4 board — the union of rook moves
(board with -1) and bishop moves (board with -2) via gunion — then
prints every vertex and arc and saves the result with gbsave. This is the
Go port of the example program in the Stanford GraphBase introduction
(queen.w and queen_wrap.w in sources/). With -w the second coordinate
wraps around, giving queen moves on a cylinder, and the graph is saved as
queen_wrap.gb instead of queen.gb.
$ go run ./demos/queen/
Queen Moves on a 3x4 Board
The graph whose official name is
gunion(board(3,4,0,0,-1,0,0),board(3,4,0,0,-2,0,0),0,0)
has 12 vertices and 92 arcs:
0.0
-> 1.1, length 1
-> 2.2, length 2
⋮
Computes strongly connected components of the Roget thesaurus graph using Tarjan's iterative depth-first-search algorithm. Components are printed in reverse topological order.
go run ./demos/roget_components/ [-nN] [-dN] [-pN] [-sN] [-gFILE] [-DDIR]
Multiplies and divides small numbers by simulating the simple RISC machine that
gbgates builds with risc — loading a tiny 34-instruction program into its
read-only memory and running the gate network cycle by cycle. It prompts for two
positive numbers (each at most 0x7fff) and reports their product, quotient, and
remainder. Any command-line argument turns on a per-cycle register trace.
$ go run ./demos/take_risc/
Welcome to the world of microRISC.
Gimme a number: 100
OK, now gimme another: 7
The product of 100 and 7 is 700.
The quotient is 14, and the remainder is 2.
Computes connected components of the five-letter word graph, printing statistics as each vertex is added (union-find algorithm).
go run ./demos/word_components/ [-dDIR]
Knuth's installation/portability check: it runs most of the generators and
prints salient characteristics of each graph (and writes a saved graph to
test.gb), to be diffed against a golden file. Run it from the repository root
so the generators find their .dat files.
go run ./demos/test_sample/
Its output matches sources/sample.correct except for a few documented spots
(raman's bool reduce parameter, the omitted stray-vertex pointer hack, and one
library divergence it still surfaces in partial_gates).
The data/ directory contains the .dat files required by the graph
generators:
words.dat— five-letter English words with frequency dataroget.dat— Roget's Thesaurus cross-referencesanna.dat,david.dat,jean.dat,huck.dat,homer.dat— literary character datagames.dat— 1990 college football scoresmiles.dat— North American city mileage dataecon.dat— U.S. economic input/output datalisa.dat— Mona Lisa pixel data
Generators look for these files in the current directory first, then under
gbio.DataDirectory. Every file carries line counts and checksums that are
verified on read.
The port keeps Knuth's algorithms and data layouts intact while replacing C idioms with Go ones:
- Errors instead of panic codes. The C library reports failures through
a global
panic_code; here every generator returns(*graph.Graph, error)with sentinel errors (graph.ErrBadSpecs,graph.ErrSyntaxError, …) that work witherrors.Is. - Auxiliary data is returned, not stored in globals.
words.Wordsreturns a*words.Indexfor word lookup,miles.Milesreturns a*miles.DistanceMatrix,books.Bookreturns the chapter-name list, andgates.RunRiscreturns the machine state. Calls are independent and reentrant. - Boolean parameters are
bool. C'slongflags (directed,self,copy, …) becomebool; integers remain only for bitmasks, enums, and genuinely ternary values. Graph ID strings still render flags as0/1, so IDs match the C originals exactly. - Utility fields. The C
utilunions becomeany-typed fields (Vertex.U–Z,Arc.A/B,Graph.UU–ZZ); each package provides typed accessor functions, and the 14-characterUtilTypesconvention is preserved forsave/restore. - Memory management is the garbage collector's job.
gb_recycleand friends are no-ops; there is noAreabookkeeping. - Diagnostic output goes to an
io.Writer. Functions likedijk.Dijkstra(trace),dijk.PrintDijkstraResult, andgates.PrintGatestake a writer instead of printing to stdout. - GB_IO is package
gbioto avoid clashing with the standard library'siopackage.
The original CWEB sources (© 1993 Stanford University) are included in the
sources/ directory for reference and comparison.
go test ./...The tests exercise every module against invariants documented in the CWEB sources: the word graph has exactly 5757 vertices, Anna Karenina has 239 chapters, the mileage data satisfies the triangle inequality, the football demo reproduces Knuth's published chain totals, and save/restore round-trips preserve graphs exactly.
- Go 1.26 or later
git clone https://github.com/sjnam/go-sgb
cd go-sgb
go build ./...- Donald E. Knuth, The Stanford GraphBase: A Platform for Combinatorial Computing, ACM Press, 1993.
- Source: https://www-cs-faculty.stanford.edu/~knuth/sgb.html