Skip to content

sjnam/go-sgb

Repository files navigation

go-sgb

A Go port of Donald Knuth's Stanford GraphBase (SGB) — a collection of data sets and subroutines for graph algorithms and combinatorics.

Overview

The Stanford GraphBase is a library of programs for generating and examining a wide variety of graphs and data structures. This repository is a faithful, idiomatic Go port of the SGB C/CWEB sources.

The GB_FLIP random number generator is ported bit-for-bit, so every generator produces exactly the same graphs as the original C library for the same parameters and seeds — same vertices, same edges, same ID strings.

Packages

Data files

Name Description
anna.dat Anna Karenina (used by gb_books)
david.dat David Copperfield (used by gb_books)
econ.dat US economic input and output (used by gb_econ)
games.dat College football scores, 1990 (used by gb_games)
homer.dat The Iliad (used by gb_books)
huck.dat Huckleberry Finn (used by gb_books)
jean.dat Les Miserables (used by gb_books)
lisa.dat Mona Lisa pixels (used by gb_lisa)
miles.dat Mileage between North American cities (used by gb_miles)
roget.dat Cross references in Roget's Thesaurus (used by gb_roget)
words.dat Five-letter words of English (used by gb_words)

Kernel routines

Package SGB Module Description
gbflip GB_FLIP Subtractive RNG (period $2^{85}−2^{30}$)
gbgraph GB_GRAPH Core data structures: Vertex, Arc, Graph
gbgbio GB_IO File I/O with checksum validation for .dat files
gbsort GB_SORT Radix sort utility for linked lists

Graph Generating routines

Package SGB Module Description
gbbasic GB_BASIC Graph generators and transformers
gbbooks GB_BOOKS Character-encounter graphs from classic literature
gbecon GB_ECON U.S. input/output economic flow graph (1985 data)
gbgames GB_GAMES 1990 college football season game graph
gbgates GB_GATES Boolean circuit graphs
gblisa GB_LISA Pixel-intensity graph from the Mona Lisa image
gbmiles GB_MILES Highway mileage graph between 128 North American cities
gbplane GB_PLANE Planar graphs via Delaunay triangulation
gbraman GB_RAMAN Ramanujan expander graphs
gbrand GB_RAND Random graph generator
gbroget GB_ROGET Directed graph from Roget's 1879 Thesaurus
gbwords GB_WORDS Five-letter word graph (one-letter-difference edges)

Miscellaneous routines

Package SGB Module Description
gbdijk GB_DIJK Dijkstra shortest-path with pluggable priority queue
gbsave GB_SAVE Serialize/deserialize graphs to/from .gb files
test_sample Test routine for GraphBase installation

Quick Start

Generators read their data files relative to gbio.DataDirectory, so set it before the first call. The example below builds the five-letter word graph and finds a shortest word ladder with Dijkstra's algorithm:

package main

import (
    "fmt"
    "os"

    "github.com/sjnam/go-sgb/gb-dijk"
    "github.com/sjnam/go-sgb/gb-io"
    "github.com/sjnam/go-sgb/gb-words"
)

func main() {
    // Tell the generators where the .dat files live.
    gbio.DataDirectory = "data/"

    // Build the five-letter word graph (all 5757 qualifying words).
    g, index, err := gbwords.Words(0, nil, 0, 0)
    if err != nil {
        panic(err)
    }
    fmt.Printf("%s: %d vertices, %d edges\n", g.ID, g.N, g.M/2)

    // Find the shortest word ladder from "words" to "graph".
    start := index.FindWord("words", nil)
    goal := index.FindWord("graph", nil)
    if d := gbdijk.Dijkstra(start, goal, g, nil, nil, nil); d >= 0 {
        gbdijk.PrintDijkstraResult(os.Stdout, goal)
    }
}

Output:

words(5757,0,0,0): 5757 vertices, 14135 edges
         0 words
         1 wolds
         2 golds
         3 goads
         4 grads
         5 grade
         6 grape
         7 graph

Demos

The demos/ directory contains Go ports of Knuth's demo programs.

assign_lisa

Solves the assignment problem on a matrix of Mona Lisa pixel brightnesses from gblisa: it picks at most one entry per row and column to maximize their sum, using the Hungarian algorithm, and reports the mem count. With -P it writes an encapsulated PostScript file lisa.eps visualizing the chosen pixels.

$ go run ./demos/assign_lisa/ -s -p
Assignment problem for lisa(16,32,255,94,110,97,129,0,100000)
 171 172 169 177 186 191 169  99  78  ...
 ⋮
The following entries produce an optimum assignment:
 [0,26]
 ⋮
Solved in 13529 mems.

Parameters are name=value (m, n, d, m0, m1, n0, n1, d0, d1). Flags: -s (16×32 smile), -e (20×50 eyes), -c (complement/minimize), -h (square-matrix heuristic), -v/-V (verbose), -p (print matrix and solution), -P (write lisa.eps), -DDIR (data directory).

book_components

Computes biconnected components of character-encounter graphs from classic literature using the Hopcroft–Tarjan algorithm.

go run ./demos/book_components/ [-tTITLE] [-nN] [-xN] [-fN] [-lN] [-iN] [-oN] [-sN] [-v] [-gFILE] [-dDIR]

TITLE is one of anna (default), david, jean, huck, or homer.

econ_order

Permutes the sectors of the gbecon input/output matrix into a near-triangular order (primary-material producers first, final-product industries last) by minimizing the below-diagonal "feed-forward" sum with local search. The default is Gleason's cautious descent; -g uses greedy steepest descent. Both reproduce Knuth's published figures exactly.

$ go run ./demos/econ_order/ -g
Ordering the sectors of econ(79,2,0,0), using seed 0:
 (Steepest descent method)
(The amount of feed-forward must be at least 321656.)
Local minimum feed-forward is 457408, found after 93 steps.
The corresponding economic order is:
 ⋮

Options: -nN (sectors, default 79), -rN (N random restarts), -sN (econ seed), -tN (permutation seed), -g (greedy), -v/-V (verbose), -DDIR (data directory).

football

Finds long chains of college football scores to "prove" one team outranks another. With no argument a simple greedy algorithm is used; football N keeps the N best partial chains per stratum of a stratified heuristic and tends to find much longer chains — Stanford over Harvard by 781 points with the greedy algorithm, 1895 with width 10, 2185 with width 4000.

$ go run ./demos/football/ [searchwidth]
Starting team: Stanford
   Other team: Harvard
 Sep 22: Stanford Cardinal 37, Oregon State Beavers 3 (+34)
 Oct 13: Oregon State Beavers 35, Arizona Wildcats 21 (+48)
 ⋮
 Nov 17: Yale Bulldogs 34, Harvard Crimson 19 (+781)

girth

Computes the girth (shortest cycle) and diameter of the Ramanujan graphs built by gbraman. It prompts for two distinct primes p and q, prints the theoretical bounds the theory predicts, then finds the exact values by breadth-first search. For example, raman(2,43) has 79464 vertices, girth 20, and diameter 22.

$ go run ./demos/girth/
Choose a branching factor, p: 2
OK, now choose the cube root of graph size, q: 43
The graph has 79464 vertices, each of degree 3, and it is bipartite.
Any such graph must have diameter >= 15 and girth <= 30;
theoretical considerations tell us that this one's diameter is <= 34.
Starting at any given vertex, there are
       3 vertices at distance 1,
       ⋮
       0 vertices at distance 23.
So the diameter is 22, and the girth is 20.

Enter p = 2 only with q = 17 or 43; an empty line or EOF exits.

ladders

Finds shortest word ladders between five-letter English words using Dijkstra's algorithm.

$ go run ./demos/ladders/
Starting word: chaos
    Goal word: order
         0 chaos
         1 choos
         2 chops
         ⋮
        11 odder
        12 order

Options: -v (trace the search), -a (alphabetic distance), -f (frequency-based distance), -h (A*-style lower-bound heuristic), -e (echo input), -nN (N most common words), -rN (N random words), -sN (random seed), -dDIR (data directory).

miles_span

Finds the minimum spanning tree of a highway-mileage graph with four classic algorithms and reports how many "mems" (memory references) each one consumes — a machine-independent measure of efficiency. The four are Kruskal's algorithm (radix sort + union/find), Jarník–Prim with a binary heap, Jarník–Prim with a Fibonacci heap, and Cheriton–Tarjan–Karp with binomial queues. The mem counts reproduce Knuth's published figures exactly.

$ go run ./demos/miles_span/
The graph miles(100,0,0,0,0,10,0) has 405 edges,
  and its minimum spanning tree has length 14467.
 The Kruskal/radix-sort algorithm takes 8379 mems;
 the Jarnik/Prim/binary-heap algorithm takes 7972 mems;
 the Jarnik/Prim/Fibonacci-heap algorithm takes 11736 mems;
 the Cheriton/Tarjan/Karp algorithm takes 17770 mems.

Options: -nN (N cities, max 128), -NN/-WN/-PN (north/west/population weights), -dN (max degree), -rN (investigate N graphs with consecutive seeds), -sN (random seed), -v (report each tree edge), -gFILE (restore an external graph), -DDIR (data directory).

multiply

Multiplies numbers the slow way — by simulating, one gate at a time, the multiplication circuit that gbgates builds with prod. Given multiply m n it prompts for two numbers and multiplies them; given a third seed argument it specializes the circuit to multiply any m-bit number by one fixed random n-bit constant. An empty line ends the session.

$ go run ./demos/multiply/ 16 16
Here I am, ready to multiply 16-bit numbers by 16-bit numbers.
(I'm simulating a logic circuit with 1781 gates, depth 25.)

Number, please? 65535
Another? 65535
65535x65535=4294836225.

queen

Builds the graph of queen moves on a 3×4 board — the union of rook moves (board with -1) and bishop moves (board with -2) via gunion — then prints every vertex and arc and saves the result with gbsave. This is the Go port of the example program in the Stanford GraphBase introduction (queen.w and queen_wrap.w in sources/). With -w the second coordinate wraps around, giving queen moves on a cylinder, and the graph is saved as queen_wrap.gb instead of queen.gb.

$ go run ./demos/queen/
Queen Moves on a 3x4 Board

  The graph whose official name is
gunion(board(3,4,0,0,-1,0,0),board(3,4,0,0,-2,0,0),0,0)
  has 12 vertices and 92 arcs:

0.0
 -> 1.1, length 1
 -> 2.2, length 2
 ⋮

roget_components

Computes strongly connected components of the Roget thesaurus graph using Tarjan's iterative depth-first-search algorithm. Components are printed in reverse topological order.

go run ./demos/roget_components/ [-nN] [-dN] [-pN] [-sN] [-gFILE] [-DDIR]

take_risc

Multiplies and divides small numbers by simulating the simple RISC machine that gbgates builds with risc — loading a tiny 34-instruction program into its read-only memory and running the gate network cycle by cycle. It prompts for two positive numbers (each at most 0x7fff) and reports their product, quotient, and remainder. Any command-line argument turns on a per-cycle register trace.

$ go run ./demos/take_risc/
Welcome to the world of microRISC.

Gimme a number: 100
OK, now gimme another: 7
The product of 100 and 7 is 700.
The quotient is 14, and the remainder is 2.

word_components

Computes connected components of the five-letter word graph, printing statistics as each vertex is added (union-find algorithm).

go run ./demos/word_components/ [-dDIR]

Test Sample

Knuth's installation/portability check: it runs most of the generators and prints salient characteristics of each graph (and writes a saved graph to test.gb), to be diffed against a golden file. Run it from the repository root so the generators find their .dat files.

go run ./demos/test_sample/

Its output matches sources/sample.correct except for a few documented spots (raman's bool reduce parameter, the omitted stray-vertex pointer hack, and one library divergence it still surfaces in partial_gates).

Data Files

The data/ directory contains the .dat files required by the graph generators:

  • words.dat — five-letter English words with frequency data
  • roget.dat — Roget's Thesaurus cross-references
  • anna.dat, david.dat, jean.dat, huck.dat, homer.dat — literary character data
  • games.dat — 1990 college football scores
  • miles.dat — North American city mileage data
  • econ.dat — U.S. economic input/output data
  • lisa.dat — Mona Lisa pixel data

Generators look for these files in the current directory first, then under gbio.DataDirectory. Every file carries line counts and checksums that are verified on read.

Porting Notes

The port keeps Knuth's algorithms and data layouts intact while replacing C idioms with Go ones:

  • Errors instead of panic codes. The C library reports failures through a global panic_code; here every generator returns (*graph.Graph, error) with sentinel errors (graph.ErrBadSpecs, graph.ErrSyntaxError, …) that work with errors.Is.
  • Auxiliary data is returned, not stored in globals. words.Words returns a *words.Index for word lookup, miles.Miles returns a *miles.DistanceMatrix, books.Book returns the chapter-name list, and gates.RunRisc returns the machine state. Calls are independent and reentrant.
  • Boolean parameters are bool. C's long flags (directed, self, copy, …) become bool; integers remain only for bitmasks, enums, and genuinely ternary values. Graph ID strings still render flags as 0/1, so IDs match the C originals exactly.
  • Utility fields. The C util unions become any-typed fields (Vertex.UZ, Arc.A/B, Graph.UUZZ); each package provides typed accessor functions, and the 14-character UtilTypes convention is preserved for save/restore.
  • Memory management is the garbage collector's job. gb_recycle and friends are no-ops; there is no Area bookkeeping.
  • Diagnostic output goes to an io.Writer. Functions like dijk.Dijkstra (trace), dijk.PrintDijkstraResult, and gates.PrintGates take a writer instead of printing to stdout.
  • GB_IO is package gbio to avoid clashing with the standard library's io package.

The original CWEB sources (© 1993 Stanford University) are included in the sources/ directory for reference and comparison.

Testing

go test ./...

The tests exercise every module against invariants documented in the CWEB sources: the word graph has exactly 5757 vertices, Anna Karenina has 239 chapters, the mileage data satisfies the triangle inequality, the football demo reproduces Knuth's published chain totals, and save/restore round-trips preserve graphs exactly.

Requirements

  • Go 1.26 or later

Installation

git clone https://github.com/sjnam/go-sgb
cd go-sgb
go build ./...

Reference

About

A Go port of Donald Knuth's Stanford GraphBase

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages