A fast, parallel, incremental static site generator powered by XML + XSLT. Written in Python, with optional SaxonC acceleration for high-performance builds.
- XML-based article format
- XSLT-driven rendering pipeline (per-section engine choice)
- Streaming, parallel build execution
- Incremental caching (only rebuilds changed files)
- Multi‑language sites with per‑article <lang>
- Automatic generation of:
  - Article pages
  - Paginated index pages
  - Tag pages
  - RSS feed
  - Sitemap
  - Search index (JSON)
- Dry‑run mode
- Section‑limited builds (--only), language and slug filters
- Optional SaxonC support (auto‑falls back to lxml)
A site that uses Articulous is laid out like this:
your-site/
├── sitegen.py          # executable shim (ships with this repo)
├── articulous/         # the Python package (ships with this repo)
├── config.json         # site configuration
├── content/            # your .xml articles (any sub‑tree layout)
│   ├── en/
│   │   └── hello-world.xml
│   └── fr/
│       └── bonjour.xml
├── xsl/
│   └── pages/
│       ├── article.xsl
│       ├── index.xsl
│       ├── tags.xsl
│       ├── rss.xsl
│       ├── sitemap.xsl
│       └── search.xsl
├── assets/             # images, css, js, mirrored into output/
└── output/             # build target (created on first run)
    ├── en/
    │   ├── articles/<slug>/index.html
    │   ├── page/<n>/index.html
    │   └── tags/<tag>/index.html
    ├── assets/
    ├── rss.xml
    ├── sitemap.xml
    └── search.json
The build cache (.build-cache.json by default) is written next to
sitegen.py and is safe to delete to force a full rebuild. Concurrent
builds are guarded by .build-cache.lock; stale or malformed locks are
recovered automatically.
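The incremental check can be pictured roughly as follows. This is a minimal sketch, not Articulous's actual implementation: it assumes the cache maps each source path to a SHA-256 content hash, and the helper names file_digest and changed_files are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_files(sources: list[Path], cache_file: Path) -> list[Path]:
    """Return the sources whose digest differs from the cached one.

    The cache is rewritten with the current digests afterwards. A missing
    cache file is treated as an empty cache, so deleting it makes every
    source count as changed (a full rebuild).
    """
    cache = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    current = {str(p): file_digest(p) for p in sources}
    changed = [p for p in sources if cache.get(str(p)) != current[str(p)]]
    cache.update(current)
    cache_file.write_text(json.dumps(cache, indent=2))
    return changed
```

Under these assumptions, a second call with unchanged sources returns an empty list, and removing the cache file reports every source again.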
Place your articles anywhere inside content/. Each file:
<?xml version="1.0" encoding="UTF-8"?>
<article>
  <title>Hello World</title>
  <date>2024-01-01</date>
  <description>My first article.</description>
  <lang>en</lang> <!-- optional; falls back to site.default_language -->
  <status>published</status> <!-- optional; default is published -->
  <publish-at>2024-01-01T00:00:00</publish-at> <!-- only used when status is scheduled -->
  <tag>intro</tag>
  <tag>welcome</tag>
</article>
Notes:
- The slug is derived from <title> (URL‑safe). If the title is empty, the filename stem is used instead.
- <date> must be ISO‑8601 (YYYY-MM-DD or full timestamp). Articles with a missing or invalid date are skipped with a warning.
- <lang> is validated against site.languages; values outside the configured set fall back to site.default_language.
- <status> may be published, unlisted, draft, private, or scheduled. Articles with an unknown status are skipped with a warning.
  - unlisted pages are generated but excluded from index/tag pages, RSS, sitemap, and search.
  - draft and private articles are skipped.
  - scheduled articles publish only when <publish-at> is present and its ISO-8601 timestamp has passed. An article with a missing timestamp stays skipped; one with an invalid timestamp is skipped with a warning.
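The date and status rules above can be summarized in a small decision function. This is an illustrative sketch using only the standard library, not the package's own code: the function name classify, the three result constants, and the choice to treat timestamps without an offset as UTC are all assumptions.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

LISTED = "listed"      # built and included in index/tag pages, RSS, sitemap, search
UNLISTED = "unlisted"  # built, but excluded from those listings
SKIP = "skip"          # not built at all


def _as_utc(ts: datetime) -> datetime:
    """Assumption: timestamps without an offset are interpreted as UTC."""
    return ts if ts.tzinfo is not None else ts.replace(tzinfo=timezone.utc)


def classify(article_xml: str, now: datetime) -> str:
    """Decide how one article is treated, following the notes above."""
    root = ET.fromstring(article_xml)

    # A missing or invalid <date> skips the article (the real tool also warns).
    try:
        datetime.fromisoformat((root.findtext("date") or "").strip())
    except ValueError:
        return SKIP

    status = (root.findtext("status") or "published").strip()
    if status == "published":
        return LISTED
    if status == "unlisted":
        return UNLISTED
    if status == "scheduled":
        publish_at = (root.findtext("publish-at") or "").strip()
        if not publish_at:
            return SKIP  # missing timestamp: stays skipped
        try:
            due = datetime.fromisoformat(publish_at)
        except ValueError:
            return SKIP  # invalid timestamp: skipped (with a warning in the real tool)
        return LISTED if _as_utc(due) <= _as_utc(now) else SKIP
    return SKIP  # draft, private, or an unknown status
```

Passing now explicitly keeps the function deterministic, which makes the scheduling rule easy to test.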
The generator requires these files inside xsl/pages/:
- article.xsl
- index.xsl
- tags.xsl
- rss.xsl
- sitemap.xsl
- search.xsl
Either invoke the shim:
chmod +x sitegen.py
./sitegen.py
…or run the package directly:
python3 -m articulous
Both forms are equivalent.
--config FILE     Use a custom config file (default: config.json)
--only SECTIONS   Build only specific sections, comma-separated
                  (articles,index,tags,rss,sitemap,search).
                  Unknown names raise immediately.
--lang LANG       Limit build to specific languages (repeat to pass many)
--slug SLUG       Build only specific article slugs (repeat to pass many)
--dry-run         Do not write output files
--force           Ignore cache and rebuild everything
-v, --verbose     Enable DEBUG-level logging
Examples:
./sitegen.py --only articles
./sitegen.py --only index,tags --verbose
./sitegen.py --lang en --lang fr --slug hello-world
./sitegen.py --dry-run
Configuration lives in config.json. If the file is missing, defaults
are written automatically on first run. Malformed existing configuration
fails fast instead of being overwritten.
| Block | Keys |
|---|---|
| paths | content_dir, output_dir, xsl_dir, assets_dir |
| engines | per‑section engine (articles, index, tags, rss, sitemap, search) → "lxml" or "saxonc" |
| saxonc | home: path to the SaxonC install (optional; Transform is discovered on PATH when null) |
| site | site_name, base_url, languages, default_language, pagination.per_page, pagination.rss_items |
| urls | URL patterns for article, index, tag, rss, sitemap, search (e.g. /{lang}/articles/{slug}/) |
| build | cache_file, parallelism ("auto" or an int), force_full_build |
| logging | level (DEBUG/INFO/WARNING/ERROR/CRITICAL), file (optional path) |
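The loading behavior described before the table (write defaults when config.json is missing, fail fast when it is malformed) might look like the sketch below. The function name load_config is hypothetical, and the DEFAULTS shown are illustrative values inferred from the directory layout, covering only two of the blocks; the real defaults are whatever the generator writes on first run.

```python
import copy
import json
from pathlib import Path

# Illustrative defaults only (assumption): inferred from the site layout,
# not copied from the package. Only two of the blocks are shown.
DEFAULTS = {
    "paths": {
        "content_dir": "content",
        "output_dir": "output",
        "xsl_dir": "xsl",
        "assets_dir": "assets",
    },
    "build": {
        "cache_file": ".build-cache.json",
        "parallelism": "auto",
        "force_full_build": False,
    },
}


def load_config(path: Path) -> dict:
    """Load the config, writing defaults if the file does not exist.

    A file that exists but cannot be parsed raises ValueError and is
    left untouched, so a typo never silently resets the configuration.
    """
    if not path.exists():
        path.write_text(json.dumps(DEFAULTS, indent=2))
        return copy.deepcopy(DEFAULTS)
    try:
        config = json.loads(path.read_text())
    except json.JSONDecodeError as exc:
        raise ValueError(f"{path}: malformed configuration: {exc}") from exc
    if not isinstance(config, dict):
        raise ValueError(f"{path}: top-level value must be a JSON object")
    return config
```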
An unknown engine name in engines aborts the build immediately so
typos surface early. Requesting "saxonc" without a usable binary
silently falls back to the lxml engine. lxml denies XSLT file/network
access, and SaxonC runs with external functions and XInclude disabled.
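As a rough illustration of the two engine rules (unknown names abort, "saxonc" without a binary falls back to lxml), consider the sketch below. The function resolve_engines is hypothetical, and treating a section with no entry as "lxml" is an assumption.

```python
SECTIONS = ("articles", "index", "tags", "rss", "sitemap", "search")
KNOWN_ENGINES = ("lxml", "saxonc")


def resolve_engines(engines: dict[str, str], saxonc_available: bool) -> dict[str, str]:
    """Return the engine actually used for each section.

    Raises ValueError for an unknown engine name; downgrades "saxonc"
    to "lxml" when no usable SaxonC binary is available.
    """
    resolved = {}
    for section in SECTIONS:
        name = engines.get(section, "lxml")  # assumption: lxml when unset
        if name not in KNOWN_ENGINES:
            raise ValueError(f"unknown engine {name!r} for section {section!r}")
        if name == "saxonc" and not saxonc_available:
            name = "lxml"  # silent fallback
        resolved[section] = name
    return resolved
```

When saxonc.home is null, the availability flag could be computed with shutil.which("Transform") is not None, since the binary is looked up on PATH in that case.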
URL patterns are validated by section:
| URL key | Required placeholders | Output shape |
|---|---|---|
| article | {lang}, {slug} | Directory-style; index.html is appended |
| index | {lang}, {page} | Directory-style; index.html is appended |
| tag | {lang}, {tag} | Directory-style; index.html is appended |
| rss | none | File-style path, such as /rss.xml |
| sitemap | none | File-style path, such as /sitemap.xml |
| search | none | File-style path, such as /search.json |
Directory-style patterns must not end in a file suffix. File-style patterns must include one.
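The validation rules in the table can be expressed compactly. The following is a sketch under stated assumptions rather than the package's validator: validate_url_pattern is a hypothetical name, and "file suffix" is interpreted here as the last path segment having an extension.

```python
import string
from pathlib import PurePosixPath

# Required placeholders per URL key, taken from the table above.
REQUIRED = {
    "article": {"lang", "slug"},
    "index": {"lang", "page"},
    "tag": {"lang", "tag"},
    "rss": set(),
    "sitemap": set(),
    "search": set(),
}
DIRECTORY_STYLE = {"article", "index", "tag"}


def validate_url_pattern(key: str, pattern: str) -> None:
    """Raise ValueError if the pattern breaks the rules for its URL key."""
    # string.Formatter().parse yields (literal, field_name, spec, conversion).
    fields = {name for _, name, _, _ in string.Formatter().parse(pattern) if name}
    missing = REQUIRED[key] - fields
    if missing:
        raise ValueError(f"{key}: missing placeholders {sorted(missing)}")

    has_suffix = bool(PurePosixPath(pattern).suffix)
    if key in DIRECTORY_STYLE:
        if has_suffix:
            raise ValueError(f"{key}: directory-style pattern must not end in a file suffix")
    elif pattern.endswith("/") or not has_suffix:
        raise ValueError(f"{key}: file-style pattern must end in a file suffix")
```

For example, /{lang}/articles/{slug}/ passes for article, while /{lang}/{slug}.html (file suffix on a directory-style key) and /feed/ for rss (no suffix) are both rejected.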
Files are written to output/ according to the configured urls
patterns. Defaults produce:
- output/en/articles/<slug>/index.html
- output/en/page/<n>/index.html
- output/en/tags/<tag>/index.html
- output/assets/... (incremental mirror of assets/)
- output/rss.xml
- output/sitemap.xml
- output/search.json
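To make the mapping from URL pattern to file concrete, here is a small sketch. The helper output_path and its directory_style flag are hypothetical; the only rule taken from this README is that directory-style URLs get index.html appended while file-style URLs are written as-is.

```python
from pathlib import Path, PurePosixPath


def output_path(output_dir: Path, pattern: str, directory_style: bool, **fields: object) -> Path:
    """Expand a URL pattern and map it to a file under output_dir."""
    url = pattern.format(**fields)             # e.g. "/en/articles/hello-world/"
    relative = PurePosixPath(url.lstrip("/"))  # drop the leading slash
    if directory_style:
        relative = relative / "index.html"
    return output_dir.joinpath(*relative.parts)
```

With the article pattern /{lang}/articles/{slug}/, lang="en", and slug="hello-world", this yields output/en/articles/hello-world/index.html, matching the first entry in the list above.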
The package is covered by a pytest suite with fixture‑based site builds,
unit tests, and section‑order fuzz tests. Coverage is enforced at
100 % line + branch via pytest-cov (--cov-fail-under=100).
Install xmllint first (for example, libxml2-utils on Debian/Ubuntu),
then install the Python test dependencies and local lint/type tools.
pip install -e ".[test]" ruff pyright
pytest
ruff check .
pyright
Notes:
- Incremental builds use .build-cache.json at the project root.
- parallelism resolves to the CPU count when set to "auto".
- SaxonC is optional; when its binary is absent the lxml engine is used for every section, regardless of the engines configuration.
MIT