A daily bird species RSS feed and minimal static site, self-hostable as a microservice. Each day a new species is selected from a configurable weighted pool of regions, scraped from public Cornell Lab sources, and published to a GitHub Pages site plus an RSS endpoint.
The example configuration ships with US-weighted pools. Copy
data/config.example.json to data/config.json and adjust the regions,
language and weights to your taste. English, Spanish, French and
Portuguese catalogs are included. Adding another language is one JSON
file under data/i18n/.
An optional LLM enrichment mode can generate warm, narrative prose and field-ID tips from the scraped sources (any OpenAI-compatible endpoint). The project runs fine without it. No tracking, no cookies.
GitHub Pages serves three static routes from the repository root:
| Route | What it is |
|---|---|
/ (index.html) |
Hero of the day's bird + a grid of the most recent twelve |
/archive.html |
Every published bird in reverse chronological order, with permanent anchors (#bird-{code}-{date}) |
/feed.xml |
RSS 2.0 feed with rich content:encoded HTML |
Everything is server-rendered HTML — no JavaScript framework, no build
step. The single small piece of vanilla JS is an inline theme switcher
that persists light/dark preference in localStorage.
GitHub Actions (cron daily 07:00 UTC)
│
├─ 1. Pool weighted by date (e.g. 35% Madrid, 27% Spain,
│ 23% one random European country, 15% global taxonomy)
├─ 2. Species selection biased toward rarer observations,
│ deduplicated against the last 50 publications
├─ 3. Photo + photographer from Macaulay Library Search API
│ (with fallback to og:image on the eBird species page)
├─ 4. Description chain in the configured language:
│ eBird Merlin → Wikipedia → policy-driven fallback
├─ 4b. (optional) LLM enrichment: sends scraped data to an
│ OpenAI-compatible endpoint, receives narrative prose + ID tips
├─ 5. Wikipedia URL captured (target language → English fallback)
│ so the footer link is always present
├─ 6. GBIF distribution map composed (basemap + density overlay)
├─ 7. feed.xml + index.html + archive.html written
└─ 8. git commit + git push → GitHub Pages republishes
The selection is deterministic by date: two runs on the same day pick
exactly the same species. The script bails early if today's entry is
already in history.json, so the daily cron and ad-hoc reruns don't
duplicate work.
- Python 3.12+, managed with
uv - Four runtime dependencies:
requests,beautifulsoup4,langid,Pillow - No database. State lives in three files in the repo:
feed.xml,history.json,cache/
git clone https://github.com/backmind/Bird-of-the-day.git
cd Bird-of-the-day
uv sync
cp data/config.example.json data/config.json
# edit data/config.json — set your language, pools, feed_link| Variable | Required | Where to get it |
|---|---|---|
EBIRD_API_KEY |
yes | Free at https://ebird.org/api/keygen |
BOTD_LLM_API_KEY |
only for enriched mode |
Your LLM provider (e.g. Google AI Studio) |
For local use copy .env.example to .env and fill the key:
cp .env.example .env
# edit .env.env is gitignored and generate.py loads it automatically (no
python-dotenv required).
In GitHub Actions the key is injected from Settings → Secrets and variables → Actions → New repository secret with the same name.
Copy the bundled example and edit it:
cp data/config.example.json data/config.jsonEvery behavior knob lives here. Annotated example:
{
"language": "es",
"_language_help": "One of es | en | fr | pt (or any other catalog you add).",
"ebird_locale": null,
"_ebird_locale_help": "Optional override for the eBird API locale (e.g. 'pt_BR'). If null, derived from `language`.",
"description_policy": "foreign_fallback",
"_description_policy_help": "How to render descriptions when no source produces text in the configured language. One of: strict | foreign_fallback | skip.",
"max_skip_retries": 50,
"pools": [
{"id": "madrid", "region": "ES-MD", "weight": 0.35, "type": "regional"},
{"id": "spain", "region": "ES", "weight": 0.27, "type": "regional"},
{"id": "europe", "weight": 0.23, "type": "europe_random",
"countries": ["PT", "FR", "IT", "DE", "GB", "GR", "SE", "NO", "PL"]},
{"id": "global", "weight": 0.15, "type": "global_taxonomy"}
],
"dedup_window": 50,
"max_feed_entries": 60,
"back_days": 14,
"feed_link": "https://YOUR-USERNAME.github.io/Bird-of-the-day/",
"content_mode": "programmatic",
"llm": {
"endpoint": "https://generativelanguage.googleapis.com/v1beta/openai",
"model": "gemini-flash-latest",
"temperature": 0,
"max_retries": 2
}
}Keys starting with _ are documentation-only and ignored at load time.
What happens when none of the sources (eBird Merlin, target-language Wikipedia) returns text in your configured language:
| Policy | Behavior |
|---|---|
foreign_fallback (default) |
Show the original text with a disclaimer naming the source language (e.g. "Description in English (no French translation available)"). |
strict |
Show an em-dash placeholder. Never display foreign text. |
skip |
Re-roll species selection up to max_skip_retries times. On exhaustion, falls back to strict. |
Even with strict, the footer always carries a Wikipedia link — falling
back to English Wikipedia (and labeled Wikipedia (en)) if the target
language has no article.
The site.tagline and feed.description strings in data/i18n/*.json
are intentionally generic ("A new bird species every day."): the regional
flavor of the site is decided by pools in data/config.json, not baked
into the copy. If you want a region-specific tagline, edit the catalog of
the language you're shipping in.
uv run python -m scripts.generateThis:
- Loads
.envif present. - Loads the i18n catalog for the configured language.
- Bails early if today's entry is already in
history.json. - Selects the species, fetches image and content (writing to
cache/). - Writes
feed.xml,index.html,archive.html. - Updates
history.json.
To force a regeneration of today's entry, empty the history:
echo '{"entries": []}' > history.json
uv run python -m scripts.generateCopy .github/bird-of-the-day.yml.example to
.github/workflows/bird-of-the-day.yml to enable the daily cron. It runs:
- Automatically every day at 07:00 UTC.
- Manually from the Actions → Bird of the Day → Run workflow tab.
The workflow git adds feed.xml, history.json, index.html,
archive.html and cache/, then commits with a message of the form
🐦 Bird of the day: 2026-04-11 and pushes to the default branch.
Two paths, pick whichever fits your taste. They are peer options, not replacements: pick GitHub Pages if you want a free hosted site with zero ops, or Docker if you run your own server.
| Path | Best for | Cost | Ops |
|---|---|---|---|
| GitHub Actions + Pages | "I just want a free site." | Free | None |
| Docker container | "I run my own server / VPS / Pi / fly.io." | A host with Docker | Standard container ops |
The image is published to ghcr.io/backmind/bird-of-the-day for
linux/amd64 and linux/arm64. It runs nginx on port 8080 and a
built-in cron (supercronic) that regenerates the site daily at 07:00 UTC,
matching the GitHub Actions cadence. Total image size: ~340 MB.
docker run -d --name bird-of-the-day \
-p 8080:8080 \
-e EBIRD_API_KEY=YOUR_KEY \
-v botd-data:/var/lib/botd \
--restart unless-stopped \
ghcr.io/backmind/bird-of-the-day:latestOpen http://localhost:8080. On a fresh container the first request may take 30–60 seconds while the generator runs synchronously to populate the volume.
A ready-to-use docker-compose.yml lives at the repo root with sensible
defaults (named volume, healthcheck, cap_drop: ALL,
no-new-privileges). Set EBIRD_API_KEY in your shell or a sibling
.env file and run:
docker compose up -dScalar config knobs can be tweaked without rebuilding the image or
mounting a custom config file. Each maps to a key in data/config.json
and overrides it if set:
| Env var | Maps to | Example |
|---|---|---|
BOTD_LANGUAGE |
language |
en, fr, pt |
BOTD_EBIRD_LOCALE |
ebird_locale |
pt_BR |
BOTD_DESCRIPTION_POLICY |
description_policy |
strict, foreign_fallback, skip |
BOTD_MAX_SKIP_RETRIES |
max_skip_retries |
50 |
BOTD_DEDUP_WINDOW |
dedup_window |
50 |
BOTD_MAX_FEED_ENTRIES |
max_feed_entries |
60 |
BOTD_BACK_DAYS |
back_days |
14 |
BOTD_FEED_LINK |
feed_link |
https://example.com/birds/ |
BOTD_CONTENT_MODE |
content_mode |
programmatic, enriched |
EBIRD_API_KEY is required. BOTD_LLM_API_KEY is needed only when
content_mode is enriched. The container does not read .env
files (it doesn't need to — env vars work everywhere).
Standard Docker / k8s secrets convention: instead of passing the key as
an env var, mount a file containing the key and point at it with
EBIRD_API_KEY_FILE:
docker run ... \
--secret source=ebird_api_key,target=/run/secrets/ebird_api_key \
-e EBIRD_API_KEY_FILE=/run/secrets/ebird_api_key \
ghcr.io/backmind/bird-of-the-dayIn Kubernetes, mount a Secret as a volume and set
EBIRD_API_KEY_FILE to the mounted path. If both EBIRD_API_KEY and
EBIRD_API_KEY_FILE are set, the env var wins.
The pools matrix is a nested structure not exposed via env vars
(stringifying it would be painful). To customise it without forking
the repo and rebuilding:
# 1. Copy the example and edit it
cp data/config.example.json my-config.json
# 2. Mount it into the container:
docker run -d ... \
-v ./my-config.json:/app/data/config.json:ro \
ghcr.io/backmind/bird-of-the-dayThe mount is :ro (read-only) — the container only reads it.
The single volume at /var/lib/botd holds all mutable state:
/var/lib/botd/
├── cache/ # per-species + taxonomy caches
├── feed.xml # the RSS feed
├── index.html # the front page
├── archive.html # the chronological archive
└── history.json # the full publication history
Back this up and you can rebuild the running container without losing a single day. The default schedule writes to it once per day at 07:00 UTC.
The container's HEALTHCHECK verifies three things every 5 minutes:
feed.xmlexists on the volume.feed.xmlwas modified within the last 36 hours.- nginx is actually serving
/feed.xmlon port 8080.
If the daily cron silently stops working, the container goes
unhealthy within 36 hours — that's the intended behavior, and
your orchestrator (k8s / docker swarm / fly machines / etc.) will
surface it. The 36 h window gives the daily 07:00 UTC run a 12 h grace
period.
There's also a cheap liveness probe at /healthz that just returns
200 ok if nginx is up.
The container runs as a non-root user (botd, uid 1000) and needs no
Linux capabilities. Recommended hardening for security-conscious
deployments:
docker run -d \
--read-only \
--cap-drop=ALL \
--security-opt no-new-privileges \
--tmpfs /tmp \
--tmpfs /var/log/nginx \
--tmpfs /var/lib/nginx \
--tmpfs /run/nginx \
-p 8080:8080 \
-e EBIRD_API_KEY=$KEY \
-v botd-data:/var/lib/botd \
ghcr.io/backmind/bird-of-the-dayThe --read-only root filesystem requires writable tmpfs for
nginx's working directories. The container has been tested in this
mode end-to-end.
Resource hints: ~50–100 MB RAM at idle, ~150 MB during generation,
bursty CPU. A floor of mem_limit: 256m and cpus: 0.5 is
comfortable.
docker build -t bird-of-the-day .
# Multi-arch:
docker buildx build --platform linux/amd64,linux/arm64 -t bird-of-the-day .The container is UTC by default. The cron expression in
docker/crontab is 0 7 * * * (07:00 UTC, matching the GitHub
Actions workflow). To change it, edit that file and rebuild — or
mount your own at /etc/supercronic/crontab.
- Click Use this template on the repo page (or clone manually).
- Copy and edit the config:
Set
cp data/config.example.json data/config.json
feed_linkto yourhttps://<user>.github.io/<repo>/URL, pick alanguage, and adjustpoolsfor your regions. - Activate the daily workflow:
cp .github/bird-of-the-day.yml.example .github/workflows/bird-of-the-day.yml
- If you want a custom domain, copy
CNAME.exampletoCNAMEand write your domain in it. Configure your DNS to point to<user>.github.io. - Settings → Secrets and variables → Actions → add
EBIRD_API_KEY. Optionally addBOTD_LLM_API_KEYif using enriched mode. - Settings → Pages → Build and deployment → source:
Deploy from a branch, branch:main, folder:/ (root). Save. - Either wait for the daily cron or trigger Actions → Bird of the Day → Run workflow manually for the first publication.
Some alternative presets you can paste into data/config.json:
Western US flavor:
"pools": [
{"id": "california", "region": "US-CA", "weight": 0.35, "type": "regional"},
{"id": "us_west", "region": "US", "weight": 0.27, "type": "regional"},
{"id": "americas", "weight": 0.23, "type": "europe_random",
"countries": ["MX", "CA", "CR", "BR", "AR", "CO", "EC", "PE", "CL"]},
{"id": "global", "weight": 0.15, "type": "global_taxonomy"}
]Pan-European balance (no national bias):
"pools": [
{"id": "europe", "weight": 0.85, "type": "europe_random",
"countries": ["ES", "PT", "FR", "IT", "DE", "GB", "GR", "SE", "NO", "PL", "NL", "BE", "AT", "CH", "DK", "FI"]},
{"id": "global", "weight": 0.15, "type": "global_taxonomy"}
]The type field accepts regional (single region code) or
europe_random (list of countries, one picked per day) or
global_taxonomy (any species in the eBird taxonomy).
- Copy
data/i18n/en.jsontodata/i18n/{lang}.json(use the ISO 639-1 code, e.g.de,it,ca). - Translate every value. Missing keys fall back to English at render time, so partial translations are safe.
- Add the language name in your file, plus the names of every other
supported language as seen from your language. For example, in
de.json:"language_name.es": "Spanisch", "language_name.en": "Englisch", "language_name.fr": "Französisch", "language_name.pt": "Portugiesisch", "language_name.de": "Deutsch"
- Set
language: "{lang}"in yourdata/config.json. - Clear
cache/*.json(but keepcache/taxonomy.json) so existing per-species caches don't render in the previous language. - Run
uv run python -m scripts.generateand verify. - Open a PR.
langid (the language detector) supports 97 languages out of the box;
the only constraint on which target languages are valid is that there's a
data/i18n/{lang}.json file.
Bird-of-the-day/
├── .github/
│ ├── bird-of-the-day.yml.example # copy to workflows/ to enable daily cron
│ └── workflows/
│ └── docker-publish.yml # build & push multi-arch image to ghcr.io
├── Dockerfile # multi-stage container build
├── .dockerignore
├── docker-compose.yml # one-command self-host
├── docker/
│ ├── crontab # supercronic schedule (07:00 UTC)
│ ├── entrypoint.sh # cold-start + supercronic + exec nginx
│ ├── healthcheck.sh # smart freshness check (36h window)
│ ├── nginx.conf # non-root nginx, port 8080
│ └── placeholder.html # cold-start fallback page
├── scripts/
│ ├── generate.py # orchestrator (entry point)
│ ├── ebird_client.py # eBird API + species selection + taxonomy cache
│ ├── image_fetcher.py # Macaulay Library API + og:image fallback
│ ├── content_scraper.py # eBird og:description + Wikipedia + BoW
│ ├── llm_enricher.py # optional LLM content enrichment
│ ├── map_composer.py # server-side map composition for RSS
│ ├── name_linker.py # species name cross-linking
│ ├── feed_builder.py # RSS 2.0 generation
│ ├── site_builder.py # index.html + archive.html generation
│ ├── i18n.py # Catalog loader + langid wrapper
│ └── seed_mock.py # developer-only: populate the site for visual review
├── data/
│ ├── config.example.json # copy to config.json and customize
│ └── i18n/{es,en,fr,pt}.json # translation catalogs
├── cache/ # taxonomy + per-species caches (generated)
├── maps/ # composed distribution maps (generated)
├── CNAME.example # copy to CNAME for custom domain setup
├── .env.example # environment variable template
├── pyproject.toml # dependencies and uv metadata
├── uv.lock # lock file
├── LICENSE # MIT
└── README.md
- eBird API: non-commercial use is permitted under the eBird API Terms of Use. The project makes at most one selection call per day.
- Macaulay Library: photographs are © their authors. The project hot-links the public Cornell CDN for non-commercial display with visible photographer attribution, mirroring the embed flow Cornell itself offers.
- Merlin / eBird and Birds of the World text: © Cornell Lab of Ornithology. The feed reproduces short fragments with clear attribution and links back to the source, with no commercial purpose.
- Wikipedia: REST summary content is CC BY-SA 3.0; we link to the canonical article and don't redistribute beyond the short summary.
- Generated data (feed, site): MIT, free to reuse with attribution.
This site stores your theme preference (light/dark) in localStorage so
it persists between visits. That's the only client-side state, it never
leaves the browser, and it falls under the "strictly necessary functional
preferences" exemption of the EU ePrivacy Directive — no consent banner
or cookie notice is required. There are no cookies, no analytics, no
trackers, and no third-party requests beyond Google Fonts (for typography)
and the Macaulay Library CDN (for photos).
MIT. Third-party content (photos, Cornell text excerpts, Wikipedia summaries) keeps its respective licenses and attributions.