Ultimate Guitar Archive Scraper

Scrapes tabs from ultimate-guitar.com using Selenium WebDriver to handle JavaScript-rendered content.

Note: I've included a fresh scrape zip from 2025-07-25, so you can skip the scraping and go straight to downloading if you like. Extract it into the repo root and you're off. For local use, pass the extracted directory as-is with --local-files-dir; for Docker, bind mount it into the container.

Script Arguments

| Argument | Default | Description |
| --- | --- | --- |
| `--base-url` | `https://www.ultimate-guitar.com` | Base URL of the site to scrape |
| `--user-agent` | `Mozilla/5.0 (Linux; Android 13; Pixel 7)...` | User agent string for requests |
| `--starting-letter` | `0-9` | Starting position for band list |
| `--end-letter` | `z` | Ending position for band list |
| `--outdir` | `./tabs` | Output directory for scraped data |
| `--max-tabs-per-band` | `None` | Maximum tabs to download per band |
| `--max-bands` | `None` | Maximum bands to process |
| `--tab-types` | `None` | Filter by tab type (CRD, TAB, PRO, BASS) |
| `--include-metadata` | `False` | Include metadata header in tab files |
| `--max-retry-attempts` | `3` | Maximum retry attempts for failed requests |
| `--retry-base-delay` | `1.0` | Base delay for exponential backoff (seconds) |
| `--retry-max-delay` | `30.0` | Maximum delay for exponential backoff (seconds) |
| `--disable-retry-jitter` | `False` | Disable random jitter in retry delays |
| `--scrape-only` | `False` | Only scrape metadata, skip downloading content |
| `--download-only` | `False` | Only download using existing metadata files |
| `--input-files-dir` | Value of `--outdir` | Directory with preexisting artist JSON files (for read-only use with `--download-only`) |
| `--skip-existing-bands` | `False` | Skip bands that already have JSON files |
| `--threads` | `1` | Number of parallel threads for download-only mode |
| `--skip-existing-tabs` | `True` | Skip downloading tabs if file already exists |
| `--overwrite-existing-tabs` | N/A | Overwrite existing tab files (opposite of `--skip-existing-tabs`) |
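
The four retry options describe exponential backoff with optional jitter. The sketch below shows one common way such a delay can be computed from those values. It is not taken from `main.py`: the function name `retry_delay` and the "full jitter" strategy (a random wait between 0 and the capped delay) are assumptions for illustration, and the script may compute its delays differently.

```python
import random


def retry_delay(attempt, base_delay=1.0, max_delay=30.0, jitter=True):
    """Return the wait in seconds before retry number `attempt` (0-based).

    Hypothetical helper: the defaults mirror --retry-base-delay and
    --retry-max-delay, and jitter=False mirrors --disable-retry-jitter.
    """
    # Double the base delay on every attempt, then cap it at max_delay.
    delay = min(base_delay * (2 ** attempt), max_delay)
    if jitter:
        # Assumed "full jitter": pick a random wait between 0 and the cap,
        # so parallel workers do not all retry at the same instant.
        delay = random.uniform(0, delay)
    return delay


# Delays without jitter for the default of 3 retry attempts.
print([retry_delay(n, jitter=False) for n in range(3)])  # [1.0, 2.0, 4.0]
```

With the defaults, the un-jittered delay reaches the 30 second cap on the sixth attempt (attempt index 5, where 1.0 * 2^5 = 32).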

Environment Setup

Virtual Environment

Linux - Script

./setup-env.sh

Linux - Manual

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Install Chrome/Chromium and ChromeDriver for your system, and all the prereqs listed in `setup-env.sh`

Windows

I'm pretty sure these steps are right, good luck. You'll need to install Chrome/Chromium and ChromeDriver and make sure both are on your PATH. I highly recommend using a venv, so still:

python3 -m venv venv
.\venv\Scripts\activate
pip install -r requirements.txt

Docker Container

# Build with your user ID for proper file permissions
docker build --build-arg USER_UID=$(id -u) -t riggi/ug-archive .
# Or pull the prebuilt image from Docker Hub; it uses UID 1000 and I can't be bothered to make that configurable
docker pull riggi/ug-archive

Examples

Virtual Environment

# Full scrape + download mode, no limits
python main.py

# Scrape-only mode (metadata collection), limited to a maximum of 10 bands
python main.py --scrape-only --max-bands 10 --outdir ./metadata

# Download-only mode (using existing metadata)
python main.py --download-only --local-files-dir ./metadata --outdir ./tabs

# Download-only mode with parallel processing (4 threads)
python main.py --download-only --local-files-dir ./metadata --outdir ./tabs --threads 4

# Tab type filtering (chord charts only)
python main.py --tab-types CRD --max-bands 5 --outdir ./chords

# Letter range setting (single letter), only 20 tabs per band, scraping + downloading
python main.py --starting-letter m --end-letter m --max-tabs-per-band 20

# Include metadata headers in downloaded tab files
python main.py --include-metadata --outdir ./tabs_with_metadata

Docker Container

# Full scrape + download mode, no limits
docker run --rm -v $(pwd)/output:/app/output riggi/ug-archive python main.py --outdir /app/output

# Scrape-only mode (metadata collection), limited to a maximum of 10 bands
docker run --rm -v $(pwd)/metadata:/app/output riggi/ug-archive python main.py --scrape-only --max-bands 10 --outdir /app/output

# Download-only mode (using existing metadata)
docker run --rm -v $(pwd)/data:/app/output riggi/ug-archive python main.py --download-only --local-files-dir /app/output --outdir /app/output

# Download-only mode with parallel processing (4 threads)
docker run --rm -v $(pwd)/data:/app/output riggi/ug-archive python main.py --download-only --local-files-dir /app/output --outdir /app/output --threads 4

# Tab type filtering (chord charts only)
docker run --rm -v $(pwd)/chords:/app/output riggi/ug-archive python main.py --tab-types CRD --max-bands 5 --outdir /app/output

# Letter range setting (single letter), only 20 tabs per band, scraping + downloading
docker run --rm -v $(pwd)/output:/app/output riggi/ug-archive python main.py --starting-letter m --end-letter m --max-tabs-per-band 20 --outdir /app/output

# Include metadata headers in downloaded tab files
docker run --rm -v $(pwd)/tabs_with_metadata:/app/output riggi/ug-archive python main.py --include-metadata --max-bands 3 --outdir /app/output

Features

Progress Monitoring

In download-only mode, the scraper displays real-time progress showing the current tab being processed out of the total number of tabs:

Progress: 1523/5000 tabs processed
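
A counter like this has to stay consistent when `--threads` is greater than 1. The sketch below shows one way to keep a shared progress count correct across worker threads using a lock. The names `ProgressCounter`, `download_all`, and `fetch` are hypothetical and are not taken from `main.py`; the real script may track progress differently.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ProgressCounter:
    """Thread-safe counter that prints 'Progress: done/total tabs processed'."""

    def __init__(self, total):
        self.total = total
        self.done = 0
        self._lock = threading.Lock()

    def advance(self):
        # The lock keeps the increment and the printed line consistent
        # when several workers finish at the same time.
        with self._lock:
            self.done += 1
            line = f"Progress: {self.done}/{self.total} tabs processed"
            # "\r" rewrites the same terminal line instead of scrolling.
            print(f"\r{line}", end="", flush=True)
        return line


def download_all(tabs, fetch, threads=1):
    """Call fetch(tab) for every tab using `threads` workers; return the count."""
    progress = ProgressCounter(len(tabs))

    def worker(tab):
        fetch(tab)  # hypothetical per-tab download function
        progress.advance()

    with ThreadPoolExecutor(max_workers=threads) as pool:
        # list() drains the iterator so worker exceptions are raised here.
        list(pool.map(worker, tabs))
    print()  # finish the progress line
    return progress.done


if __name__ == "__main__":
    # Stand-in fetch that does nothing, to exercise the counter.
    download_all(list(range(50)), fetch=lambda tab: None, threads=4)
```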

Tab File Handling

--skip-existing-tabs (default): Skip downloading if file already exists
--overwrite-existing-tabs: Overwrite existing files, opposite of skip behavior
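
Since the two flags are opposites, they can be modeled as a single boolean. The sketch below is one way to wire that up with `argparse` and apply it before a download. It is an assumption about the design, not a copy of `main.py`: `build_parser` and `should_download` are hypothetical names.

```python
import argparse
from pathlib import Path


def build_parser():
    """Build a parser where both flags write to one boolean (assumed wiring)."""
    parser = argparse.ArgumentParser()
    # The mutually exclusive group rejects passing both flags together.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--skip-existing-tabs", dest="skip_existing_tabs",
                       action="store_true", default=True)
    group.add_argument("--overwrite-existing-tabs", dest="skip_existing_tabs",
                       action="store_false")
    return parser


def should_download(path, skip_existing=True):
    """Return True unless skipping is enabled and the file is already there."""
    return not (skip_existing and Path(path).exists())


if __name__ == "__main__":
    args = build_parser().parse_args(["--overwrite-existing-tabs"])
    print(args.skip_existing_tabs)  # False
    # This script file exists, so it is only re-downloaded when overwriting.
    print(should_download(__file__, args.skip_existing_tabs))  # True
```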

Output Structure

bands_summary.json - Overall scraping summary
band_{id}.json - Individual band metadata
{Band_Name}_{id}/ - Directory containing tab files
download_summary.json - Download-only mode summary
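
The naming scheme above is enough to locate a band's files without knowing what is inside them. The sketch below finds the per-band JSON files and the matching tab directory purely from those file name patterns. The helper names `list_band_files` and `find_tab_dir` are hypothetical, and nothing here assumes a particular JSON schema.

```python
from pathlib import Path


def list_band_files(outdir):
    """Return the sorted band_{id}.json paths directly under outdir."""
    # "band_*.json" does not match bands_summary.json, because that name
    # has an "s" where the pattern requires an underscore.
    return sorted(Path(outdir).glob("band_*.json"))


def find_tab_dir(outdir, band_id):
    """Return the {Band_Name}_{id}/ directory for band_id, or None."""
    suffix = f"_{band_id}"
    for path in Path(outdir).iterdir():
        if path.is_dir() and path.name.endswith(suffix):
            return path
    return None


if __name__ == "__main__":
    outdir = Path("./tabs")  # default value of --outdir
    if outdir.is_dir():
        for band_file in list_band_files(outdir):
            # band_123.json -> "123"
            band_id = band_file.stem.split("_", 1)[1]
            print(band_file.name, "->", find_tab_dir(outdir, band_id))
```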

