chelae

A fast, accurate, multi-threaded toolkit for trimming and filtering short-read FASTQ data, written in Rust.

The name chelae is the plural of chela — the pincer-like claws of crustaceans — a nod to what this tool does to FASTQ reads.

Visit us at Fulcrum Genomics to learn more about how we can power your bioinformatics with chelae and beyond.

This README is user-facing documentation. Contributors working on chelae itself should see CONTRIBUTING.md for build conventions, the pre-push checks, and the release process.

Overview

chelae exposes a single subcommand, chelae trim, which performs common short-read preprocessing tasks in one pass, in the following order:

Poly-G 3' trim (on by default)
Adapter trimming — PE-overlap evidence mode, plus --kit, --adapter-sequence, and --adapter-fasta for SE or deep trimming
Read-structure based hard-trim and UMI extraction (runs after adapter trim so tail-skip segments operate on the cleaned template)
Optional poly-X 3' trim (--trim-polyx)
Optional 5'→3' and/or 3'→5' sliding-window quality trim
Length filter post-trimming (--filter-length MIN[:MAX])
Optional N-base filter, mean-quality filter, and low-quality-fraction filter

Outputs are BGZF-compressed FASTQ plus a fastp-compatible JSON report suitable for MultiQC.

chelae combines paired-read overlap detection with adapter-sequence confirmation to identify adapter sequence in paired-end reads quickly and confidently. This repository includes a benchmark suite in benchmark-pipeline/. In that suite, chelae is the fastest tool tested in every experimental setup and produces the most accurate adapter trimming on every paired-end dataset. See the Performance section below.

Examples

PE-overlap adapter detection is on by default; for best accuracy it is also recommended to supply your adapter sequences via --kit or --adapter-sequence.

Trim paired-end reads with Illumina TruSeq adapters

chelae trim \
    -i sample.r1.fq.gz sample.r2.fq.gz \
    -o trimmed.r1.fq.gz trimmed.r2.fq.gz \
    --kit truseq

Add in a 3' quality trim with an 8bp sliding window at Q20

chelae trim \
    -i sample.r1.fq.gz sample.r2.fq.gz \
    -o trimmed.r1.fq.gz trimmed.r2.fq.gz \
    --kit truseq \
    --quality-trim-3p 8:20

Trim paired-end reads while extracting an 8bp UMI and dropping 4 fixed bases in R1

chelae trim \
    -i sample.r1.fq.gz sample.r2.fq.gz \
    -o trimmed.r1.fq.gz trimmed.r2.fq.gz \
    --kit truseq \
    --read-structures 8M4S+T +T
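The read structure `8M4S+T` in the example above reads as follows: 8 UMI bases (`M`), then 4 skipped bases (`S`), then all remaining bases as template (`+T`). The `+T` given for R2 keeps the whole of R2 as template. The sketch below is only a rough illustration of how such a structure can be parsed and applied to a single read. It is not chelae's implementation, and the names `Kind`, `Segment`, `parse`, and `apply` are hypothetical. It covers only the `T`/`M`/`S` segment types listed in the options table, and it does not show how chelae writes the UMI into the read name.

```rust
/// Segment types understood by this sketch (hypothetical; mirrors T/M/S in the options table).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Template, // T: kept as the read's bases
    Umi,      // M: extracted (chelae moves it to the read name)
    Skip,     // S: discarded
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Segment {
    pub len: Option<usize>, // None encodes `+`: "whatever bases remain"
    pub kind: Kind,
}

/// Parses a structure such as "8M4S+T"; rejects unsupported types (e.g. B, C) and multiple `+`.
pub fn parse(rs: &str) -> Result<Vec<Segment>, String> {
    let mut segs = Vec::new();
    let mut chars = rs.chars().peekable();
    while let Some(&c) = chars.peek() {
        let len = if c == '+' {
            chars.next();
            None
        } else {
            let (mut n, mut saw_digit) = (0usize, false);
            while let Some(&d) = chars.peek() {
                match d.to_digit(10) {
                    Some(v) => {
                        n = n * 10 + v as usize;
                        saw_digit = true;
                        chars.next();
                    }
                    None => break,
                }
            }
            if !saw_digit {
                return Err(format!("expected a length or '+' in {rs}"));
            }
            Some(n)
        };
        let kind = match chars.next() {
            Some('T') => Kind::Template,
            Some('M') => Kind::Umi,
            Some('S') => Kind::Skip,
            other => return Err(format!("unsupported segment type {other:?} in {rs}")),
        };
        segs.push(Segment { len, kind });
    }
    if segs.iter().filter(|s| s.len.is_none()).count() > 1 {
        return Err(format!("at most one '+' segment is allowed in {rs}"));
    }
    Ok(segs)
}

/// Splits `seq` into (UMI bases, template bases); None if the read is shorter than the fixed segments.
/// Without a `+` segment, any bases beyond the fixed segments are ignored in this sketch.
pub fn apply(segs: &[Segment], seq: &[u8]) -> Option<(Vec<u8>, Vec<u8>)> {
    let fixed: usize = segs.iter().filter_map(|s| s.len).sum();
    if seq.len() < fixed {
        return None;
    }
    let variable = seq.len() - fixed; // length given to the `+` segment, if any
    let (mut umi, mut template) = (Vec::new(), Vec::new());
    let mut pos = 0;
    for s in segs {
        let n = s.len.unwrap_or(variable);
        let bases = &seq[pos..pos + n];
        match s.kind {
            Kind::Template => template.extend_from_slice(bases),
            Kind::Umi => umi.extend_from_slice(bases),
            Kind::Skip => {}
        }
        pos += n;
    }
    Some((umi, template))
}
```

Applying `8M4S+T` to `AAAAAAAACCCCGGGTTT` yields the UMI `AAAAAAAA` and the template `GGGTTT`. Because chelae applies read structures after adapter trimming, the `+` segment here would cover the adapter-free template.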

Options

The tables below summarize every option accepted by chelae trim. For longer explanations (rationale, units, edge cases) run chelae trim --help.

Inputs, outputs, and runtime

Option	Description	Default
`-i, --inputs <PATHS>...`	One (SE) or two (PE) FASTQ files; plain, gzip, or bgzf (auto-detected)	—
`-o, --outputs <PATHS>...`	Output FASTQ path(s); count must match `--inputs`; always BGZF-compressed	—
`-t, --threads <N>`	Number of threads to use	`4`
`-c, --compression-level <1-12>`	BGZF compression level for output files	`5`
`-m, --metrics <PATH>`	Optional path for the trimming metrics TSV; stdout summary is always emitted	—
`-j, --json <PATH>`	Optional fastp-shape JSON report; consumed by MultiQC's `fastp` module unchanged	—

Read-structure (hard-trim + UMI extraction)

Option	Description	Default
`-r, --read-structures <RS>...`	Optional read-structures per input; supports `T` (template), `M` (UMI → read name), `S` (skip); applied after adapter trim	—
`--discard-unsupported-segments`	Treat `B` (sample barcode) and `C` (cellular barcode) segments as `S` (skip) instead of erroring	off

Adapter trimming

Option	Description	Default
`-k, --kit <NAME>...`	Built-in kit preset; repeatable. Known: `truseq`, `nextera`, `small-rna`, `aviti`, `mgi` (alias `dnbseq`), `all`	—
`-a, --adapter-sequence <SEQ>...`	3' adapter sequence(s); 1 for SE, 1 or 2 for PE (R1, R2); ACGT or IUPAC	—
`-f, --adapter-fasta <PATH>`	FASTA of adapter sequences; best match is trimmed	—
`--adapter-min-length <N>`	Minimum match length when searching the 3' end for an adapter sequence (SE mode; in PE mode, used only for inserts shorter than `--overlap-min-length`)	`6`
`--adapter-mismatch-rate <0..1>`	Max fraction of mismatches when matching adapter against the 3' end (default ≈ 1 mismatch / 8 bases)	`0.125`
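To make `--adapter-min-length` and `--adapter-mismatch-rate` concrete, here is a minimal sketch of a 3' adapter search under stated assumptions. It tries each start position from the 5' end and compares the read's tail against the adapter's prefix. It accepts the first position where at least `min_len` bases are compared and the mismatch count does not exceed `floor(rate × compared length)`. The function name `find_adapter` is hypothetical. Bases are compared byte for byte (no IUPAC handling, so an `N` counts as a mismatch), and chelae's actual matching and scoring may differ.

```rust
/// Returns the 0-based read position where a 3' adapter match begins, if any.
/// At each candidate position, the read tail is compared with the adapter prefix of the same length
/// (or the whole adapter, if the tail is longer). Requires at least `min_len` compared bases and
/// at most floor(max_mismatch_rate * compared) mismatches. Returns the leftmost acceptable match.
pub fn find_adapter(read: &[u8], adapter: &[u8], min_len: usize, max_mismatch_rate: f64) -> Option<usize> {
    for pos in 0..read.len() {
        let overlap = (read.len() - pos).min(adapter.len());
        // `overlap` never grows as `pos` increases, so once it is too short we can stop.
        if overlap < min_len {
            break;
        }
        let allowed = (overlap as f64 * max_mismatch_rate).floor() as usize;
        let mismatches = read[pos..pos + overlap]
            .iter()
            .zip(&adapter[..overlap])
            .filter(|(a, b)| a != b)
            .count();
        if mismatches <= allowed {
            return Some(pos);
        }
    }
    None
}
```

With the defaults (`6` and `0.125`), a 6- or 7-base partial adapter at the very end of a read must match exactly, while 8 to 15 compared bases tolerate one mismatch. Trimming then keeps `read[..pos]`.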

Paired-end overlap detection

Option	Description	Default
`--no-overlap-detection`	Disable PE-overlap trim-point detection (ignored for SE); rely on sequence matching alone	off (detection is on for PE)
`--overlap-min-length <N>`	Minimum overlap (bp) required to declare R1/R2 overlap	`30`
`--overlap-max-mismatch-rate <0..1>`	Max fraction of mismatches in the overlap probe window	`0.10`
`--overlap-diagnostic-length <N>`	When evaluating PE overlap, examine at most this many overlapping bases; multiples of 16 are ideal	`64`
`--expected-insert-size <BP>`	Hint for typical insert size; seeds the overlap candidate-walk order so the right overlap is found sooner	—
`--insert-size-stats`	Emit a fastp-shape per-pair insert-size histogram under `insert_size` in the JSON (extends overlap probing to pairs whose insert is longer than the read length)	off
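The core idea behind PE-overlap detection can be sketched for a pair whose insert is no longer than the reads. If the insert is `I` bases long, the first `I` bases of R1 equal the reverse complement of the first `I` bases of R2, and every base after position `I` in either read is adapter. The hypothetical `detect_short_insert` below tests candidate insert sizes from the longest down. For each candidate it compares at most `diag_len` bases (analogous to `--overlap-diagnostic-length`) against a mismatch budget (analogous to `--overlap-max-mismatch-rate`). It omits chelae's adapter-sequence confirmation, the `--expected-insert-size` candidate ordering, and inserts longer than the reads, so treat it as an illustration rather than chelae's algorithm.

```rust
/// Reverse complement of an uppercase ACGT sequence; any other byte becomes N.
fn revcomp(seq: &[u8]) -> Vec<u8> {
    seq.iter()
        .rev()
        .map(|&b| match b {
            b'A' => b'T',
            b'C' => b'G',
            b'G' => b'C',
            b'T' => b'A',
            _ => b'N',
        })
        .collect()
}

/// Estimates the insert size of a pair whose insert fits within both reads.
/// Tries candidate sizes from the shorter read length down to `min_overlap` and returns the first
/// (largest) one whose diagnostic window mismatch fraction is within `max_mismatch_rate`.
pub fn detect_short_insert(
    r1: &[u8],
    r2: &[u8],
    min_overlap: usize,
    max_mismatch_rate: f64,
    diag_len: usize,
) -> Option<usize> {
    let max_len = r1.len().min(r2.len());
    for insert in (min_overlap..=max_len).rev() {
        // If the insert really is `insert` bases long, this equals r1[..insert].
        let rc = revcomp(&r2[..insert]);
        let n = insert.min(diag_len); // only examine the first `n` overlapping bases
        let mismatches = r1[..n].iter().zip(&rc[..n]).filter(|(a, b)| a != b).count();
        if (mismatches as f64) <= max_mismatch_rate * n as f64 {
            return Some(insert);
        }
    }
    None
}
```

When the returned size is shorter than the reads, both mates would be cut to that many bases, which removes the adapter read-through on each mate without needing to match the adapter sequence itself.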

Poly-G / poly-X trimming

Option	Description	Default
`--trim-polyg <N>`	3' poly-G trim minimum run length; pass `0` to disable	`10`
`--trim-polyx [<N>]`	Enable 3' poly-X trim (A/C/T homopolymer tails, e.g. poly-A from RNA-seq) with the given minimum run length	off
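Here is a minimal sketch of a 3' homopolymer trim in the spirit of `--trim-polyg` and `--trim-polyx`. It assumes an exact trailing run with no mismatch tolerance. Real trimmers such as fastp do tolerate some mismatches, and chelae may as well. Both function names are hypothetical, and each returns the number of bases to keep.

```rust
/// Bases to keep after removing a trailing run of `base` that is at least `min_run` long.
/// `min_run == 0` disables trimming, matching `--trim-polyg 0`.
pub fn trim_poly_tail(seq: &[u8], base: u8, min_run: usize) -> usize {
    if min_run == 0 {
        return seq.len();
    }
    let run = seq.iter().rev().take_while(|&&b| b == base).count();
    if run >= min_run {
        seq.len() - run
    } else {
        seq.len()
    }
}

/// Poly-X variant for A/C/T tails: only one base can form the trailing run, so take the shortest result.
pub fn trim_poly_x(seq: &[u8], min_run: usize) -> usize {
    [b'A', b'C', b'T']
        .iter()
        .map(|&b| trim_poly_tail(seq, b, min_run))
        .min()
        .unwrap_or(seq.len())
}
```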

Quality trimming

Both quality-trim modes shorten the read from the 3' end; the -3p / -5p suffix indicates the scan direction, not the trim location. -3p is conservative: scanning in from the 3' end, it keeps everything up to and including the first window whose mean quality passes. -5p is aggressive: scanning from the 5' end, it cuts at the first window whose mean quality fails.

Option	Description	Default
`--quality-trim-3p [<W:Q>]`	Scan 3'→5'; trim trailing bases until a window of size `W` has mean quality ≥ `Q` (fastp `--cut_tail`)	off (`8:20`)
`--quality-trim-5p [<W:Q>]`	Scan 5'→3'; truncate at the first window of size `W` with mean quality < `Q` (fastp `--cut_right`)	off (`8:20`)
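The difference between the two modes is easiest to see side by side. The sketch below follows the fastp `--cut_tail` / `--cut_right` semantics described in the table. It assumes Phred+33 qualities and compares window sums as integers to avoid floating-point rounding. `trim_3p` and `trim_5p` are hypothetical names, and both return the number of bases to keep.

```rust
/// Phred+33 quality byte to Phred score.
fn phred(q: u8) -> u32 {
    u32::from(q.saturating_sub(33))
}

/// True if the window's mean quality is at least `min_q` (compared as sum >= min_q * width).
fn window_passes(quals: &[u8], min_q: u32) -> bool {
    let sum: u32 = quals.iter().map(|&q| phred(q)).sum();
    sum >= min_q * quals.len() as u32
}

/// 3'->5' scan (cut_tail style): keep through the end of the first passing window found from the 3' end.
pub fn trim_3p(quals: &[u8], window: usize, min_q: u32) -> usize {
    if quals.is_empty() {
        return 0;
    }
    let w = window.clamp(1, quals.len()); // reads shorter than the window use one whole-read window
    for start in (0..=quals.len() - w).rev() {
        if window_passes(&quals[start..start + w], min_q) {
            return start + w;
        }
    }
    0 // no passing window: the whole read is trimmed
}

/// 5'->3' scan (cut_right style): truncate at the start of the first failing window.
pub fn trim_5p(quals: &[u8], window: usize, min_q: u32) -> usize {
    if quals.is_empty() {
        return 0;
    }
    let w = window.clamp(1, quals.len());
    for start in 0..=quals.len() - w {
        if !window_passes(&quals[start..start + w], min_q) {
            return start;
        }
    }
    quals.len()
}
```

Consider ten Q30 bases followed by six Q2 bases, trimmed with `4:20`. `trim_3p` keeps 11 bases, because the first passing window found from the 3' end still contains one Q2 base. `trim_5p` keeps only 8, because the window starting at base 8 already averages below Q20.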

Filters (applied after trimming; pair dropped if either mate fails)

Option	Description	Default
`-l, --filter-length <MIN[:MAX]>`	Drop reads/pairs with post-trim length below `MIN` (or above `MAX`)	`15`
`--filter-max-ns <N>`	Drop reads/pairs whose per-mate count of ambiguous (N) bases exceeds `N`	off
`--filter-mean-qual <Q>`	Drop reads/pairs whose post-trim mean Phred quality is below `Q` (runs last, after every trim stage)	off
`--filter-low-qual <Q:F>`	Drop reads/pairs where the fraction of bases below quality `Q` exceeds `F` (e.g. `15:0.4`)	off
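Taken together, the filters act as a single per-mate predicate, and a pair is kept only if both mates pass. The sketch below illustrates this under two assumptions: qualities are Phred+33, and `--filter-low-qual Q:F` counts bases with quality strictly below `Q`. The `Filters` struct and its methods are hypothetical and are not chelae's API.

```rust
/// Phred+33 quality byte to Phred score.
fn phred(q: u8) -> u32 {
    u32::from(q.saturating_sub(33))
}

/// Hypothetical post-trim filter settings; `None` means the filter is off.
pub struct Filters {
    pub min_len: usize,
    pub max_len: Option<usize>,
    pub max_ns: Option<usize>,
    pub min_mean_qual: Option<f64>,
    pub low_qual: Option<(u32, f64)>, // (Q, F): fail if more than fraction F of bases are below Q
}

impl Filters {
    /// Checks one trimmed mate; `seq` and `quals` are assumed to have equal length.
    pub fn passes(&self, seq: &[u8], quals: &[u8]) -> bool {
        let len = seq.len();
        if len < self.min_len || self.max_len.map_or(false, |max| len > max) {
            return false;
        }
        if let Some(max_ns) = self.max_ns {
            let ns = seq.iter().filter(|&&b| b == b'N' || b == b'n').count();
            if ns > max_ns {
                return false;
            }
        }
        if len == 0 {
            // Quality statistics are undefined for empty reads; reject if any quality filter is on.
            return self.min_mean_qual.is_none() && self.low_qual.is_none();
        }
        if let Some(min_mean) = self.min_mean_qual {
            let total: u32 = quals.iter().map(|&q| phred(q)).sum();
            if (total as f64) / (len as f64) < min_mean {
                return false;
            }
        }
        if let Some((q, max_frac)) = self.low_qual {
            let low = quals.iter().filter(|&&b| phred(b) < q).count();
            if low as f64 / len as f64 > max_frac {
                return false;
            }
        }
        true
    }

    /// A pair is dropped if either mate fails.
    pub fn pair_passes(&self, s1: &[u8], q1: &[u8], s2: &[u8], q2: &[u8]) -> bool {
        self.passes(s1, q1) && self.passes(s2, q2)
    }
}
```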

Performance

A Snakemake pipeline in benchmark-pipeline/ runs chelae against eight other FASTQ trimmers on simulated short-read libraries spanning cfDNA, WGS at several insert sizes, exome, miRNA, and a high-error condition — scoring both runtime and adapter-trim accuracy against the simulator's ground-truth boundaries.

chelae is the fastest tool tested on every dataset and has the most accurate adapter trimming on the majority of them. The two tables below show, per dataset, the top three tools by wall-clock time (wgs config: adapter + quality + length filter) and the top three tools by adapter-trim RMSE (adapter_only config). Every dataset contains simulated reads amounting to roughly 2× coverage of a human genome; read counts vary with read length, from roughly 21 M to 84 M pairs/reads per dataset. All numbers are medians of 3 replicates at 8 threads on an EC2 c8id.2xlarge (Intel Xeon 6975P-C, Granite Rapids).

Tool name abbreviations used in the tables: ar = adapterremoval, tg = trim-galore, tg-rs = trim-galore-rs, tmatic = trimmomatic.

Runtime — top 3 fastest tools per dataset, wall seconds @ 8 threads

ID	Layout	Insert (bp)	Error rate	#1	#2	#3
1	2×150	150 ± 30	0.1–1%	chelae 37.4	cutadapt 78.2	tg-rs 78.2
2	2×150	250 ± 40	0.1–1%	chelae 41.9	tg-rs 48.4	cutadapt 68.0
3	2×150	350 ± 60	0.1–1%	chelae 42.2	tg-rs 48.6	cutadapt 65.8
4	2×150	450 ± 80	0.1–1%	chelae 42.0	tg-rs 49.2	cutadapt 65.7
5	2×250	450 ± 80	0.1–1%	chelae 37.9	tg-rs 44.3	cutadapt 57.5
6	2×150	250 ± 60	1–5%	chelae 20.3	tg-rs 32.9	fastp-nfcore 39.4
7	2×150	170 ± 30	0.1–1%	chelae 39.9	tg-rs 60.3	cutadapt 71.2
8	2×76	140 ± 25	0.1–1%	chelae 50.5	tg-rs 61.5	cutadapt 84.8
9	1×150	300 ± 80	0.1–1%	chelae 42.8	cutadapt 68.8	tg-rs 72.3
10	1×150	120 ± 30	0.1–1%	chelae 39.5	bbduk 67.9	fastp 77.0
11	1×76	30 ± 2	0.1–1%	chelae 38.1	bbduk 43.3	fastp 61.2

chelae is #1 on every dataset, and the runner-up takes 14–109% longer. The lead is widest on the shortest-insert PE library (dataset 1). It is also large on the high-error-rate dataset (6) and on both 1×150 SE datasets (9 and 10).
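The runner-up margin can be recomputed directly from the runtime table: divide the runner-up's time by chelae's time and subtract one. The sketch below hard-codes the #1 and #2 columns from that table (the constant and function names are illustrative) and returns the smallest and largest margin in percent.

```rust
/// (dataset ID, chelae wall seconds, runner-up wall seconds), copied from the runtime table.
const RUNTIMES: [(u32, f64, f64); 11] = [
    (1, 37.4, 78.2),
    (2, 41.9, 48.4),
    (3, 42.2, 48.6),
    (4, 42.0, 49.2),
    (5, 37.9, 44.3),
    (6, 20.3, 32.9),
    (7, 39.9, 60.3),
    (8, 50.5, 61.5),
    (9, 42.8, 68.8),
    (10, 39.5, 67.9),
    (11, 38.1, 43.3),
];

/// Extra wall time the runner-up needs, as a percentage of chelae's time.
pub fn runner_up_margin_pct(chelae_s: f64, runner_up_s: f64) -> f64 {
    (runner_up_s / chelae_s - 1.0) * 100.0
}

/// (smallest, largest) runner-up margin across all datasets, in percent.
pub fn margin_range() -> (f64, f64) {
    RUNTIMES
        .iter()
        .map(|&(_, chelae, runner_up)| runner_up_margin_pct(chelae, runner_up))
        .fold((f64::INFINITY, f64::NEG_INFINITY), |(lo, hi), m| (lo.min(m), hi.max(m)))
}
```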

Accuracy — top 3 by adapter-trim RMSE per dataset (lower is better)

We score adapter-trim accuracy as the root-mean-square error (RMSE) of the trim point (in bases) relative to the simulator's ground truth, aggregated across reads. Because each error is squared before averaging, RMSE penalizes large under- and over-trim errors more heavily than small ones. This reflects the premise that each additional base lost (or adapter base retained) in a read is more consequential than the last.
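For concreteness, here is a sketch of the metric as described. It assumes one called and one true trim point per read, both in bases. `trim_rmse` and `trim_mae` are hypothetical helper names, not functions from benchmark-pipeline/. MAE is included only for contrast. Per-read errors of 0, 0, 0, 4 bases and errors of 1, 1, 1, 1 bases both give an MAE of 1, but RMSE scores them 2 and 1 respectively. That extra penalty for large errors is what motivates using RMSE.

```rust
/// Root-mean-square trim-point error in bases; None for empty or mismatched inputs.
pub fn trim_rmse(called: &[usize], truth: &[usize]) -> Option<f64> {
    if called.is_empty() || called.len() != truth.len() {
        return None;
    }
    let sum_sq: f64 = called
        .iter()
        .zip(truth)
        .map(|(&c, &t)| {
            let d = c as f64 - t as f64;
            d * d // squaring weights one large error more than several small ones
        })
        .sum();
    Some((sum_sq / called.len() as f64).sqrt())
}

/// Mean absolute trim-point error in bases, for comparison.
pub fn trim_mae(called: &[usize], truth: &[usize]) -> Option<f64> {
    if called.is_empty() || called.len() != truth.len() {
        return None;
    }
    let sum_abs: f64 = called.iter().zip(truth).map(|(&c, &t)| (c as f64 - t as f64).abs()).sum();
    Some(sum_abs / called.len() as f64)
}
```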

ID	Layout	Insert (bp)	Error rate	#1	#2	#3
1	2×150	150 ± 30	0.1–1%	chelae 0.045	ar 0.079	fastp 0.169
2	2×150	250 ± 40	0.1–1%	chelae 0.013	cutadapt 0.189	tg 0.189
3	2×150	350 ± 60	0.1–1%	chelae 0.009	cutadapt 0.098	tg 0.098
4	2×150	450 ± 80	0.1–1%	chelae 0.010	cutadapt 0.090	tg 0.090
5	2×250	450 ± 80	0.1–1%	chelae 0.011	cutadapt 0.166	tg 0.166
6	2×150	250 ± 60	1–5%	chelae 0.393	ar 0.482	cutadapt 1.310
7	2×150	170 ± 30	0.1–1%	chelae 0.040	ar 0.139	fastp 0.207
8	2×76	140 ± 25	0.1–1%	chelae 0.016	cutadapt 0.284	ar 0.312
9	1×150	300 ± 80	0.1–1%	chelae 0.267	cutadapt 0.272	fastp 0.284
10	1×150	120 ± 30	0.1–1%	fastp 0.652	chelae 0.803	cutadapt 0.964
11	1×76	30 ± 2	0.1–1%	tmatic 0.024	chelae 0.062	fastp 0.062

chelae wins on every PE dataset and on one of the three SE datasets (9), and comes second on the other two. On SE datasets, most tools perform similarly when their key parameters (minimum match length and maximum allowed error rate) can be configured identically.

Tool versions

All tools were at the latest version available on bioconda as of the benchmarking date (2026-05-06), with one deliberate exception: fastp-nfcore runs a second fastp environment pinned to the version nf-core/modules currently ships, so MultiQC-via-nf-core users can see what they would actually get downstream.

Tool	Version tested
chelae	0.1.0
fastp	1.3.2
fastp-nfcore	1.1.0 (nf-core/modules pin)
cutadapt	5.2
trim-galore	0.6.11
trim-galore-rs	2.1.0 (Oxidized Edition)
trimmomatic	0.40
bbduk (bbmap)	39.81
adapterremoval	2.3.4

More detail, raw data, and reproduction

For per-dataset tables with every tool (not just the top 3) and additional metrics — RSS, user CPU, parallel efficiency, MAE, false-positive and false-negative counts, etc. — see benchmark-pipeline/RESULTS.md.

The full per-row source data (one row per (sample, trim_config, tool, threads, replicate)) lives in benchmark-pipeline/.benchmark-outputs/perf-20260508/results/{bench_summary,accuracy_summary}.tsv. See benchmark-pipeline/README.md for the reproduction recipe.

Installing

Install from bioconda

Using pixi, after adding the bioconda channel:

pixi add chelae

Or using your favorite conda client (conda, mamba, micromamba, …):

conda install -c bioconda chelae

Installing with `cargo`

To install with cargo, you must first install Rust, which on macOS and Linux can be done with:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Then, to install chelae run:

cargo install chelae

Building From Source

First, clone the git repo:

git clone https://github.com/fulcrumgenomics/chelae.git

If you do not already have the Rust development tools installed, install them via rustup:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Then build in release mode:

cd chelae
cargo build --release
./target/release/chelae --help

Build Targeting and Portability

x86_64 release binaries ship as a single `cargo-multivers` launcher that embeds three CPU-specific builds and picks the best match at startup:

x86-64 — SSE2 baseline, runs on any 64-bit x86 CPU (2003+)
x86-64-v2 — SSE4.2 + POPCNT (2008+); captures nearly all of the ~6% codegen speedup previously measured for v3 over the baseline
x86-64-v4 — AVX-512F/BW/CD/DQ/VL for Ice Lake / Sapphire Rapids / Granite Rapids / Zen 4+

The launcher is ~3.7 MB total and adds ~0.2 s of startup for decompression + memfd_create + exec. v3 is intentionally skipped — on chelae's workload v2 and v3 are within measurement noise, and v4 picks up what little additional win AVX-512 gives (~1% on our benchmarks).
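The launcher's startup selection amounts to runtime CPU-feature detection. The following hand-written sketch shows that idea using the standard library's `is_x86_feature_detected!` macro. It is not `cargo-multivers`' actual code, and `best_level` is a hypothetical name. The feature lists approximate the x86-64 psABI level definitions with a subset of their required features. Because no v3 build ships, a CPU with AVX2 but without AVX-512 falls back to v2.

```rust
/// Picks which embedded build to run (hypothetical helper; feature sets are approximations).
#[cfg(target_arch = "x86_64")]
pub fn best_level() -> &'static str {
    let v2 = is_x86_feature_detected!("sse3")
        && is_x86_feature_detected!("ssse3")
        && is_x86_feature_detected!("sse4.1")
        && is_x86_feature_detected!("sse4.2")
        && is_x86_feature_detected!("popcnt");
    // v3 features are still required for v4, even though no v3 build is shipped.
    let v3 = v2
        && is_x86_feature_detected!("avx2")
        && is_x86_feature_detected!("fma")
        && is_x86_feature_detected!("bmi1")
        && is_x86_feature_detected!("bmi2");
    let v4 = v3
        && is_x86_feature_detected!("avx512f")
        && is_x86_feature_detected!("avx512bw")
        && is_x86_feature_detected!("avx512cd")
        && is_x86_feature_detected!("avx512dq")
        && is_x86_feature_detected!("avx512vl");
    if v4 {
        "x86-64-v4"
    } else if v2 {
        "x86-64-v2"
    } else {
        "x86-64"
    }
}

/// Non-x86 targets (e.g. aarch64) ship a single generic build, so there is nothing to choose.
#[cfg(not(target_arch = "x86_64"))]
pub fn best_level() -> &'static str {
    "generic"
}
```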

aarch64 release binaries (Apple Silicon, AWS Graviton, GCP Axion, Azure Cobalt) are a single build with a generic ARMv8-A / NEON baseline. Benchmarks showed that Neoverse-specific tuning yields only ~1–2% over generic and that the cross-tuning penalty is near zero, so multiversioning isn't worth the complexity on aarch64.

For local development, cargo build --release uses target-cpu=native (see .cargo/config.toml) for fastest local runs.
