A high-performance HGVS variant nomenclature parser and normalizer written in Rust.
WARNING: ALPHA SOFTWARE - USE AT YOUR OWN RISK
This software is currently in ALPHA. While we have extensively tested it across a wide variety of HGVS patterns, no guarantees are made regarding correctness or stability.
- Full HGVS Parsing: All coordinate systems (g/c/n/r/p/m/o) and edit types
- Variant Normalization: 3'/5' shifting per HGVS specification
- High Performance: ~2.5M variants/sec parsing, zero-copy with nom
- Type-Safe: Leverages Rust's type system for correctness
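The 3'/5' shifting mentioned in the feature list can be illustrated on a plain string. The following is a minimal sketch of the general idea, not ferro-hgvs's implementation: the function names and the 0-based, half-open coordinates are assumptions made for this example. A deletion can slide one base toward the 3' end whenever the first deleted base equals the base immediately after the deleted interval, because both placements produce the same resulting sequence.

```python
def shift_deletion_3prime(seq, start, end):
    """Slide the deletion of seq[start:end] as far 3' (right) as possible.

    Coordinates are 0-based and half-open (an assumption of this sketch).
    """
    while end < len(seq) and seq[start] == seq[end]:
        start += 1
        end += 1
    return start, end


def shift_deletion_5prime(seq, start, end):
    """Slide the deletion of seq[start:end] as far 5' (left) as possible."""
    while start > 0 and seq[start - 1] == seq[end - 1]:
        start -= 1
        end -= 1
    return start, end


seq = "GATTTC"
# Deleting the first T of the TTT run (index 2) is equivalent to
# deleting the last T of the run (index 4).
print(shift_deletion_3prime(seq, 2, 3))  # (4, 5)
print(shift_deletion_5prime(seq, 4, 5))  # (2, 3)
```

In 1-based HGVS positions on this toy sequence, the deletion written at position 3 would be reported at position 5 after 3' shifting.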
pip install ferro-hgvs

Pre-built wheels are available for Linux (x86_64, aarch64), macOS (x86_64, Apple Silicon), and Windows (x86_64) on Python 3.10+.
Add to your Cargo.toml:
[dependencies]
ferro-hgvs = "0.1"

Or install the CLI:
cargo install ferro-hgvs

# Parse a variant
ferro parse "NM_000088.3:c.459A>G"
# Parse from file
ferro parse -i variants.txt -f json
# Prepare reference data (downloads RefSeq, genome, cdot)
ferro prepare --output-dir ferro-reference
# Verify reference data is ready
ferro check --reference ferro-reference
# Normalize with reference
ferro normalize "NM_000088.3:c.459del" --reference ferro-reference/

use ferro_hgvs::{parse_hgvs, HgvsVariant};

fn main() -> Result<(), ferro_hgvs::FerroError> {
    let variant = parse_hgvs("NM_000088.3:c.459A>G")?;
    match &variant {
        HgvsVariant::Cds(v) => println!("CDS variant: {}", v),
        HgvsVariant::Genome(v) => println!("Genomic variant: {}", v),
        _ => println!("Other: {}", variant),
    }
    Ok(())
}

import ferro_hgvs
# Parse a variant
variant = ferro_hgvs.parse("NM_000088.3:c.459A>G")
print(variant.variant_type) # "coding"
print(variant.reference) # "NM_000088.3"
print(str(variant)) # "NM_000088.3:c.459A>G"
# Normalize with reference data
normalizer = ferro_hgvs.Normalizer(reference_json="ferro-reference/cdot.json")
normalized = normalizer.normalize("NM_000088.3:c.459del")

| Type | Prefix | Example |
|---|---|---|
| Genomic | g. | NC_000001.11:g.12345A>G |
| Coding DNA | c. | NM_000088.3:c.459A>G |
| Non-coding | n. | NR_000001.1:n.100A>G |
| RNA | r. | NM_000088.3:r.459a>g |
| Protein | p. | NP_000079.2:p.Val600Glu |
| Mitochondrial | m. | NC_012920.1:m.3243A>G |
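Each example in the table above has the same shape: a reference accession, a colon, a one-letter coordinate prefix followed by a dot, and then the description of the change. The sketch below splits a string along those lines with a regular expression. It is an illustration only and far simpler than a real HGVS parser; `split_hgvs`, `HGVS_RE`, and `COORDINATE_TYPES` are hypothetical names, and only the six prefixes listed in the table are recognized.

```python
import re

# Prefix letters and type names taken from the table above.
COORDINATE_TYPES = {
    "g": "Genomic",
    "c": "Coding DNA",
    "n": "Non-coding",
    "r": "RNA",
    "p": "Protein",
    "m": "Mitochondrial",
}

# reference ":" prefix "." description
HGVS_RE = re.compile(r"^(?P<reference>[^:]+):(?P<prefix>[gcnrpm])\.(?P<description>.+)$")


def split_hgvs(text):
    """Return (reference, coordinate type, description) or raise ValueError."""
    match = HGVS_RE.match(text)
    if match is None:
        raise ValueError(f"not in reference:prefix.description form: {text!r}")
    return (
        match["reference"],
        COORDINATE_TYPES[match["prefix"]],
        match["description"],
    )


print(split_hgvs("NC_012920.1:m.3243A>G"))
# ('NC_012920.1', 'Mitochondrial', '3243A>G')
```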
- Substitution: A>G, Val600Glu
- Deletion: del, 100_200del
- Insertion: 100_101insATG
- Deletion-Insertion: 100_102delinsATG
- Duplication: 100_102dup
- Inversion: 100_200inv
- Repeat: 100CAG[20]
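The edit types listed above can be told apart by their suffixes. As a rough illustration (not how ferro-hgvs parses, which the feature list attributes to nom), the sketch below classifies the example strings with ordered regular expressions. `classify_edit` and `EDIT_PATTERNS` are hypothetical names, and the patterns cover only the simple forms shown in the list. Order matters: `delins` has to be tested before `del` and `ins`.

```python
import re

# Checked in order; the first matching pattern wins.
EDIT_PATTERNS = [
    ("Deletion-Insertion", re.compile(r"delins[A-Za-z]+$")),
    ("Deletion", re.compile(r"del$")),
    ("Insertion", re.compile(r"ins[A-Za-z]+$")),
    ("Duplication", re.compile(r"dup$")),
    ("Inversion", re.compile(r"inv$")),
    ("Repeat", re.compile(r"\[\d+\]$")),
    # Nucleotide substitution such as A>G or 459A>G.
    ("Substitution", re.compile(r"[ACGTUacgtu]>[ACGTUacgtu]$")),
    # Protein substitution in three-letter code such as Val600Glu.
    ("Substitution", re.compile(r"^[A-Z][a-z]{2}\d+[A-Z][a-z]{2}$")),
]


def classify_edit(description):
    """Return the edit type name for a simple edit description."""
    for name, pattern in EDIT_PATTERNS:
        if pattern.search(description):
            return name
    return "Unknown"


for example in ["A>G", "Val600Glu", "100_200del", "100_101insATG",
                "100_102delinsATG", "100_102dup", "100_200inv", "100CAG[20]"]:
    print(example, "->", classify_edit(example))
```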
The ferro CLI provides commands beyond parsing and normalization:
| Command | Description |
|---|---|
| prepare | Download and prepare reference data for normalization |
| check | Verify reference data setup |
| parse | Parse and validate HGVS variants |
| normalize | Normalize HGVS variants (3'/5' shifting) |
| explain | Explain error/warning codes (e.g., ferro explain W1001) |
| annotate-vcf | Annotate VCF files with HGVS notation |
| vcf-to-hgvs | Convert VCF records to HGVS |
| hgvs-to-vcf | Convert HGVS to VCF format |
| liftover | Liftover coordinates between genome builds |
| describe | Generate HGVS from reference/observed sequences |
| effect | Predict protein effect from variant |
| backtranslate | Reverse translate protein to DNA variants |
| convert-gff | Convert GFF3/GTF to transcripts.json |
| generate | Generate HGVS descriptions from components |
| extract-hgvs | Extract HGVS from VEP-annotated VCFs |
ferro-hgvs provides configurable error handling with three modes:
| Mode | Behavior |
|---|---|
| strict | Reject non-conformant input (default) |
| lenient | Auto-correct with warnings |
| silent | Auto-correct silently |
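To make the difference between the three modes concrete, here is a small sketch of how one correctable issue (lowercase three-letter amino acid codes) might be handled under each mode. This is a toy model written for this README, not ferro-hgvs code or its Python API: `apply_error_mode` is a hypothetical function, and the only correction it knows is capitalizing three-letter runs.

```python
import re
import warnings


def apply_error_mode(description, mode="strict"):
    """Apply one toy correction under the strict, lenient, or silent mode."""
    if mode not in ("strict", "lenient", "silent"):
        raise ValueError(f"unknown mode: {mode!r}")

    # Toy correction: capitalize each run of three letters (val -> Val).
    corrected = re.sub(
        r"[A-Za-z]{3}", lambda m: m.group(0).capitalize(), description
    )
    if corrected == description:
        return description  # already conformant, nothing to do

    if mode == "strict":
        raise ValueError(f"non-conformant input: {description!r}")
    if mode == "lenient":
        warnings.warn(f"corrected {description!r} to {corrected!r}")
    # silent: fall through and return the corrected text without a warning
    return corrected


print(apply_error_mode("p.val600glu", mode="silent"))  # p.Val600Glu
```

Under `strict` the same call raises `ValueError`; under `lenient` it returns the corrected string and emits a Python warning.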
# Use lenient mode to auto-correct common issues
ferro parse --error-mode lenient "p.val600glu" # Corrects to p.Val600Glu
# Ignore specific warnings
ferro parse --ignore W1001,W2001 "p.val600glu"
# Get help on any error/warning code
ferro explain W1001
ferro explain --list

Create .ferro.toml in your project directory:
[error-handling]
mode = "lenient"
ignore = ["W1001", "W2001"] # Silently correct these
reject = ["W3003"] # Always reject these

ferro-hgvs provides the most comprehensive HGVS variant normalization across all pattern types, with performance orders of magnitude faster than alternatives.
| Pattern Type | ferro | mutalyzer | biocommons | hgvs-rs |
|---|---|---|---|---|
| Genomic (g.) | ✓ | ✓ | ✓ | ✓ |
| Coding (c.) exonic | ✓ | ✓ | ✓ | ✓ |
| Coding (c.) intronic | ✓ | ✓* | ✗ | ✗ |
| Non-coding (n.) | ✓ | ✓ | ✓ | ✓ |
| RNA (r.) | ✓ | ✓ | ✓ | ✓ |
| Protein (p.) | ✓ | Net** | ✗ | ✓ |
* mutalyzer intronic support requires genomic context rewriting (enabled by default)

** mutalyzer protein normalization requires network access for NP_→NM_ lookups
| Tool | Speed (local) | Speed (network) | ferro Speedup |
|---|---|---|---|
| ferro-hgvs | ~4M patterns/sec | N/A (offline) | — |
| mutalyzer | ~20 patterns/sec | ~1 pattern/sec | 200,000x |
| biocommons/hgvs | ~20 patterns/sec | ~0.2 patterns/sec | 200,000x |
| hgvs-rs | ~2 patterns/sec | ~0.2 patterns/sec | 2,000,000x |
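The speedup column in the table above is consistent with dividing ferro-hgvs's rate by each tool's local (not network) rate. The short calculation below reproduces it from the approximate figures in the table; the variable names are only for this example.

```python
# Approximate local throughput in patterns/sec, copied from the table above.
FERRO_RATE = 4_000_000
LOCAL_RATES = {
    "mutalyzer": 20,
    "biocommons/hgvs": 20,
    "hgvs-rs": 2,
}

# Speedup = ferro rate / other tool's local rate.
speedups = {tool: FERRO_RATE / rate for tool, rate in LOCAL_RATES.items()}

for tool, factor in speedups.items():
    print(f"{tool}: {factor:,.0f}x")
# mutalyzer: 200,000x
# biocommons/hgvs: 200,000x
# hgvs-rs: 2,000,000x
```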
The ferro prepare command downloads and organizes all reference data needed for comprehensive normalization. This data is then shared with other tools (mutalyzer, biocommons, hgvs-rs) to enable their local operation.
| Data Type | Source | Size | Enables |
|---|---|---|---|
| RefSeq transcripts | NCBI | ~1GB | NM_/NR_/XM_ normalization |
| cdot metadata | MANE | ~200MB | Transcript-to-genome mappings |
| GRCh38 + GRCh37 genomes | NCBI | ~4GB | NC_ genomic normalization |
| RefSeqGene | NCBI | ~600MB | NG_ gene region normalization |
| LRG sequences | EBI | ~50MB | LRG_ stable reference normalization |
| Protein sequences | Derived from CDS | ~200MB | NP_/XP_ protein normalization |
| Legacy transcript versions | NCBI | ~50MB | Historical ClinVar variants |
Key insight: Without ferro's reference preparation, other tools require network access for each variant lookup (adding 100-1000ms latency per variant). With ferro's cached reference data, all tools can operate fully offline with consistent, reproducible results.
The main ferro binary includes commands to prepare reference data (ferro prepare) and check its status (ferro check). The ferro-benchmark tool (build with --features benchmark) extends this for tool comparison benchmarks.
| Command | Description |
|---|---|
| prepare <tool> | Prepare reference data for a tool |
| check <tool> | Verify tool configuration and dependencies |
| parse <tool> | Parse HGVS patterns with specified tool |
| normalize <tool> | Normalize HGVS patterns with specified tool |
| compare results | Compare parse/normalize results between tools |
| extract | Extract patterns from ClinVar, VCFs, or create samples |
| setup | Set up UTA database, SeqRepo, and other services |
| generate | Generate summary reports and configs |
| collate | Aggregate sharded results |
# Prepare ferro reference (main binary - no special features needed)
ferro prepare --output-dir data/ferro
# Check reference data
ferro check --reference data/ferro
# Normalize with ferro
ferro normalize -i patterns.txt --reference data/ferro
# For tool comparison, build with benchmark support
cargo build --release --features benchmark
# Prepare other tools (uses ferro reference for transcript data)
ferro-benchmark prepare mutalyzer --ferro-reference data/ferro --output-dir data/mutalyzer
ferro-benchmark prepare biocommons --seqrepo-dir data/seqrepo --uta-dump uta_20210129b.pgd.gz --ferro-reference data/ferro
# Compare results between tools
ferro-benchmark normalize mutalyzer -i patterns.txt -o mutalyzer.json --mutalyzer-settings data/mutalyzer/mutalyzer_settings.conf
ferro-benchmark compare results normalize ferro.json mutalyzer.json -o comparison.json

Supported tools: ferro-hgvs, mutalyzer, biocommons/hgvs, hgvs-rs

Note: The pixi.toml and pixi.lock files in this repository define a pixi environment for the Python-based external tools (mutalyzer, biocommons/hgvs, seqrepo) used in benchmarking. Run pixi shell to activate it.
See docs/BENCHMARK_GUIDE.md for detailed usage.
cargo build
cargo test
cargo clippy -- -D warnings

Licensed under the MIT License. See LICENSE for details.
This software is under active development. While we make a best effort to test this software and to fix issues as they are reported, this software is provided as-is without any warranty (see the license for details). Please submit an issue, and better yet a pull request as well, if you discover a bug or identify a missing feature. Please contact Fulcrum Genomics if you are considering using this software or are interested in sponsoring its development.
See CONTRIBUTING.md for guidelines.