Transformer Implementations

This repository contains implementations of the Transformer model in Rust and PyTorch, based on the paper Attention is All You Need. It also provides code to reproduce results with the original Tensor2Tensor (T2T) implementation. A report is available at https://mkleinegger.github.io/transforming-attention/report.pdf

Dataset

To ensure consistent sentence embeddings across all implementations, we provide tokenized translation datasets in Parquet format and a vocabulary file (vocab.ende) in the data/ directory. The dataset was generated using Tensor2Tensor (T2T), the original implementation of the Attention is All You Need paper, with subword tokenization applied to the translate_ende_wmt32k dataset.

Each dataset file contains sentence pairs in two columns:

inputs: token IDs of the English sentence
targets: token IDs of the German translation

Example structure of the dataset:

shape: (45_782, 2)
┌───────────────────┬────────────────────┐
│ inputs            ┆ targets            │
│ ---               ┆ ---                │
│ list[i64]         ┆ list[i64]          │
╞═══════════════════╪════════════════════╡
│ [5374, 8907, … 1] ┆ [2606, 12727, … 1] │
│ [29, 379, … 1]    ┆ [1096, 10, … 1]    │
│ [124, 6618, … 1]  ┆ [111, 16146, … 1]  │
│ [316, 25, … 1]    ┆ [806, 103, … 1]    │
│ [75, 8664, … 1]   ┆ [168, 4021, … 1]   │
│ …                 ┆ …                  │
└───────────────────┴────────────────────┘
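Because every row holds a variable-length list of token IDs, the rows have to be padded to a common length before they can be batched. The following is a minimal sketch of that step in plain Python, not the code used in this repository. It assumes the Tensor2Tensor convention that ID 0 is padding and ID 1 marks the end of a sequence (consistent with the trailing 1 in the rows above); `pad_batch` and the short sequences in the usage lines are hypothetical.

```python
PAD_ID = 0  # assumption: T2T reserves 0 for padding
EOS_ID = 1  # assumption: T2T reserves 1 for end-of-sequence


def pad_batch(sequences, pad_id=PAD_ID):
    """Right-pad token lists to the longest length and build a mask.

    Returns (padded, mask), where mask[i][j] is True for real tokens
    and False for padding positions.
    """
    max_len = max(len(seq) for seq in sequences)
    padded = [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]
    mask = [[tok != pad_id for tok in row] for row in padded]
    return padded, mask


# Illustrative token lists, not taken from the dataset.
batch, mask = pad_batch([[5, EOS_ID], [7, 8, 9, EOS_ID]])
print(batch)
print(mask)
```

The mask is what an attention layer would use to ignore padded positions. In a PyTorch pipeline the same logic would typically live in a collate function.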

Structure

rust/ - Transformer implementation in Rust.
pytorch/ - Transformer implementation in PyTorch.
jax/ - Transformer implementation in JAX.
t2t/ - Transformer implementation from Tensor2Tensor.
data/ - Tokenized datasets and vocabulary file.
examples/ - Example scripts.

Running

Use create_env.sh to create an environment. After activating it, the scripts run with plain python script.py. For multi-GPU runs, launch them with torchrun instead: torchrun --nproc-per-node=num_nodes train.py and torchrun --nproc-per-node=num_nodes predict.py, where num_nodes is the number of processes to start on the machine (typically one per GPU).

After training and predicting, run bleu.py to compute the BLEU score. Before each step, check that the script loads the intended checkpoint.
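For orientation, here is a minimal sketch of how a corpus-level BLEU score can be computed from tokenized references and hypotheses. It is not the contents of bleu.py, which may differ in tokenization, smoothing, and case handling. It assumes one reference per hypothesis and uniform weights over 1- to 4-grams; `ngrams` and `corpus_bleu` are hypothetical names.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references, hypotheses, max_n=4):
    """Unsmoothed corpus-level BLEU with a single reference per hypothesis."""
    matches = [0] * max_n
    totals = [0] * max_n
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_len += len(ref)
        hyp_len += len(hyp)
        for n in range(1, max_n + 1):
            ref_counts = ngrams(ref, n)
            hyp_counts = ngrams(hyp, n)
            # Clipped matches: each n-gram counts at most as often as in the reference.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives BLEU 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize hypotheses shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)


# Illustrative token IDs, not taken from the dataset.
print(corpus_bleu([[4, 8, 15, 16, 23, 1]], [[4, 8, 15, 16, 23, 1]]))
```

Scores from a sketch like this are only comparable with each other. For numbers that should match published results, an established tool with fixed tokenization is the safer choice.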

References

Attention is All You Need
Tensor2Tensor (T2T)
