Custom setup for training Llama. Can't afford to use the Hugging Face format, since I need raw PyTorch flexibility for future endeavors with these weights.
Trying to make this as flexible and simple as possible for reuse.
Taking advantage of the parallelism, I sync gradients across devices as follows in traininglib/gradient_updates.py:
- Initialize the divisor to 2
- If a device's index is not divisible by the divisor (and it still holds a live partial sum), send its gradients to the device at the index rounded down to the nearest multiple of the divisor, i.e. `(device_num // divisor) * divisor`
- The receiving device adds the two gradients
- Double the divisor and repeat until only one gradient remains (on device 0)
- Average it, and broadcast the result back to every device
Note that this accumulation takes log2(num_gpus) iterations rather than num_gpus - 1, which matters because GPU-to-GPU transfers are expensive.
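A minimal sketch of the scheme above, not the actual traininglib implementation: gradients are stand-in floats rather than tensors, the "send" is an in-place add on a Python list, and the device count is assumed to be a power of two. The condition `idx % divisor == divisor // 2` selects exactly the devices that still hold a live partial sum at each level, so nothing is double-counted.

```python
def tree_reduce_average(grads):
    """Average per-device gradients (stand-in floats here) via a binary
    tree reduction. Returns the per-device averages and the number of
    reduction iterations performed. Assumes len(grads) is a power of two."""
    n = len(grads)
    iterations = 0
    divisor = 2
    while divisor <= n:
        for idx in range(n):
            # Only devices holding a live partial sum at this level send;
            # the destination is idx rounded down to a multiple of divisor.
            if idx % divisor == divisor // 2:
                dest = (idx // divisor) * divisor
                grads[dest] += grads[idx]  # receiver adds the two gradients
        divisor *= 2
        iterations += 1
    avg = grads[0] / n  # device 0 now holds the full sum
    return [avg] * n, iterations  # broadcast the average back to all devices
```

For 8 devices this performs 3 iterations (log2 of 8), matching the complexity note above.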