This paper introduces a novel approach to enhance document memory in large language models (LLMs) through guided learning mechanisms. We propose a document-wise memory selection framework that enables models to selectively memorize and retrieve document-specific information using learnable document representations and guidance loss functions.
- Framework: Novel document-wise memory selection mechanism
- Guidance Loss: Innovative guidance-based training approach for document memory
- Visualization: Comprehensive analysis of memory selection patterns
- Evaluation: Extensive experiments across multiple model architectures
Large language models can be enhanced with document-wise memory selection using learnable document representations and guidance loss functions to improve document memorization and retrieval capabilities.
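The mechanism above can be sketched as a small module. This is a hypothetical illustration, not the repository's implementation (see document_memories.py and utils.py for that): each document gets a learnable low-dimensional key, an activation maps a projection of the key to selection weights over a shared memory bank, and the selected memory is injected into the hidden states of one transformer layer.

```python
import torch
import torch.nn as nn

class DocumentMemorySelector(nn.Module):
    """Hypothetical sketch of document-wise memory selection.

    Assumptions (not from the repo): each document has a learnable key of
    size `key_dim`; an activation (e.g. tanh) turns a linear projection of
    the key into selection weights over `num_memories` memory vectors, and
    the weighted sum is added to the hidden states of one hooked layer.
    """

    def __init__(self, num_docs, key_dim=2, num_memories=32,
                 hidden_dim=2048, activation=torch.tanh):
        super().__init__()
        self.doc_keys = nn.Embedding(num_docs, key_dim)    # random document representations
        self.selector = nn.Linear(key_dim, num_memories)   # key -> selection logits
        self.memories = nn.Parameter(torch.randn(num_memories, hidden_dim) * 0.02)
        self.activation = activation

    def forward(self, doc_ids, hidden_states):
        # (batch, num_memories) selection weights per document
        weights = self.activation(self.selector(self.doc_keys(doc_ids)))
        # (batch, hidden_dim) selected memory, broadcast over sequence length
        selected = weights @ self.memories
        return hidden_states + selected.unsqueeze(1)
```

In this reading, `key_dim`, `key_activation`, `hook_memory_dim`, and `hook_memory_layer` from run.sh correspond to the key size, the activation, the number of memories, and the layer whose hidden states receive the selected memory.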
DocGuidanceLLM/
├── foundations/ # Model foundation implementations
│ ├── llama2.py # Llama2 model utilities
│ └── pythia.py # Pythia model utilities
├── document_memories.py # Document memory implementation
├── hook_lm.py # Language model hooking utilities
├── train_guidance.py # Main training script
├── utils.py # Utility functions and memory selection
├── wikitext.py # WikiText dataset processing
├── run.sh # Experiment runner script
├── requirements.txt # Python dependencies
└── README.md
The following models are supported in foundations/:
- Llama2: various sizes via llama2.py
- Pythia: various sizes via pythia.py
# Run the main training experiment
bash run.sh
# Or run with custom parameters
python train_guidance.py \
--lm_name pythia \
--lm_size 1b \
--num_gpus 1 \
--max_labels 10 \
--segment_length 128 \
--max_segements 10 \
--max_length 256 \
--lr 1e-3 \
--batch_size 16 \
--num_epochs 500 \
--hook_memory_dim 32 \
--hook_memory_layer 15 \
--key_dim 2 \
--key_activation tanh \
--guidance 0.1

Edit the parameters in run.sh to customize your experiments:
# --- LLM Related ---
lm_name=pythia
lm_size=1b
num_gpus=1
# --- Document Memory Related ---
key_dim=2            # dimension of the random document representation
key_activation=tanh  # inductive bias of document memory selection
hook_memory_dim=32   # number of memories
hook_memory_layer=15 # layer at which the memory is hooked
guidance=0.1         # alpha (guidance strength)

See utils.py for the implementation of memory selection.
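The `guidance` parameter weights a guidance term against the standard language-modeling loss. The sketch below is a hypothetical illustration of that combination; the toy guidance term (pushing different documents toward distinct memory selections) is an assumption, and the paper's exact formulation lives in train_guidance.py.

```python
import torch
import torch.nn.functional as F

def guided_training_loss(lm_logits, labels, selection_weights, doc_ids, guidance=0.1):
    """Hypothetical sketch: LM loss plus an alpha-weighted guidance term.

    The guidance term used here (penalizing similar selection patterns
    across different documents) is illustrative only.
    """
    lm_loss = F.cross_entropy(lm_logits.view(-1, lm_logits.size(-1)),
                              labels.view(-1))
    # Cosine similarity between per-document selection patterns.
    w = F.normalize(selection_weights, dim=-1)
    sim = w @ w.t()
    # Mask keeping only pairs of *different* documents.
    diff_doc = (doc_ids.unsqueeze(0) != doc_ids.unsqueeze(1)).float()
    guidance_loss = (sim * diff_doc).sum() / diff_doc.sum().clamp(min=1.0)
    return lm_loss + guidance * guidance_loss
```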
Visualization of ReLU Activation

Visualization of Tanh Activation
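The two activations impose different inductive biases on memory selection, which the visualizations above contrast. A minimal sketch of the difference (illustrative, not the repo's plotting code):

```python
import torch

# Assumed: selection logits produced from document keys.
logits = torch.randn(8, 32)

relu_w = torch.relu(logits)  # sparse and non-negative: memories are only added
tanh_w = torch.tanh(logits)  # dense and signed in [-1, 1]: memories can also be subtracted
```

ReLU zeroes out roughly half the logits, so each document selects a sparse subset of memories; tanh keeps every memory active with a bounded signed weight.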

If you find this work useful, please cite our paper:
@inproceedings{park2024document,
title={Memorizing Documents with Guidance in Large Language Models},
author={Park, Bumjin and Choi, Jaesik},
booktitle={Proceedings of the 33rd International Joint Conference on Artificial Intelligence (IJCAI)},
year={2024}
}

Key dependencies include:
- PyTorch
- Transformers
