Code for HeLa-Mem on LongMemEval.
LoCoMo evaluation code will be released in the future.
The table below shows the target results for reproducing HeLa-Mem on the full 500-item LongMemEval-S benchmark:
| Method | Overall ACC |
|---|---|
| LangMem | 37.20 |
| MemoryOS | 44.80 |
| Mem0 | 53.61 |
| FullText | 56.80 |
| NaiveRAG | 61.00 |
| A-MEM | 62.60 |
| HeLa-Mem (Ours) | 65.40 |
Repository layout:

```
HeLa-Mem/
├── hela_mem/
│   ├── encode_longmemeval.py
│   ├── eval_longmemeval.py
│   ├── hebbian_knowledge_memory.py
│   ├── hebbian_memory.py
│   ├── hebbian_retriever.py
│   ├── profile_utils.py
│   ├── reranker.py
│   └── utils.py
├── scripts/
│   ├── encode_longmemeval.sh
│   └── eval_longmemeval.sh
├── pyproject.toml
└── requirements.txt
```
Create a virtual environment and install dependencies:

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

Configure your API access:

```bash
export OPENAI_API_KEY="your-key"
export OPENAI_BASE_URL="https://api.openai.com/v1"
```

If you want multi-key rotation, provide:

```bash
export OPENAI_API_KEYS="key1,key2,key3"
```

or

```bash
export OPENAI_API_KEYS_FILE="/path/to/keys.txt"
```

The default model is `gpt-4o-mini`.
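A minimal sketch of how multi-key rotation could be wired up from these variables (the helper names here are illustrative, not part of the repo's API):

```python
import itertools
import os

def load_api_keys():
    """Collect API keys, preferring the rotation variables over the single-key one."""
    if keys := os.environ.get("OPENAI_API_KEYS"):
        return [k.strip() for k in keys.split(",") if k.strip()]
    if path := os.environ.get("OPENAI_API_KEYS_FILE"):
        with open(path) as f:
            return [line.strip() for line in f if line.strip()]
    return [os.environ["OPENAI_API_KEY"]]

def key_cycle():
    """Round-robin iterator over the configured keys."""
    return itertools.cycle(load_api_keys())
```

Each worker can then pull `next(cycle)` before issuing a request to spread load across keys.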
Expected fields per item:

- `question_id`
- `question`
- `answer`
- `question_type`
- `question_date`
- `haystack_dates`
- `haystack_sessions`
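For illustration, a single input item with these fields would look roughly like this (all values below are made up, and the exact date and session formats should be checked against the bundled dataset file):

```python
# Hypothetical example of one LongMemEval-S item; values are illustrative only.
example_item = {
    "question_id": "q_0001",
    "question": "What city did I say I was moving to?",
    "answer": "Lisbon",
    "question_type": "single-session-user",
    "question_date": "2023/05/20 (Sat) 10:00",
    "haystack_dates": ["2023/05/01 (Mon) 09:00"],
    "haystack_sessions": [
        [
            {"role": "user", "content": "I'm moving to Lisbon next month."},
            {"role": "assistant", "content": "Exciting! Lisbon is lovely."},
        ]
    ],
}
```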
The complete 500-item LongMemEval-S file is bundled in this repository.
The original LongMemEval experiment here is two-stage:

1. Encoding: `encode_longmemeval.py`
2. Evaluation: `eval_longmemeval.py`
Encoding builds:

- `*_hebbian.json`
- `*_long_term.json`
- `*_long_term_kb_graph.json`
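A quick way to sanity-check an encode run is to verify these three artifacts exist before starting evaluation. The helper below is illustrative, not part of the repo:

```python
from pathlib import Path

# Suffixes of the three per-run encoding artifacts named above.
ARTIFACT_SUFFIXES = ("_hebbian.json", "_long_term.json", "_long_term_kb_graph.json")

def missing_encode_artifacts(mem_dir: str, prefix: str) -> list[str]:
    """Return the expected artifact filenames that are not present under mem_dir."""
    return [
        f"{prefix}{suffix}"
        for suffix in ARTIFACT_SUFFIXES
        if not (Path(mem_dir) / f"{prefix}{suffix}").exists()
    ]
```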
Evaluation does:
- episodic retrieval
- semantic retrieval
- answer generation
- GPT judge scoring
- per-item result saving
- summary aggregation
Configure standard OpenAI credentials:

```bash
export OPENAI_API_KEY="your-key"
export OPENAI_BASE_URL="https://api.openai.com/v1"
```

Then run the full 500-item experiment.
Encoding:

```bash
bash scripts/encode_longmemeval.sh
```

Or directly:

```bash
python -m hela_mem.encode_longmemeval \
  --data_path data/longmemeval_s.json \
  --output_dir results/longmemeval_mem_full \
  --workers 8
```

Evaluation:

```bash
bash scripts/eval_longmemeval.sh
```

Or directly:

```bash
python -m hela_mem.eval_longmemeval \
  --data_path data/longmemeval_s.json \
  --mem_dir results/longmemeval_mem_full \
  --workers 8 \
  --top_k 15 \
  --semantic_top_k 5
```

Outputs are written under:

- `results/.../eval_results/result_<question_id>.json`
- `results/.../eval_results/eval_summary.json`
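As a hedged sketch (the exact JSON schema of the per-item result files is an assumption), the summary can be recomputed from the per-item files like this:

```python
import json
from pathlib import Path

def summarize(eval_dir: str) -> dict:
    """Aggregate per-item result files into an overall accuracy summary."""
    results = [
        json.loads(p.read_text())
        for p in sorted(Path(eval_dir).glob("result_*.json"))
    ]
    correct = sum(1 for r in results if r.get("correct"))
    return {
        "num_items": len(results),
        "num_correct": correct,
        "overall_acc": round(100.0 * correct / len(results), 2) if results else 0.0,
    }
```

This is handy for re-aggregating after a partial or resumed run without re-querying the model.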
If you want a smaller sanity-check run, keep the same dataset file and add `--num_items 100` (or another cap) to both the encode and eval commands.
- This release keeps the original experiment-style environment variable names (`HEBBIAN_*`) so existing commands map cleanly.
- API-key rotation is still supported, but keys must now come from environment variables or a local keys file.
- The code uses the standard OpenAI Python SDK request pattern (`client.chat.completions.create`) with `OPENAI_API_KEY` and the official OpenAI base URL by default.
- The repository has been cleaned for release, but the LongMemEval path is kept source-aligned rather than simplified.