HeLa-Mem

Code for HeLa-Mem on LongMemEval.

LoCoMo evaluation code will be released in the future.

LongMemEval Result

The table below shows the reference results to reproduce for HeLa-Mem on LongMemEval-S, using the full 500-item benchmark:

| Method          | Overall ACC |
|-----------------|-------------|
| LangMem         | 37.20       |
| MemoryOS        | 44.80       |
| Mem0            | 53.61       |
| FullText        | 56.80       |
| NaiveRAG        | 61.00       |
| A-MEM           | 62.60       |
| HeLa-Mem (Ours) | **65.40**   |

Included Code

HeLa-Mem/
├── hela_mem/
│   ├── encode_longmemeval.py
│   ├── eval_longmemeval.py
│   ├── hebbian_knowledge_memory.py
│   ├── hebbian_memory.py
│   ├── hebbian_retriever.py
│   ├── profile_utils.py
│   ├── reranker.py
│   └── utils.py
├── scripts/
│   ├── encode_longmemeval.sh
│   └── eval_longmemeval.sh
├── pyproject.toml
└── requirements.txt

Setup

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Configure your API access:

export OPENAI_API_KEY="your-key"
export OPENAI_BASE_URL="https://api.openai.com/v1"

If you want multi-key rotation, provide:

export OPENAI_API_KEYS="key1,key2,key3"

or

export OPENAI_API_KEYS_FILE="/path/to/keys.txt"

The default model is `gpt-4o-mini`.
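
A minimal sketch of how the key configuration above could be consumed, assuming a simple round-robin policy (the parsing and rotation logic below are illustrative, not the repository's actual implementation):

```python
import itertools
import os


def load_api_keys():
    """Collect API keys from OPENAI_API_KEYS, a keys file, or OPENAI_API_KEY."""
    raw = os.environ.get("OPENAI_API_KEYS", "")
    keys = [k.strip() for k in raw.split(",") if k.strip()]
    keys_file = os.environ.get("OPENAI_API_KEYS_FILE")
    if not keys and keys_file:
        with open(keys_file) as fh:
            keys = [line.strip() for line in fh if line.strip()]
    if not keys and os.environ.get("OPENAI_API_KEY"):
        keys = [os.environ["OPENAI_API_KEY"]]
    return keys


def key_cycle():
    """Round-robin iterator over the configured keys, one per request."""
    return itertools.cycle(load_api_keys())
```

`OPENAI_API_KEYS` takes precedence here; the single `OPENAI_API_KEY` is the fallback, matching the setup order above.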

Dataset Format

Expected fields per item:

  • question_id
  • question
  • answer
  • question_type
  • question_date
  • haystack_dates
  • haystack_sessions
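
As a quick sanity check before encoding, the schema can be validated against the field list above (the loader itself is a sketch; field names are taken from this section):

```python
import json

REQUIRED_FIELDS = {
    "question_id", "question", "answer", "question_type",
    "question_date", "haystack_dates", "haystack_sessions",
}


def validate_items(path):
    """Raise ValueError if any dataset item is missing a required field."""
    with open(path) as fh:
        items = json.load(fh)
    for i, item in enumerate(items):
        missing = REQUIRED_FIELDS - item.keys()
        if missing:
            raise ValueError(f"item {i} missing fields: {sorted(missing)}")
    return len(items)
```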

The complete 500-item LongMemEval-S file (data/longmemeval_s.json) is bundled in this repository.

Experiment Entry Points

The LongMemEval experiment runs in two stages:

  1. encode_longmemeval.py
  2. eval_longmemeval.py

Encoding builds:

  • *_hebbian.json
  • *_long_term.json
  • *_long_term_kb_graph.json
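
A small completeness check for the three encoding artifacts above (the suffixes come from this list; the per-item prefix layout is an assumption):

```python
import pathlib

SUFFIXES = ("_hebbian.json", "_long_term.json", "_long_term_kb_graph.json")


def encoding_complete(mem_dir, prefix):
    """Return True if all three memory files exist for the given item prefix."""
    base = pathlib.Path(mem_dir)
    return all((base / f"{prefix}{suffix}").exists() for suffix in SUFFIXES)
```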

Evaluation does:

  • episodic retrieval
  • semantic retrieval
  • answer generation
  • GPT judge scoring
  • per-item result saving
  • summary aggregation
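
How the episodic and semantic retrieval streams might be combined before answer generation can be sketched as a capped, deduplicated merge (a hypothetical helper; the actual retriever lives in hebbian_retriever.py, and the caps mirror the --top_k and --semantic_top_k flags below):

```python
def merge_retrieved(episodic, semantic, top_k=15, semantic_top_k=5):
    """Combine top episodic hits with a capped number of semantic hits, deduplicated."""
    merged = list(episodic[:top_k])
    for item in semantic[:semantic_top_k]:
        if item not in merged:
            merged.append(item)
    return merged
```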

Reproduce

Configure standard OpenAI credentials:

export OPENAI_API_KEY="your-key"
export OPENAI_BASE_URL="https://api.openai.com/v1"

Then run the full 500-item experiment.

1. Encode

bash scripts/encode_longmemeval.sh

Or directly:

python -m hela_mem.encode_longmemeval \
  --data_path data/longmemeval_s.json \
  --output_dir results/longmemeval_mem_full \
  --workers 8

2. Evaluate

bash scripts/eval_longmemeval.sh

Or directly:

python -m hela_mem.eval_longmemeval \
  --data_path data/longmemeval_s.json \
  --mem_dir results/longmemeval_mem_full \
  --workers 8 \
  --top_k 15 \
  --semantic_top_k 5

Outputs are written under:

  • results/.../eval_results/result_<question_id>.json
  • results/.../eval_results/eval_summary.json

For a smaller sanity-check run, keep the same dataset file and add `--num_items 100` (or another cap) to both the encode and eval commands.
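
Given the per-item files above, overall accuracy can be recomputed offline. This sketch assumes each result_<question_id>.json carries a boolean `correct` field, which is an assumption about the schema:

```python
import json
import pathlib


def summarize(eval_dir):
    """Recompute overall accuracy from per-item result files."""
    results = []
    for path in sorted(pathlib.Path(eval_dir).glob("result_*.json")):
        with open(path) as fh:
            results.append(json.load(fh))
    correct = sum(1 for r in results if r.get("correct"))
    return {
        "num_items": len(results),
        "overall_acc": 100.0 * correct / max(len(results), 1),
    }
```

Comparing the recomputed figure against eval_summary.json is a cheap way to confirm no per-item files were lost.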

Notes

  • This release keeps the original experiment-style environment variable names (HEBBIAN_*) so existing commands map cleanly.
  • API-key rotation is still supported, but keys must now come from environment variables or a local keys file.
  • The code uses the standard OpenAI Python SDK request pattern (client.chat.completions.create) with OPENAI_API_KEY and the official OpenAI base URL by default.
  • The repository has been cleaned for release, but the LongMemEval path is kept source-aligned rather than simplified.
