RL-NPO: Reinforcement Learning via Neural Prompt Optimization

Rewrite any text to maximize its predicted match to a target cognitive state, using Meta's TRIBE v2 brain encoding model as a biologically grounded reward function.

What It Does

RL-NPO takes a piece of text and a cognitive target (e.g., memory, emotion, attention), then runs an evolutionary RL loop that rewrites the text until it produces brain activation patterns — as predicted by TRIBE v2 — that match a pre-computed empirical target derived from known high-impact stimuli.

What makes this novel: To our knowledge, no existing tool uses a brain encoding model as a reward function for text optimization. Because TRIBE v2 was trained with modality dropout, it produces meaningful text-only predictions even though its training data was multimodal (video + audio + text). Unlike LLM-as-a-judge approaches, TRIBE v2 is harder to game with fluent but shallow text: it scores against a fixed cortical activation target derived from human fMRI data.

Two Use Cases

1. Neural Prompt Optimization

Rewrite any text to shift its cognitive register toward a target style — preserving all factual content while restructuring rhythm, pacing, and narrative sequencing to match the brain activation signature of a reference stimulus (a Carl Sagan passage, a film scene, a piece of writing you want to emulate).

2. Small-to-Large LLM Bridging (Test-Time Compute)

Use TRIBE v2 as a verifier in a best-of-N selection loop. A small, cheap LLM (Llama 3.1 8B) generates N candidate outputs; TRIBE v2 selects the one that most closely matches the neural signature of a frontier model's (GPT-5.3) output on the same prompt. No fine-tuning required.
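
The best-of-N selection described above can be sketched as follows. This is a minimal illustration, not the repository's implementation: `generate` and `score` are hypothetical callables standing in for the small LLM sampler and the TRIBE v2 similarity score, and the toy stand-ins exist only so the sketch runs without any model.

```python
from typing import Callable, List, Tuple


def best_of_n(
    prompt: str,
    generate: Callable[[str, int], List[str]],  # hypothetical: small-LLM sampler
    score: Callable[[str], float],              # hypothetical: similarity to the frontier target
    n: int = 5,
) -> Tuple[str, float]:
    """Sample n candidates and return the highest-scoring one with its score."""
    candidates = generate(prompt, n)
    if not candidates:
        raise ValueError("generator returned no candidates")
    scored = [(score(c), c) for c in candidates]
    best_score, best_text = max(scored, key=lambda pair: pair[0])
    return best_text, best_score


# Toy stand-ins (assumptions, not real APIs) so the sketch is runnable.
def toy_generate(prompt: str, n: int) -> List[str]:
    return [f"{prompt} v{i}" for i in range(n)]


def toy_score(text: str) -> float:
    return float(text[-1]) / 10.0  # pretend later variants score higher


best_text, best_score = best_of_n(
    "Explain the Balance of Payments", toy_generate, toy_score, n=5
)
```

Because the verifier only ranks candidates, the proposer model stays frozen; swapping in a different scorer does not change the selection logic.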


Architecture

Input Text → OpenRouter LLM (mutations) → TRIBE v2 (brain prediction) → ROI Scoring → Selection
     ↑                                                                                      ↓
     └──────────────────────── Best Candidate ←────────────────────────────────────────────────┘

Each generation:

  1. The mutator LLM generates N candidate rewrites preserving factual content
  2. Each candidate runs through TRIBE v2 → cortical activation prediction (20,484 vertices)
  3. Predictions are masked to the target ROI (e.g., parahippocampal for memory)
  4. Cosine similarity to the empirical target vector = reward score
  5. Best candidate becomes the parent for the next generation
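
The five steps above can be sketched in a few lines of NumPy. This is a hedged sketch under stated assumptions: `mutate` and `predict` are hypothetical callables standing in for the mutator LLM and TRIBE v2, and the loop keeps the current parent when no candidate improves on it, which the repository may handle differently.

```python
from typing import Callable, List, Tuple

import numpy as np


def roi_cosine_reward(prediction: np.ndarray, roi_mask: np.ndarray, target: np.ndarray) -> float:
    """Cosine similarity between the ROI-masked prediction and the target vector."""
    masked = prediction[roi_mask]
    denom = np.linalg.norm(masked) * np.linalg.norm(target)
    if denom == 0.0:
        return 0.0  # assumption: an all-zero vector counts as no similarity
    return float(np.dot(masked, target) / denom)


def evolve(
    text: str,
    mutate: Callable[[str, int], List[str]],  # hypothetical: mutator LLM call
    predict: Callable[[str], np.ndarray],     # hypothetical: text -> per-vertex activations
    roi_mask: np.ndarray,
    target: np.ndarray,
    generations: int = 8,
    mutations: int = 5,
) -> Tuple[str, float, List[float]]:
    """Greedy evolutionary loop: the best-scoring text seen so far is the parent."""
    best_text = text
    best_score = roi_cosine_reward(predict(text), roi_mask, target)
    curve = [best_score]  # reward curve, one entry per generation plus the baseline
    for _ in range(generations):
        for candidate in mutate(best_text, mutations):
            s = roi_cosine_reward(predict(candidate), roi_mask, target)
            if s > best_score:
                best_text, best_score = candidate, s
        curve.append(best_score)
    return best_text, best_score, curve
```

In a real run `predict` would return a vector of 20,484 values and `roi_mask` a boolean array of the same length, with `target` sized to the number of `True` entries in the mask.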

Results

Use Case 1: Neural Prompt Optimization

Target: Carl Sagan's Pale Blue Dot passage as the reference neural signature.

Run: CS/Systems text → Sagan register

| Metric | Value |
|---|---|
| Input domain | Computer science (garbage collector / memory management) |
| Mutator | Claude 3.5 Haiku |
| Mutations per generation | 3 |
| Baseline score | 0.8858 |
| Peak score | 0.9613 |
| Δ | +0.0755 |
| Generations to peak | 2 (sustained climb across 3 consecutive generations) |

Before (input):

The garbage collector traverses object reference graphs using tri-color marking. Heap fragmentation increases allocation latency when free block coalescing fails. Stack frames are deallocated deterministically at function return boundaries, unlike heap objects which require explicit or automated reclamation cycles.

After (RL-NPO output):

Within the computational ecosystem of memory management, garbage collection algorithms orchestrate a complex ballet across referential networks, utilizing sophisticated tri-color graph traversal techniques. Heap structures unveil fundamental structural challenges where fragmentation impedes efficient block consolidation... Juxtaposed against this complexity, stack memory performs with clockwork regularity, immediately dissolving computational frames post-function execution.

The system discovered the Sagan register without being told what it was — metaphorical framing ("ballet", "clockwork regularity"), narrative arc (complexity → contrast → resolution), and contemplative scale emerged purely from TRIBE v2's cosine similarity signal over the cortical surface.


Use Case 2: Small-to-Large LLM Bridging (Test-Time Compute)

Target: GPT-5.3 Instant's output on a Balance of Payments economics prompt. Proposer: Llama 3.1 8B.

Run: ttco_10x5 — best configuration

| Metric | Value |
|---|---|
| Input | Academic economics text (Llama 3.1 8B baseline output) |
| Mutator | Llama 3.1 8B (via OpenRouter) |
| Mutations per generation | 5 |
| Baseline score | 0.9569 |
| Peak score | 0.9897 |
| Δ | +0.0328 |
| Generations to peak | 0 (best-of-5 in first generation) |
| Gap closed vs frontier | ~78% of the distance between Llama baseline and GPT-5.3 ceiling |

Frontier reference (GPT-5.3 Instant):

The Balance of Payments is the master ledger tracking every economic deal between a country and the rest of the world. In a freely floating exchange rate system, a persistent current account deficit acts like pressure building in a system — it pushes the domestic currency downward... this currency shift becomes a built-in stabilizer, steadily pulling trade back toward equilibrium.

Llama 8B baseline (unoptimized):

The Balance of Payments records all economic transactions between a country and the rest of the world. Under a freely floating exchange rate regime, a persistent current account deficit exerts downward pressure on the domestic currency's valuation...

RL-NPO output (Llama 8B + TRIBE v2 selection):

The Balance of Payments offers a comprehensive overview of a country's interactions with the global economy. As long as a country operates under a floating exchange rate regime, any current account deficit exerts downward pressure on its currency value. According to the Marshall-Lerner condition, which suggests that the combined price elasticity of export and import demand exceeds one, the downward pressure will eventually lead to increased demand for domestic exports and decreased demand for foreign imports until trade equilibrium is achieved.

All factual content is preserved. The output shifted from academic register toward accessible, flow-oriented explanatory prose — matching GPT-5.3's communicative style without fine-tuning the underlying model.

Why TRIBE v2 is used instead of an LLM-as-a-judge for this task:

  • No sycophancy or positional bias: scores are computed against fixed cortical targets derived from fMRI, not learned preferences
  • Out-of-distribution signal: the biological reward is harder to game with text that merely "sounds good" to a language model
  • Zero additional training required for new domains

5 Cognitive Targets

| Target | Brain Regions | What It Optimizes |
|---|---|---|
| memory | Parahippocampal + Subparietal | Episodic encoding, narrative retention |
| emotion | Middle frontal + Orbital + Superior temporal | Emotional salience, arousal |
| attention | DLPFC + Frontomargin + Paracentral | Working memory, focused processing |
| language | Broca's area + Inferior frontal sulcus + Transverse temporal | Syntactic/semantic depth |
| narrative | Posterior cingulate + Precuneus + Angular gyrus | DMN suppression = being "hooked" |
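
Each target in the table above combines several atlas regions into a single mask. One way to build such a mask from per-vertex atlas labels is sketched below. The function name, the toy label names, and the six-vertex "surface" are all illustrative assumptions; the real label strings come from the Destrieux atlas and are not reproduced here.

```python
from typing import Sequence

import numpy as np


def build_roi_mask(
    vertex_labels: np.ndarray,   # integer atlas label per vertex
    label_names: Sequence[str],  # name for each integer label, by index
    regions: Sequence[str],      # region names to merge into one ROI
) -> np.ndarray:
    """Boolean mask selecting every vertex whose atlas label is in `regions`."""
    wanted = [i for i, name in enumerate(label_names) if name in regions]
    missing = set(regions) - {label_names[i] for i in wanted}
    if missing:
        raise KeyError(f"labels not found in atlas: {sorted(missing)}")
    return np.isin(vertex_labels, wanted)


# Toy atlas with placeholder names (not real Destrieux labels).
toy_names = ["unknown", "region_a", "region_b", "region_c"]
toy_labels = np.array([0, 1, 1, 2, 3, 2])
toy_mask = build_roi_mask(toy_labels, toy_names, ["region_a", "region_c"])
```

Raising on an unknown region name, rather than silently returning a smaller mask, makes a misspelled label fail at build time instead of skewing every later score.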

Quick Start

1. Setup

# Create conda environment
conda create -n rlnpo python=3.11
conda activate rlnpo

# CRITICAL: pin numpy first — TRIBE v2 breaks silently on numpy>=2.1
pip install "numpy<2.1"

# Install TRIBE v2 from source
pip install -e ./tribev2

# Install RL-NPO dependencies
pip install -r requirements.txt

# Set your OpenRouter API key
echo OPENROUTER_KEY="sk-or-..." > .env
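
Since the numpy pin fails silently when violated, a small startup guard can turn it into a loud error. The sketch below is a suggestion only; the helper names are hypothetical and nothing like them is confirmed to exist in the repository.

```python
import re
from typing import Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as '2.0.2' or '2.1.0rc1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def numpy_is_supported(version: str) -> bool:
    """True when the version satisfies the `numpy<2.1` pin from the setup steps."""
    return parse_major_minor(version) < (2, 1)


def check_numpy() -> None:
    """Raise early instead of letting a too-new numpy cause silent breakage."""
    import numpy  # imported lazily so the helpers above work without it

    if not numpy_is_supported(numpy.__version__):
        raise RuntimeError(
            f"numpy {numpy.__version__} found; install 'numpy<2.1' first"
        )
```

Calling `check_numpy()` once at the top of an entrypoint would be enough to surface a bad environment before any model is loaded.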

2. Build Target Vectors (one-time, ~5 min per target)

python cli.py --build-targets

3. Optimize Text

# Named cognitive target
python cli.py \
  --text "Black holes create gravitational singularities that bend spacetime..." \
  --target memory \
  --generations 8 \
  --mutations 5 \
  --run-id "blackhole_memory_v1"

# Custom reference stimulus (your own .npy target vector)
python cli.py \
  --text-file examples/paper_excerpt.txt \
  --custom-target targets/carl_sagan.npy \
  --generations 10 \
  --mutations 5

4. View Results

python cli.py --list-targets

# Results saved to runs/{run_id}/
#   ├── run_config.json     # Full configuration snapshot
#   ├── telemetry.jsonl     # Per-generation details with all candidate scores and deltas
#   ├── result.json         # Final result: before/after text, score delta, reward curve
#   └── reward_curve.png    # Score progression plot
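
The per-generation telemetry can be post-processed with the standard library alone. The sketch below assumes each JSONL record carries a `best_score` field; that field name is a guess for illustration, so check an actual `telemetry.jsonl` before relying on it. The sample records are made up.

```python
import json
from typing import Iterable, List


def reward_curve(lines: Iterable[str], score_key: str = "best_score") -> List[float]:
    """Extract one score per record from JSONL telemetry lines."""
    curve = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        curve.append(float(record[score_key]))
    return curve


# With a real run, pass an open file handle instead of a list:
#   with open("runs/<run_id>/telemetry.jsonl") as fh:
#       curve = reward_curve(fh)
sample = [
    '{"generation": 0, "best_score": 0.50}',
    '{"generation": 1, "best_score": 0.62}',
]
curve = reward_curve(sample)
delta = curve[-1] - curve[0]  # improvement from first to last record
```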

Project Structure

rl-npo/
├── agent.py              # Mutation loop: OpenRouter API + evolutionary selection
├── reward_engine.py      # TRIBE v2 wrapper: text → cortical prediction → ROI score
├── target_builder.py     # One-time: generates + caches empirical target vectors
├── roi_masks.py          # Destrieux atlas → fsaverage5 boolean masks per ROI
├── telemetry.py          # JSONL logger + reward curve plotter
├── cli.py                # Entrypoint: argparse CLI
├── targets/              # Cached .npy target vectors (one per ROI)
├── runs/                 # Output directory, one subfolder per run
├── requirements.txt
└── README.md

Key Technical Details

  • TRIBE v2 modality dropout: Trained with each modality zeroed at p=0.3. Text-only inference produces biologically meaningful cortical predictions — not fallback noise.
  • Text → TTS → Whisper pipeline: TRIBE v2 requires word-level timing events. Text inputs are synthesized to speech via gTTS, then re-transcribed by WhisperX to produce the 2Hz word timing grid.
  • fsaverage5 surface: All predictions live on the fsaverage5 cortical mesh (20,484 vertices). ROI masks are derived from the Destrieux atlas via nilearn, which fetches and caches the atlas on first use, so no manual downloads are required beyond pip.
  • Cosine similarity scoring: Reward = cosine similarity between the ROI-masked candidate activation vector and the precomputed target vector.
  • Anti-reward-hacking mutation prompt: The mutator is explicitly instructed to avoid adjective stuffing and vocabulary inflation — a known exploit that inflates neural similarity scores without meaningful cognitive shift.
  • GPU memory management: _cleanup_gpu() is called after every TRIBE v2 inference to prevent VRAM accumulation over long runs on 8GB GPUs.
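
The README names `_cleanup_gpu()` without showing its body. A common pattern for this kind of cleanup in PyTorch code is sketched below; it is an assumption about what such a helper might do, not the repository's actual function, and it degrades to a no-op when PyTorch or CUDA is unavailable.

```python
import gc


def cleanup_gpu_sketch() -> bool:
    """Release unreferenced Python objects, then free cached CUDA memory.

    Returns True if a CUDA cache was cleared, False otherwise.
    """
    gc.collect()  # drop unreachable tensors first so their memory can be freed
    try:
        import torch
    except ImportError:
        return False  # no PyTorch installed: nothing to clear
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # return cached blocks to the driver
        return True
    return False
```

Note that `torch.cuda.empty_cache()` only releases memory no live tensor still references, so predictions should be moved to the CPU (or deleted) before the cleanup call.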

Citation

@article{dAscoli2026TribeV2,
  title={A foundation model of vision, audition, and language for in-silico neuroscience},
  author={d'Ascoli, Stéphane and Rapin, Jérémy and others},
  year={2026}
}

License

Research use only. TRIBE v2 is licensed under CC-BY-NC-4.0.