doc(memory): add documentation for token based memory and summarization by sahiltyagiii · Pull Request #236 · juspay/neurolink

sahiltyagiii · 2025-11-17T11:20:17Z

Pull Request

Description

Type of Change

🐛 Bug fix (non-breaking change which fixes an issue)
✨ New feature (non-breaking change which adds functionality)
💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
📚 Documentation update
🧹 Code refactoring (no functional changes)
⚡ Performance improvement
🧪 Test coverage improvement
🔧 Build/CI configuration change

Changes Made

Documentation added

AI Provider Impact

Component Impact

Performance Impact

No performance impact
Performance improvement
Minor performance impact (acceptable)
Significant performance impact (needs discussion)

Checklist

My code follows the project's style guidelines
I have performed a self-review of my code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
Any dependent changes have been merged and published

Summary by CodeRabbit

New Features
- Introduced token-based conversation memory with per-session thresholds (default 50k tokens)
- Automatic background summarization of conversations when memory limits are reached
- Per-session memory configuration customization for different use cases
- Complete conversation history preservation with improved context tracking

coderabbitai · 2025-11-17T11:20:29Z

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

This pull request introduces a comprehensive design plan for token-based conversation memory management. It replaces turn-count logic with token thresholds, implements non-destructive pointer-based context tracking, defines token counting infrastructure with multiple providers, and specifies background summarization workflows with edge-case mitigations.

Changes

Cohort / File(s)	Summary
Design & Planning Documentation `memory-bank/TOKEN-BASED-MEMORY-IMPLEMENTATION-PLAN.md`	Comprehensive design specification outlining token-based memory architecture, including token thresholds, pointer-based context model, summarization workflows, edge-case handling, performance considerations, and directory layout proposals.
Core Conversation Types `src/lib/types/conversation.ts`	Extends `ChatMessage` and `SessionMemory` types with fields for IDs (`id`), summarization metadata (`isSummary`, `summarizesFrom`, `summarizesTo`), message tracking (`truncated`, `originalTokens`), and session-level token management (`summarizedUpToMessageId`, `tokenThreshold`, `lastTokenCount`, `lastCountedAt`).
Message ID Utilities `src/lib/utils/messageId.ts`	Adds `generateMessageId()` and `generateSummaryId()` functions for nanoid-based URL-safe ID generation with prefix convention (`summary-`).
Token Counting Service Infrastructure `src/lib/services/tokenCounting/types.ts`, `src/lib/services/tokenCounting/*`	Defines `TokenCounter` interface with `countTokens()`, `supportsAsync()`, and `supportedProviders()` methods; specifies `TokenCountInput` and `TokenCountResult` types; establishes provider-based architecture with caching layer (5-min TTL) for latency optimization.
Configuration & Defaults `src/lib/types/configuration.ts`, `src/lib/config/conversationMemory.ts`	Introduces `ConversationMemoryConfig` with `tokenThreshold`, `summarizationProvider`, `summarizationModel` fields; establishes default values (`DEFAULT_TOKEN_THRESHOLD`, `MAX_SINGLE_MESSAGE_TOKENS`, `DEFAULT_SESSION_INACTIVITY_DAYS`); supports per-session overrides.
Supporting Utilities `src/lib/utils/`, `src/lib/types/`	Adds caching utilities, message normalization helpers, pointer/context utilities, and validation patterns for non-destructive memory operations.

Sequence Diagrams

sequenceDiagram
    participant App as Application
    participant Memory as Session Memory
    participant TokenSvc as Token Counter
    participant Summarizer as Summarization Service
    participant Cache as Cache (5min TTL)

    rect rgb(200, 220, 240)
    note over App,Cache: Background Summarization Flow
    App->>Memory: Check token count
    Memory->>TokenSvc: countTokens(messages, model)
    TokenSvc->>Cache: Check cached count
    alt Cache hit
        Cache-->>TokenSvc: Return cached result
    else Cache miss
        TokenSvc->>TokenSvc: Call provider (API or client)
        TokenSvc->>Cache: Store result with TTL
    end
    TokenSvc-->>Memory: TokenCountResult
    Memory->>Memory: threshold exceeded?
    alt Yes
        Memory->>Summarizer: Split messages & generate summary
        Summarizer-->>Memory: Summary message (isSummary=true)
        Memory->>Memory: Update summarizedUpToMessageId pointer
        Memory->>Memory: Insert summary in message list
    end
    end

sequenceDiagram
    participant App as Application
    participant Memory as Session Memory
    participant LLM as LLM Provider

    rect rgb(220, 240, 200)
    note over App,LLM: Context Building with Pointers
    App->>Memory: Build context for inference
    Memory->>Memory: Get summarizedUpToMessageId pointer
    alt Pointer exists
        Memory->>Memory: Fetch summary message (isSummary=true)
        Memory->>Memory: Collect messages after pointer
    else No pointer
        Memory->>Memory: Use all messages
    end
    Memory->>Memory: Validate context within tokenThreshold
    Memory-->>App: Complete context list
    App->>LLM: Send (system prompt + context + new message)
    LLM-->>App: Response
    App->>Memory: Append assistant message + user message
    Memory->>Memory: Update lastTokenCount, lastCountedAt
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Design complexity: Document proposes significant architectural shift from turn-count to token-based tracking; requires understanding of token counting providers, summarization triggers, and pointer-based memory model.
Scope & integration: Changes span multiple subsystems (types, utilities, services, configuration) with interdependencies; backward compatibility concerns with existing sessions noted.
Areas requiring extra attention:
- Pointer corruption edge-case and recovery patterns (potential data inconsistency risks)
- API failure handling in token counting and summarization workflows
- Cache invalidation logic with 5-minute TTL implications
- Per-session override mechanism interaction with global thresholds
- Single large message truncation strategy and its impact on context quality

Possibly related PRs

feat(SDK): Integrate context summarization with conversation memory BZ-43344 #89: Modifies the conversation memory and summarization subsystem configuration, session/message types, and memory manager workflows; directly related to token-based memory integration.
feat(cli): expose memory commands to cli from sdk #155: Updates imports and adds CLI commands that depend on conversation types (ChatMessage, SessionMemory) proposed in this design.

Suggested reviewers

murdore

Poem

🐰 Tokens flow like river streams,
Pointers mark where memory gleams,
Summaries bloom, no loss, no waste,
Context preserved, forever traced! ✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the primary change: adding documentation for token-based memory and summarization. It is concise, clear, and follows conventional commit format.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Copilot

Pull Request Overview

This PR adds comprehensive documentation for implementing a token-based memory system as an improvement over the current turn-based conversation memory approach. The documentation outlines critical issues with the existing implementation and proposes a detailed solution using token counting, pointer-based context management, and non-destructive message storage.

Key Changes

Detailed problem statement identifying critical failures in the current turn-based memory system
Complete architecture design for token-based memory with provider-specific token counting implementations
Code examples demonstrating message validation, pointer-based context building, and summarization processes

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2025-11-17T11:25:59Z

+    enableSummarization: process.env.NEUROLINK_SUMMARIZATION_ENABLED === "true",
+    tokenThreshold: Number(process.env.NEUROLINK_TOKEN_THRESHOLD) || DEFAULT_TOKEN_THRESHOLD,
+    summarizationProvider: (process.env.NEUROLINK_SUMMARIZATION_PROVIDER as AIProviderName) || "vertex",
+    summarizationModel: process.env.NEUROLINK_SUMMARIZATION_MODEL || "gemini-2.5-flash",


The default model 'gemini-2.5-flash' is inconsistent with the codebase. The actual default in src/lib/config/conversationMemory.ts uses 'gemini-2.5-flash', but many documentation files reference 'gemini-2.0-flash' models, and the model 'gemini-2.0-flash' (without version suffix) doesn't appear to exist in the codebase. Consider using 'gemini-2.0-flash-001' or clarifying if 'gemini-2.5-flash' is the intended default.

Suggested change

summarizationModel: process.env.NEUROLINK_SUMMARIZATION_MODEL || "gemini-2.5-flash",

summarizationModel: process.env.NEUROLINK_SUMMARIZATION_MODEL || "gemini-2.0-flash-001",

yes, 2.5 flash will be used

Copilot · 2025-11-17T11:26:00Z

+NEUROLINK_SUMMARIZATION_PROVIDER=vertex
+NEUROLINK_SUMMARIZATION_MODEL=gemini-2.5-flash
+```
+


This environment variable example uses 'gemini-2.5-flash' as the default, which matches the code implementation. However, ensure this is the intended model since much of the documentation uses 'gemini-2.0-flash' variants. If 'gemini-2.5-flash' is correct, consider adding a note explaining why this newer model is preferred over the 2.0 series.

Suggested change

> **Note:** The default summarization model is set to `gemini-2.5-flash` (not `gemini-2.0-flash` as referenced in some older documentation). This newer model is preferred due to improved performance, accuracy, and support for larger context windows. Ensure your environment supports `gemini-2.5-flash` before deploying.

yes, 2.5 flash will be used

Copilot · 2025-11-17T11:26:00Z

@@ -0,0 +1,828 @@
+# Token-Based Conversation Memory Implementation Plan
+
+**Date:** 2025-11-17


The date '2025-11-17' appears to be in the future (the current date is November 2025, so this would be November 17, 2025). If this document was created earlier, this date might be incorrect. Verify the intended date format and value.

Suggested change

**Date:** 2025-11-17

**Date:** 2025-11-01

date is correct

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (5)

memory-bank/TOKEN-BASED-MEMORY-IMPLEMENTATION-PLAN.md (5)

398-422: Token counting and summarization trigger logic is sound, but helper methods are not shown.

The flow correctly counts tokens from the pointer onwards and triggers summarization when threshold is exceeded. However, three critical methods are referenced but not defined: _findMessageIndex(), _findSplitIndexByTokens(), and _generateSummary().

Include method signatures and pseudocode/logic for these in the implementation plan, or defer their definition to follow-up issues.

Would you like me to draft method signatures and pseudocode for these three helpers?

544-584: Configuration schema is comprehensive, but environment variable naming lacks consistency.

The config schema mixes two naming conventions:

NEUROLINK_TOKEN_THRESHOLD (new style: explicit noun)

NEUROLINK_MEMORY_MAX_TURNS_PER_SESSION (old style: prefixed with subsystem)

For consistency, recommend:

New vars: NEUROLINK_MEMORY_TOKEN_THRESHOLD and NEUROLINK_MEMORY_SESSION_INACTIVITY_DAYS

Or document the naming rationale for reviewers.

Also, line 575 reads DEFAULT_TOKEN_THRESHOLD from env but the constant is defined on line 564. Clarify: is the env var meant to override the constant, or does the constant serve as a fallback?

697-771: Performance projections are realistic but presented as estimates; memory overhead may be underestimated.

The latency breakdown (lines 702–717) provides useful context, but the 1–200ms validation range and 2–400ms total blocking time are highly dependent on token counter provider choice and network conditions. Clearly label these as best/average/worst-case scenarios rather than guarantees.

The 80% LLM payload reduction (lines 765–768) is a helpful hypothetical, but note that this assumes a specific message distribution (100 messages → 20 recent). Real-world effectiveness depends on conversation patterns.

Memory overhead underestimated: Line 770 claims "~100 bytes per session," but the new fields include:

summarizedUpToMessageId: 21 bytes (nanoid)

tokenThreshold: 4–8 bytes

lastTokenCount: 4–8 bytes

lastCountedAt: 8 bytes

Per message: id (21 bytes) + metadata object (40–100 bytes minimum)

A session with 100 messages could see 4–6 KB additional overhead, not 100 bytes. Revise the estimate.

774-802: Environment variables are well-defined, but naming convention needs consistency check.

New variables are clearly documented (lines 778–782), and deprecated vars are marked for backward compatibility (lines 786–791). However, review the naming convention consistency across all config variables:

New: NEUROLINK_TOKEN_THRESHOLD (no subsystem prefix)

Old: NEUROLINK_MEMORY_MAX_TURNS_PER_SESSION (subsystem-prefixed)

Recommend either (a) prefixing all new vars with _MEMORY_, or (b) documenting why the new vars omit the subsystem prefix. This prevents future confusion when developers add more environment variables.

806-827: Success criteria are well-defined and testable, but one criterion may be unachievable with certain token counters.

Criteria 1–3, 5–6 are clearly measurable and map to the original problem statements. However:

Criterion 4 (Line 811): "<200ms overhead for token counting (with caching)" — This assumes cache hits or client-side token counters. API-based providers (Google, Anthropic, Bedrock) without cache hits can take 100–200ms per call. On first interaction or after cache expiration, this criterion may not be met.

Recommend revising to: "<200ms overhead for token counting on cache hits; ≤2s for cache misses, with automatic fallback to estimation to prevent exceeding context window."

Also, the related documentation section (lines 817–823) references external docs (/docs/CONVERSATION-MEMORY.md, etc.) that may need updates or deprecation notices when this implementation is merged. Document this dependency for the reviewer/implementer.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 60c35e0 and 92b28ee.

📒 Files selected for processing (1)

memory-bank/TOKEN-BASED-MEMORY-IMPLEMENTATION-PLAN.md (1 hunks)

🔇 Additional comments (8)

memory-bank/TOKEN-BASED-MEMORY-IMPLEMENTATION-PLAN.md (8)

8-67: Problem statement is clear and well-motivated; code references need verification.

The four issues (single long messages, turn-based ignoring tokens, destructive deletion, inflexible per-session config) are well-articulated with concrete scenarios. However, Line 52's reference to "line 184-187" in the existing implementation cannot be verified from the provided context. Once this document is applied, ensure that these references point to actual implementation locations.

72-101: Architecture design is sound and principles are well-aligned.

The pointer-based model with non-destructive storage elegantly addresses the problems stated. New fields on SessionMemory are justified and the principles map cleanly to the identified issues.

225-285: Type definitions are well-structured, but ID generation rationale needs documentation for future reference.

The ChatMessage and SessionMemory type updates cleanly capture the new pointer-based design. The nanoid choice over UUID is justified (57% smaller, URL-safe), which is good for storage efficiency. This rationale should be preserved in code comments when implemented.

294-342: Message validation truncation logic is pragmatic, but doesn't guarantee total context fit.

Truncating messages at 80% of threshold prevents individual message overflow, but doesn't guarantee that after adding the system prompt, tools, and other messages, the total stays within the window. The validation only checks the individual message; later checks (lines 403–414) catch overall overflow.

Document this as an intentional two-stage check: (1) per-message safety, (2) total-context overflow detection. Ensure downstream context-building also includes system prompts in token counts.

430-465: Pointer-based context building is clear and includes good fallback logic.

The function correctly handles missing pointers (returns all messages) and logs warnings for corrupted pointers. The slice-based extraction is efficient.

One note: Ensure system prompts are included when counting tokens for this context, as the TokenCountInput interface (line 190–194) accepts an optional systemPrompt parameter.

468-537: Summarization workflow is well-designed, but lacks post-summarization validation.

The split-point logic (keep 30% of threshold as recent context) is reasonable. However, after inserting the summary message (line 525), there is no re-validation that the new total token count (summary + recent messages) stays within threshold. If the summary itself is large, the context could exceed the threshold again.

Recommend adding a token recount after summary insertion and a fallback to truncate the summary if needed.

601-693: Edge case mitigations are comprehensive, but several implementation details are unclear.

The six edge cases (single message, API failure, summary failure, summary too long, pointer corruption, cache inconsistency) are realistic and well-motivated. However:

Line 625: estimateTokens(messages) is called as a fallback, but it's unclear if this function exists in the codebase (mentioned as existing TokenUtils on line 155, but not shown).

Line 637: When summarization fails, you log an error but don't clarify whether session.summarizedUpToMessageId is updated. If not updated, re-attempts will retry the same work. If updated anyway, the pointer advances without a summary. Specify the recovery behavior.

Lines 651–658: Summary truncation uses 20% threshold (line 652), but the summarization split logic uses 30% (line 492). Document whether these percentages are intentionally different.

Lines 685–692: Cache invalidation by clearing lastCountedAt could race with in-flight token counting requests. If a request completes after invalidation but before the flag is read, stale data may be used. Document whether serialization/locking is needed.

1-827: Overall: Comprehensive design with strong fundamentals; implementation details need clarification before coding.

This is a well-structured plan that addresses all four critical problems with the current turn-based system. The pointer-based, non-destructive approach is sound, and the token-counting infrastructure is thoroughly thought through.

Key strengths:

Clear problem statements with real-world scenarios

Non-destructive storage design eliminates audit concerns

Per-session configurability addresses flexibility gaps

Comprehensive edge-case mitigations

Key gaps before implementation:

Undefined methods: _findMessageIndex(), _findSplitIndexByTokens(), _generateSummary() need detailed specs

Anthropic SDK version: Verify @anthropic-ai/sdk ^0.30.0 supports token counting API

Post-summarization validation: No re-count after summary insertion to ensure context still fits

Env var naming: Inconsistency between old and new variable conventions

Threshold percentage rationale: Document why 80%, 30%, and 20% splits are used for different purposes

Async summarization timeout: No max duration specified for background summarization tasks

Recommendations:

Create a follow-up issue to detail the three helper methods before implementation begins

Add a security/verification step to confirm Anthropic SDK version capability

Add a post-summarization re-validation step to the workflow

Document the threshold percentage choices with constants in code

This is solid architectural work; these gaps are normal for a design phase and should be resolved during implementation planning.

coderabbitai · 2025-11-17T11:37:25Z

+## Token Counting Infrastructure
+
+### Provider-Specific Implementations
+
+| Provider | Method | Accuracy | Latency | Status |
+|----------|--------|----------|---------|--------|
+| OpenAI | js-tiktoken (client) | 100% | ~1-5ms | ✅ Available |
+| Google Gemini | SDK countTokens() | 100% | ~100-150ms | ✅ Available |
+| Anthropic | Count Tokens API | 100% | ~200ms | ❌ Need SDK |
+| AWS Bedrock | CountTokensCommand | 100% | ~150-200ms | ✅ Available |
+| Azure OpenAI | js-tiktoken (client) | 100% | ~1-5ms | ✅ Available |
+| Mistral | tiktoken approx | ~90% | ~1-5ms | ✅ Available |
+| Others | Character estimation | ~60% | <1ms | ✅ Fallback |
+
+### Performance Characteristics
+
+**Token Counting Latency** (per conversation turn):
+
+```typescript
+// Best case: Client-side counting (OpenAI, Azure)
+await countTokens(messages) // ~1-5ms ✅ Acceptable
+
+// Average case: API call with caching (Google, Anthropic)
+await countTokens(messages) // ~100-200ms ⚠️ Noticeable
+// With 5-min cache: ~0.1ms ✅ Negligible
+
+// Worst case: Cache miss + API failure
+await countTokens(messages) // Fallback to estimation ~1ms ✅ Safe
+```
+
+**Summarization Latency**:
+```typescript
+// When threshold exceeded
+await summarize(session) // ~2-5 seconds ⚠️ User waits
+
+// Mitigation: Async summarization (don't block response)
+```
+
+### Architecture
+
+```
+src/lib/services/tokenCounting/
+├── index.ts                    # Main export
+├── types.ts                    # TokenCounter interface
+├── tokenCounterFactory.ts      # Provider selection
+├── counters/
+│   ├── openai.ts              # js-tiktoken (already available)
+│   ├── google.ts              # Google SDK (already available)
+│   ├── anthropic.ts           # Anthropic SDK (need to add)
+│   ├── bedrock.ts             # AWS SDK (already available)
+│   ├── mistral.ts             # tiktoken approximation
+│   ├── estimation.ts          # Fallback (use existing TokenUtils)
+│   └── index.ts
+└── utils/
+    ├── messageNormalizer.ts   # Convert to provider format
+    ├── cache.ts               # 5-min TTL caching
+    └── index.ts
+```
+
+### Dependencies
+
+```json
+{
+  "dependencies": {
+    "@anthropic-ai/sdk": "^0.30.0",  // NEW - for token counting
+    "nanoid": "^5.0.0"                // NEW - for message IDs
+  }
+}
+```
+
+**Already Available:**
+- ✅ `js-tiktoken@1.0.21` (via langsmith)
+- ✅ `@google/generative-ai@0.24.1`
+- ✅ `@aws-sdk/client-bedrock-runtime`
+
+### Token Counter Interface
+
+```typescript
+// src/lib/services/tokenCounting/types.ts
+export interface TokenCounter {
+  countTokens(input: TokenCountInput): Promise<TokenCountResult>;
+  supportsAsync(): boolean;
+  supportedProviders(): AIProviderName[];
+}
+
+export interface TokenCountInput {
+  messages: CoreMessage[];
+  model: string;
+  systemPrompt?: string;
+  tools?: Tool[];
+}
+
+export interface TokenCountResult {
+  inputTokens: number;
+  estimatedOutputTokens?: number;
+  method: 'api' | 'client' | 'estimation';
+  accuracy: 'high' | 'medium' | 'low';
+  cached: boolean;
+  latency: number;  // NEW: Track performance
+}
+```
+
+### Caching Strategy
+
+```typescript
+// Cache key: provider + model + messages hash
+const cacheKey = `${provider}:${model}:${hashMessages(messages)}`;
+
+// TTL: 5 minutes (messages unlikely to change)
+const cachedCount = await cache.get(cacheKey);
+if (cachedCount) return cachedCount;
+
+const startTime = Date.now();
+const count = await countTokens(messages);
+count.latency = Date.now() - startTime;
+
+await cache.set(cacheKey, count, { ttl: 300 });
+```


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Token counting infrastructure is comprehensive, but requires version and implementation verification.

The provider matrix and architecture are well-thought-out. However, several details need validation:

Line 168: @anthropic-ai/sdk ^0.30.0 — verify this version exists and supports token counting API. The current Anthropic SDK may not yet expose this capability.

Line 210: hashMessages() function is referenced but not defined. Specify the hashing algorithm (e.g., SHA-256 on message serialization) to ensure cache key collisions don't occur.

Line 159: messageNormalizer.ts is listed but lacks detail on how provider-specific message format differences are handled (e.g., Anthropic vs. OpenAI message structures).

Lines 212–214: Cache invalidation strategy doesn't address what happens when session messages are modified after a cache entry is created. Is the cache invalidated on any message change?

🌐 Web query:

Does Anthropic SDK @anthropic-ai/sdk ^0.30.0 include a token counting API?

💡 Result:

Short answer: No separate "token counting API" in @anthropic-ai/sdk ^0.30.0 — but the SDK returns token usage per request via the response. See the usage property (input_tokens/output_tokens) on message responses. [1][2]

Sources:

@anthropic-ai/sdk docs / npm package (shows usage output_tokens/input_tokens). [1]

GitHub / changelog for v0.30.0 (SDK features; no dedicated token-counting endpoint). [2]

Anthropic SDK architecture requires significant revision — token counting unavailable as pre-request API.

The web search confirms a critical architectural flaw: @anthropic-ai/sdk ^0.30.0 returns token usage via the response property (input_tokens/output_tokens), not via a standalone pre-request token counting endpoint. This fundamentally differs from your design assumption.

Key issues:

Line 112–113: "Anthropic | Count Tokens API" cannot be implemented as designed. Anthropic only reports token counts after making a request, unlike OpenAI (client-side js-tiktoken) or Google (countTokens method). The architecture must be revised to defer Anthropic token counting or use post-request counts.

Line 210: hashMessages() function is referenced but not defined. Specify the hashing algorithm (e.g., SHA-256) to ensure cache keys are collision-free.

Line 159: messageNormalizer.ts lacks implementation detail on provider-specific message format handling.

Lines 212–214: Cache invalidation strategy is unclear. Define behavior when session messages are modified after cache entry creation.

🤖 Prompt for AI Agents

memory-bank/TOKEN-BASED-MEMORY-IMPLEMENTATION-PLAN.md lines 112–113, 159, 210, 212–214: Anthropic cannot provide a pre-request token-count API so replace the "Count Tokens API" entry with "post-request token reporting" and mark the provider as supported only via response parsing or by estimating tokens pre-request; implement the anthropic counter to read input_tokens/output_tokens from responses and fall back to estimation when no recent response exists. Implement hashMessages() using a deterministic serialization (stable JSON) + SHA-256 and state that the serializer must normalize message order, IDs, and whitespace before hashing. Flesh out messageNormalizer.ts to convert CoreMessage to each provider’s required shape (field mapping, role/name normalization, trimming, and special token handling) with unit tests for OpenAI/Google/Anthropic. Clarify cache invalidation: include messages length and lastEditedTimestamp (or a monotonic session version) in the cache key and state that any edit/new message must bump the session version or invalidate the cache immediately; keep the 5-minute TTL as a safety fallback.

murdore · 2025-11-17T19:12:58Z

+
+### Provider-Specific Implementations
+
+| Provider | Method | Accuracy | Latency | Status |


@sahiltyagiii add for all provders we support. vertex litellm and others

@murdore added bhaiya

murdore · 2025-11-17T19:14:41Z

+
+```typescript
+// src/lib/services/tokenCounting/types.ts
+export interface TokenCounter {


No interface, only types and that too in the central locations where all other types are defined.

okay bhaiya, updated the doc to use types, and add them to centeralized location

murdore · 2025-11-17T19:19:30Z

@@ -0,0 +1,828 @@
+# Token-Based Conversation Memory Implementation Plan


Approach needs to mention how it will be enabled by default and what will be the configuration for default. it should be 80% context memory per model being used.

mentioned default config, also will use getProviderTokenLimit to get the model's token limit and will use a 0.8x multiplier for it
updated the doc accordingly

murdore · 2025-11-17T19:20:10Z

@@ -0,0 +1,828 @@
+# Token-Based Conversation Memory Implementation Plan
+


We need to add future plans of using multi-modality for summarization of the messages.

let's discuss this bhaiya

murdore · 2025-11-17T19:20:58Z

@@ -0,0 +1,828 @@
+# Token-Based Conversation Memory Implementation Plan
+
+**Date:** 2025-11-17


Another requirement is how will the implementation work? How do you make sure LLMs don't overflow the token coin text window?

Another requirement would be that for us to be able to identify 80 percent token window fill and keep summarizing till we are able to answer the query, it can be slow but this can be something that we will have to do in the future.

for this we will check the token count of message, and if it exceeds we can iteratively summarize it or set summarization prompt to strictly keep response in required tokens. If the message is too big for the summarizer LLM we can break it in chunks to summarize to prevent failing in worst cases.

updated the doc for it

Copilot

Pull Request Overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot AI review requested due to automatic review settings November 17, 2025 11:20

Copilot started reviewing on behalf of sahiltyagiii November 17, 2025 11:21 View session

Copilot finished reviewing on behalf of sahiltyagiii November 17, 2025 11:22

Copilot AI reviewed Nov 17, 2025

View reviewed changes

coderabbitai Bot reviewed Nov 17, 2025

View reviewed changes

murdore requested changes Nov 17, 2025

View reviewed changes

sahiltyagiii force-pushed the doc-for-token-based-memory branch from 92b28ee to 24eb611 Compare November 18, 2025 08:15

sahiltyagiii requested a review from murdore November 18, 2025 08:25

doc(memory): add documentation for token based memory and summarization

b926b79

Copilot AI review requested due to automatic review settings November 19, 2025 07:34

sahiltyagiii force-pushed the doc-for-token-based-memory branch from 24eb611 to b926b79 Compare November 19, 2025 07:34

Copilot started reviewing on behalf of sahiltyagiii November 19, 2025 07:35 View session

Copilot finished reviewing on behalf of sahiltyagiii November 19, 2025 07:37

Copilot AI reviewed Nov 19, 2025

View reviewed changes

murdore force-pushed the release branch from 09ff4a5 to 405e3e5 Compare March 12, 2026 10:48

	summarizationModel: process.env.NEUROLINK_SUMMARIZATION_MODEL \|\| "gemini-2.5-flash",
	summarizationModel: process.env.NEUROLINK_SUMMARIZATION_MODEL \|\| "gemini-2.0-flash-001",



	> Note: The default summarization model is set to `gemini-2.5-flash` (not `gemini-2.0-flash` as referenced in some older documentation). This newer model is preferred due to improved performance, accuracy, and support for larger context windows. Ensure your environment supports `gemini-2.5-flash` before deploying.

		@@ -0,0 +1,828 @@
		# Token-Based Conversation Memory Implementation Plan

		Date: 2025-11-17


		### Provider-Specific Implementations

		\| Provider \| Method \| Accuracy \| Latency \| Status \|

Conversation

sahiltyagiii commented Nov 17, 2025 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Pull Request

Description

Type of Change

Changes Made

AI Provider Impact

Component Impact

Performance Impact

Checklist

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Nov 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Other AI code review bot(s) detected

Walkthrough

Changes

Sequence Diagrams

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

Pre-merge checks and finishing touches

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Key Changes

Uh oh!

Copilot AI Nov 17, 2025

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 17, 2025

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 17, 2025

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Nov 17, 2025

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Uh oh!

Reviewers

Assignees

Labels

Projects

sahiltyagiii commented Nov 17, 2025 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Nov 17, 2025 •

edited

Loading