# llm-context-window

A Spring Boot-based GenAI utility service that estimates token usage.

This mirrors how production RAG systems check token limits before calling an LLM.


We created a Spring Boot + Spring AI utility application that helps analyze and manage LLM input in a production-ready way.

The app does two major things:

  1. Estimate tokens and manage context window
  2. Call OpenAI and return real usage metrics

Separation of concerns:

  1. TextCleanService → input normalization
  2. TokenAnalyzeService → token math + context fit logic
  3. ChunkingService → intelligent chunk splitting
  4. ChatUsageService → real LLM invocation + usage extraction
  5. ModelContextRegistry → model metadata configuration
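The context-fit logic in `TokenAnalyzeService` can be sketched in plain Java. This is a hypothetical simplification, not the project's actual code: the real service counts tokens with Spring AI's `TokenCountEstimator` (backed by JTokkit), while the sketch below uses the common ~4-characters-per-token heuristic, and the class and method names are made up for illustration.

```java
// Hypothetical sketch of the context-fit check. The real TokenAnalyzeService
// delegates token counting to Spring AI's TokenCountEstimator (JTokkit).
public class TokenFitSketch {

    // Rough heuristic: ~4 characters per token for English text.
    // The actual service uses a BPE encoder instead.
    static int estimateTokens(String text) {
        return (int) Math.ceil(text.length() / 4.0);
    }

    // A prompt "fits" if its estimated tokens plus the reserved output
    // budget stay within the model's context window.
    static boolean fitsContextWindow(String text, int contextWindow, int reserveForOutput) {
        return estimateTokens(text) + reserveForOutput <= contextWindow;
    }

    public static void main(String[] args) {
        String prompt = "x".repeat(8000);           // ~2000 estimated tokens
        System.out.println(estimateTokens(prompt)); // 2000
        System.out.println(fitsContextWindow(prompt, 128_000, 2000)); // true
        System.out.println(fitsContextWindow(prompt, 3_000, 2000));   // false
    }
}
```

Reserving output tokens up front is the key idea: a prompt that technically fits the window can still fail if the model has no room left to answer.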

This project demonstrates:

- LLM tokenization fundamentals
- Context window limitations
- Prompt vs completion token accounting
- Chunking strategy design
- Output token reservation logic
- Clean service-layer separation
- External API integration via Spring AI
- Metadata extraction from AI responses
- Environment-based secret management
- Production-ready structure

Swagger UI

After running the app, open the Swagger UI in your browser (with springdoc-openapi, the default path is http://localhost:8080/swagger-ui.html).

OpenAI Key (Required only for /usage endpoint)

Do NOT commit keys to Git. Set the key as an environment variable instead.

Windows PowerShell:

setx OPENAI_API_KEY "YOUR_KEY"

Endpoints:

- `/api/context/analyze` -> works without an OpenAI key
- `/api/context/usage` -> requires an OpenAI key

---

Small Spring Boot app to:
- clean input text
- estimate token count (Spring AI TokenCountEstimator / JTokkit)
- check against a model context window
- chunk long text to fit a token budget
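The chunk-with-overlap step can be illustrated with a hypothetical plain-Java sketch. The real `ChunkingService` splits on token boundaries; for simplicity this stand-in splits on whitespace-separated words, with one "word" standing in for one token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of budget-constrained chunking with overlap.
// Words stand in for tokens; the real service operates on token counts.
public class ChunkingSketch {

    static List<String> chunk(String text, int maxTokens, int overlapTokens) {
        String[] words = text.split("\\s+");
        List<String> chunks = new ArrayList<>();
        int step = maxTokens - overlapTokens;       // how far each chunk advances
        for (int start = 0; start < words.length; start += step) {
            int end = Math.min(start + maxTokens, words.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
            if (end == words.length) break;         // final chunk reached
        }
        return chunks;
    }

    public static void main(String[] args) {
        String text = String.join(" ", Collections.nCopies(10, "w"));
        // 10 words, budget 4, overlap 1 -> chunks start at words 0, 3, 6
        System.out.println(chunk(text, 4, 1).size()); // 3
    }
}
```

The overlap keeps a little shared context between adjacent chunks, so a sentence cut at a chunk boundary is not lost entirely.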

## Requirements
- Java 21
- Maven
- OpenAI API key in the `OPENAI_API_KEY` environment variable

## Setup

### 1) Set API key (recommended)
Windows PowerShell:
```powershell
setx OPENAI_API_KEY "YOUR_KEY"
```

### 2) Build and run
```bash
mvn clean package
java -jar target/spring-ai-context-lab-0.0.1-SNAPSHOT.jar
```

## Endpoints

- List known model limits: `GET http://localhost:8080/api/context/models`
- Analyze: `POST http://localhost:8080/api/context/analyze`

Minimal body:

```json
{
  "text": "Your input text here..."
}
```

Full body:

```json
{
  "text": "your text",
  "model": "gpt-4o",
  "reserveForOutput": 2000,
  "chunkOverlapTokens": 200
}
```
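The optional fields in the full body suggest a request DTO that falls back to defaults when fields are omitted. A hypothetical sketch, with field names taken from the JSON above but default values assumed, not taken from the project:

```java
// Hypothetical request DTO mirroring the JSON body above. The fallback
// values are illustrative assumptions, not the project's actual defaults.
public record AnalyzeRequest(String text, String model,
                             Integer reserveForOutput, Integer chunkOverlapTokens) {

    public String modelOrDefault()  { return model != null ? model : "gpt-4o"; }
    public int reserveOrDefault()   { return reserveForOutput != null ? reserveForOutput : 1000; }
    public int overlapOrDefault()   { return chunkOverlapTokens != null ? chunkOverlapTokens : 0; }

    public static void main(String[] args) {
        // Minimal body: only "text" is present, the rest fall back to defaults.
        AnalyzeRequest minimal = new AnalyzeRequest("Your input text here...", null, null, null);
        System.out.println(minimal.modelOrDefault());   // gpt-4o
        System.out.println(minimal.reserveOrDefault()); // 1000
    }
}
```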



---

## Small upgrade you can add next
If you want “real token usage” from an actual OpenAI call, we can add one endpoint that sends a prompt and returns the **usage tokens** from the API response (Spring AI exposes token usage metadata in responses). That’s useful to compare “estimated tokens” vs “actual usage”.


- `/api/context/chat` (calls OpenAI)
- returns prompt tokens, output tokens, total tokens, plus the completion text.



---

## Notes

Model context window values come from OpenAI model docs and can change over time.
Update ModelContextRegistry.java to add/remove models.
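The registry can be as simple as a model-name-to-limit map. A hypothetical sketch (the class name follows the README; the structure and the limits are assumptions, using OpenAI's published context windows at the time of writing, which may change):

```java
import java.util.Map;

// Hypothetical sketch of ModelContextRegistry: a model -> context-window map.
// Limits follow OpenAI's published values at the time of writing and may drift.
public class ModelContextRegistrySketch {

    static final Map<String, Integer> CONTEXT_WINDOWS = Map.of(
            "gpt-4o", 128_000,
            "gpt-4o-mini", 128_000,
            "gpt-3.5-turbo", 16_385
    );

    static int contextWindowFor(String model) {
        Integer limit = CONTEXT_WINDOWS.get(model);
        if (limit == null) {
            throw new IllegalArgumentException("Unknown model: " + model);
        }
        return limit;
    }

    public static void main(String[] args) {
        System.out.println(contextWindowFor("gpt-4o")); // 128000
    }
}
```

Failing fast on unknown models keeps a silently wrong limit from slipping into the token math.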
