We created a Spring Boot + Spring AI utility application that helps analyze and manage LLM input in a production-ready way.
The app does two major things:
- Estimate tokens and manage context window
- Call OpenAI and return real usage metrics
Separation of concerns:
- TextCleanService → input normalization
- TokenAnalyzeService → token math + context fit logic
- ChunkingService → intelligent chunk splitting
- ChatUsageService → real LLM invocation + usage extraction
- ModelContextRegistry → model metadata configuration
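The core of the token math in TokenAnalyzeService can be sketched as follows. This is a minimal illustration, not the app's actual code: the real service uses Spring AI's TokenCountEstimator (JTokkit) for estimation, while here a rough ~4-characters-per-token heuristic stands in, and the class and method names are assumed.

```java
// Sketch of the context-fit logic (names assumed; the real app uses
// Spring AI's TokenCountEstimator / JTokkit instead of this heuristic).
public class TokenFitCheck {

    // Rough heuristic: ~4 characters per token for English text.
    static int estimateTokens(String text) {
        return Math.max(1, text.length() / 4);
    }

    // The prompt fits if its estimated tokens plus the tokens reserved
    // for the model's output stay within the context window.
    static boolean fitsContext(String text, int contextWindow, int reserveForOutput) {
        return estimateTokens(text) + reserveForOutput <= contextWindow;
    }
}
```

The key idea is that the context window must hold both the prompt and the completion, which is why an output reservation is subtracted from the budget.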
This project shows you understand:
- LLM tokenization fundamentals
- Context window limitations
- Prompt vs completion token accounting
- Chunking strategy design
- Output token reservation logic
- Clean service-layer separation
- External API integration via Spring AI
- Metadata extraction from AI responses
- Environment-based secret management
- Production-ready structure
After running the app:
- Swagger UI: http://localhost:8080/swagger
Do NOT put API keys in Git. Set the key as an environment variable instead.
Windows PowerShell:
setx OPENAI_API_KEY "YOUR_KEY"
Endpoints:
/api/context/analyze -> works without OpenAI key
/api/context/usage -> requires OpenAI key
---
Small Spring Boot app to:
- clean input text
- estimate token count (Spring AI TokenCountEstimator / JTokkit)
- check against a model context window
- chunk long text to fit a token budget
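The chunking step can be sketched like this. It is an illustrative version, not the app's ChunkingService: it splits on word boundaries under a token budget and carries a token overlap into the next chunk, using the same ~4-chars-per-token heuristic as a stand-in for JTokkit.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative budget-based chunking with overlap (the real
// ChunkingService may differ; ~4 chars/token heuristic assumed).
public class ChunkSketch {

    static int estimateTokens(String text) {
        return Math.max(1, text.length() / 4);
    }

    // Splits text into word-boundary chunks of at most maxTokens
    // (estimated); each new chunk repeats the last overlapTokens worth
    // of words so context is not lost at the boundary. A single word
    // larger than maxTokens still becomes its own chunk.
    static List<String> chunk(String text, int maxTokens, int overlapTokens) {
        String[] words = text.split("\\s+");
        List<String> chunks = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int currentTokens = 0;
        for (String word : words) {
            int wordTokens = estimateTokens(word);
            if (currentTokens + wordTokens > maxTokens && !current.isEmpty()) {
                chunks.add(String.join(" ", current));
                // carry overlap words into the next chunk
                List<String> overlap = new ArrayList<>();
                int overlapped = 0;
                for (int i = current.size() - 1; i >= 0 && overlapped < overlapTokens; i--) {
                    overlap.add(0, current.get(i));
                    overlapped += estimateTokens(current.get(i));
                }
                current = overlap;
                currentTokens = overlapped;
            }
            current.add(word);
            currentTokens += wordTokens;
        }
        if (!current.isEmpty()) chunks.add(String.join(" ", current));
        return chunks;
    }
}
```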
## Requirements
- Java 21
- Maven
- OpenAI API key in an env var (`OPENAI_API_KEY`)
## Setup
### 1) Set API key (recommended)
Windows PowerShell:
```powershell
setx OPENAI_API_KEY "YOUR_KEY"
```
Note: `setx` only takes effect in new terminal sessions.
### 2) Build and run
```bash
mvn clean package
java -jar target/spring-ai-context-lab-0.0.1-SNAPSHOT.jar
```
## Endpoints
List known model limits: GET http://localhost:8080/api/context/models
Analyze: POST http://localhost:8080/api/context/analyze
Body (minimal):
```json
{
  "text": "Your input text here..."
}
```
Body (all fields):
```json
{
  "text": "your text",
  "model": "gpt-4o",
  "reserveForOutput": 2000,
  "chunkOverlapTokens": 200
}
```
---
## Small upgrade you can add next
If you want “real token usage” from an actual OpenAI call, we can add one endpoint that sends a prompt and returns the **usage tokens** from the API response (Spring AI exposes token usage metadata in responses). That’s useful for comparing estimated tokens against actual usage.
- `/api/context/usage` (calls OpenAI)
- returns prompt tokens, output tokens, total tokens, plus the completion text.
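The shape of that extraction can be sketched with simplified stand-ins. The `Usage` interface and `UsageReport` record below are illustrative stubs mirroring the usage metadata Spring AI exposes; in the real app you would read it from the `ChatResponse` metadata rather than define these types yourself.

```java
// Simplified stand-in for the usage metadata Spring AI returns with a
// chat response (the real type lives in the Spring AI library).
interface Usage {
    long getPromptTokens();
    long getCompletionTokens();
    long getTotalTokens();
}

// Hypothetical DTO the usage endpoint could return.
record UsageReport(long promptTokens, long completionTokens,
                   long totalTokens, String completion) {}

public class UsageExtraction {
    // Map the provider's usage metadata plus the completion text
    // onto the endpoint's response body.
    static UsageReport toReport(Usage usage, String completionText) {
        return new UsageReport(
                usage.getPromptTokens(),
                usage.getCompletionTokens(),
                usage.getTotalTokens(),
                completionText);
    }
}
```

Comparing `promptTokens` here against the estimator's prediction is what makes the estimated-vs-actual comparison possible.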
---
## Notes
Model context window values come from OpenAI model docs and can change over time.
Update `ModelContextRegistry.java` to add or remove models.
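A minimal sketch of such a registry is shown below. The class and method names are assumed, and the context-window values are illustrative snapshots from OpenAI's docs that may be out of date.

```java
import java.util.Map;
import java.util.Optional;

// Minimal model-metadata registry sketch (names assumed; window
// values illustrative — verify against current OpenAI model docs).
public class ModelContextRegistrySketch {

    private static final Map<String, Integer> CONTEXT_WINDOWS = Map.of(
            "gpt-4o", 128_000,
            "gpt-4o-mini", 128_000,
            "gpt-3.5-turbo", 16_385);

    // Empty Optional for unknown models lets callers decide on a
    // fallback instead of failing with a null.
    static Optional<Integer> contextWindow(String model) {
        return Optional.ofNullable(CONTEXT_WINDOWS.get(model));
    }
}
```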