# llm-context-window

A Spring Boot-based GenAI utility service that estimates token usage.

This mirrors how production RAG systems check token limits before calling an LLM.


We created a Spring Boot + Spring AI utility application that helps analyze and manage LLM input in a production-ready way.

The app does two major things:

  1. Estimate tokens and manage context window
  2. Call OpenAI and return real usage metrics

Separation of concerns:

  1. TextCleanService → input normalization
  2. TokenAnalyzeService → token math + context fit logic
  3. ChunkingService → intelligent chunk splitting
  4. ChatUsageService → real LLM invocation + usage extraction
  5. ModelContextRegistry → model metadata configuration
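The context-fit logic in `TokenAnalyzeService` can be sketched in plain Java. This is a hypothetical simplification, not the project's actual code: the real service counts tokens with Spring AI's `TokenCountEstimator` (backed by JTokkit), while the sketch below uses the common ~4-characters-per-token heuristic, and the class and method names are made up for illustration.

```java
// Hypothetical sketch of the context-fit check. The real TokenAnalyzeService
// delegates token counting to Spring AI's TokenCountEstimator (JTokkit).
public class TokenFitSketch {

    // Rough heuristic: ~4 characters per token for English text.
    // The actual service uses a BPE encoder instead.
    static int estimateTokens(String text) {
        return (int) Math.ceil(text.length() / 4.0);
    }

    // A prompt "fits" if its estimated tokens plus the reserved output
    // budget stay within the model's context window.
    static boolean fitsContextWindow(String text, int contextWindow, int reserveForOutput) {
        return estimateTokens(text) + reserveForOutput <= contextWindow;
    }

    public static void main(String[] args) {
        String prompt = "x".repeat(8000);           // ~2000 estimated tokens
        System.out.println(estimateTokens(prompt)); // 2000
        System.out.println(fitsContextWindow(prompt, 128_000, 2000)); // true
        System.out.println(fitsContextWindow(prompt, 3_000, 2000));   // false
    }
}
```

Reserving output tokens up front is the key idea: a prompt that technically fits the window can still fail if the model has no room left to answer.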

This project demonstrates:

- LLM tokenization fundamentals
- Context window limitations
- Prompt vs completion token accounting
- Chunking strategy design
- Output token reservation logic
- Clean service-layer separation
- External API integration via Spring AI
- Metadata extraction from AI responses
- Environment-based secret management
- Production-ready structure

Swagger UI

After running the app, open the Swagger UI in your browser (with springdoc-openapi, the default path is http://localhost:8080/swagger-ui.html).

OpenAI Key (Required only for /usage endpoint)

Do NOT commit keys to Git. Set the key as an environment variable instead.

Windows PowerShell:

setx OPENAI_API_KEY "YOUR_KEY"

Endpoints:

- `/api/context/analyze` -> works without an OpenAI key
- `/api/context/usage` -> requires an OpenAI key

---

Small Spring Boot app to:
- clean input text
- estimate token count (Spring AI TokenCountEstimator / JTokkit)
- check against a model context window
- chunk long text to fit a token budget
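The chunk-with-overlap step can be illustrated with a hypothetical plain-Java sketch. The real `ChunkingService` splits on token boundaries; for simplicity this stand-in splits on whitespace-separated words, with one "word" standing in for one token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of budget-constrained chunking with overlap.
// Words stand in for tokens; the real service operates on token counts.
public class ChunkingSketch {

    static List<String> chunk(String text, int maxTokens, int overlapTokens) {
        String[] words = text.split("\\s+");
        List<String> chunks = new ArrayList<>();
        int step = maxTokens - overlapTokens;       // how far each chunk advances
        for (int start = 0; start < words.length; start += step) {
            int end = Math.min(start + maxTokens, words.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
            if (end == words.length) break;         // final chunk reached
        }
        return chunks;
    }

    public static void main(String[] args) {
        String text = String.join(" ", Collections.nCopies(10, "w"));
        // 10 words, budget 4, overlap 1 -> chunks start at words 0, 3, 6
        System.out.println(chunk(text, 4, 1).size()); // 3
    }
}
```

The overlap keeps a little shared context between adjacent chunks, so a sentence cut at a chunk boundary is not lost entirely.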

## Requirements
- Java 21
- Maven
- OpenAI API key in the `OPENAI_API_KEY` environment variable

## Setup

### 1) Set API key (recommended)
Windows PowerShell:
```powershell
setx OPENAI_API_KEY "YOUR_KEY"
```

### 2) Build and run
```bash
mvn clean package
java -jar target/spring-ai-context-lab-0.0.1-SNAPSHOT.jar
```

## Endpoints

- List known model limits: `GET http://localhost:8080/api/context/models`
- Analyze: `POST http://localhost:8080/api/context/analyze`

Minimal body:

```json
{
  "text": "Your input text here..."
}
```

Full body:

```json
{
  "text": "your text",
  "model": "gpt-4o",
  "reserveForOutput": 2000,
  "chunkOverlapTokens": 200
}
```
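The optional fields in the full body suggest a request DTO that falls back to defaults when fields are omitted. A hypothetical sketch, with field names taken from the JSON above but default values assumed, not taken from the project:

```java
// Hypothetical request DTO mirroring the JSON body above. The fallback
// values are illustrative assumptions, not the project's actual defaults.
public record AnalyzeRequest(String text, String model,
                             Integer reserveForOutput, Integer chunkOverlapTokens) {

    public String modelOrDefault()  { return model != null ? model : "gpt-4o"; }
    public int reserveOrDefault()   { return reserveForOutput != null ? reserveForOutput : 1000; }
    public int overlapOrDefault()   { return chunkOverlapTokens != null ? chunkOverlapTokens : 0; }

    public static void main(String[] args) {
        // Minimal body: only "text" is present, the rest fall back to defaults.
        AnalyzeRequest minimal = new AnalyzeRequest("Your input text here...", null, null, null);
        System.out.println(minimal.modelOrDefault());   // gpt-4o
        System.out.println(minimal.reserveOrDefault()); // 1000
    }
}
```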



---

## Small upgrade you can add next
If you want “real token usage” from an actual OpenAI call, we can add one endpoint that sends a prompt and returns the **usage tokens** from the API response (Spring AI exposes token usage metadata in responses). That’s useful to compare “estimated tokens” vs “actual usage”.


- `/api/context/chat` (calls OpenAI)
- returns prompt tokens, output tokens, total tokens, plus the completion text.



---

## Notes

Model context window values come from OpenAI model docs and can change over time.
Update ModelContextRegistry.java to add/remove models.
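The registry can be as simple as a model-name-to-limit map. A hypothetical sketch (the class name follows the README; the structure and the limits are assumptions, using OpenAI's published context windows at the time of writing, which may change):

```java
import java.util.Map;

// Hypothetical sketch of ModelContextRegistry: a model -> context-window map.
// Limits follow OpenAI's published values at the time of writing and may drift.
public class ModelContextRegistrySketch {

    static final Map<String, Integer> CONTEXT_WINDOWS = Map.of(
            "gpt-4o", 128_000,
            "gpt-4o-mini", 128_000,
            "gpt-3.5-turbo", 16_385
    );

    static int contextWindowFor(String model) {
        Integer limit = CONTEXT_WINDOWS.get(model);
        if (limit == null) {
            throw new IllegalArgumentException("Unknown model: " + model);
        }
        return limit;
    }

    public static void main(String[] args) {
        System.out.println(contextWindowFor("gpt-4o")); // 128000
    }
}
```

Failing fast on unknown models keeps a silently wrong limit from slipping into the token math.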
