ChatGPT Wrapped: DS grade

Analyze your ChatGPT history locally with industrial-grade LLM metadata extraction and generate a feature-rich, interactive dashboard. The pipeline leaves you with a rich metadata layer for every conversation, ready for your own experiments or search engines.

And more!

🚀 Quick Start

Note: This implementation currently supports OpenRouter only for metadata extraction.

Note: The pipeline handles all your history since launch (May 2023), automatically generating multi-year heatmaps and timelines.

Clone the Repository:

git clone https://github.com/otonashi-labs/chatgpt-wrapped.git
cd chatgpt-wrapped

Export Data: Go to ChatGPT Settings → Data Controls → Export Data. You'll receive an email with a zip file. Locate conversations.json inside and place it into data/conversations/.
Configure AI: Copy env.example to .env and add your OpenRouter API Key.

Install Dependencies:

# Install Python tools
pip install -r unroller/requirements.txt -r metadater/requirements.txt

# Install Dashboard generator (requires Bun)
cd wrapped && bun install && cd ..

Run Pipeline:
```
python run.py --concurrency 10
```
(With a concurrency of 10 parallel LLM calls, the pipeline processes roughly 100 chats in 1-2 minutes. Feel free to increase it if your rate limits allow.)
View Dashboard:
- Open wrapped/wrapped.html directly in your browser.
- Or run a local dev server for live viewing:
```
cd wrapped && bun run dev
```
  Then open http://localhost:9876.

An obfuscated example dashboard is included in the repository. Note that GitHub does not render HTML files directly; for the full interactive experience, it is recommended to view it locally.

🤖 AI Coding Agent? Check out AI_README.md for a technical guide on how to navigate and customize this repository.

🫦 Motivation (hooman written)

So it's always been a struggle to find something in ChatGPT chats.

Imagine you need a formula from research you have done months ago. Or banger GTM idea you have written to chat at 2 am random Thursday. You know that it is there, but oh man it takes time and grind to find it. Especially if you have thousands of chats. That is why an idea of building a good search over the chats has been around with me; you know - proper SOTA agentic search.

For a good search you need to build the metadata layer over chats. I've decided to do it two fold:

deterministic - unroller/ module
LLM infused - metadater/prompt.md & Gemini 3 Flash

Once the metadata has been obtained - I've realized that it's a "Wrapped season" going right now. So here it goes - nice side quest.

Maybe in some near future - full agentic search thingy will be released here as well. I am currently tinkering on it. In the direction of a proper "Second Brain".

If you’re into personal knowledge tooling / retrieval / evaluation / agentic search: I’d love issues, PRs, and wild ideas.

ALSO: I will be very grateful for the feedback on metadata and indexing. How to make it better? How to make the important conversations to "surface" even more?

🏗️ What's under the hood

The pipeline is designed to handle thousands of conversations with high precision.

1. Unroll (`unroller/`)

Splits your monolithic conversations.json (often hundreds of MBs) into manageable, monthly-organized files. It also performs initial enrichment:

Command: python unroller/unroll.py data/conversations/conversations.json
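The monthly split described above can be pictured with a minimal sketch. It assumes each conversation in the export carries a Unix-timestamp `create_time` field; the function names, the `unknown` bucket, and the output layout are hypothetical and not taken from `unroller/unroll.py`.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone
from pathlib import Path


def month_key(create_time: float) -> str:
    """Map a Unix timestamp to a 'YYYY-MM' bucket (UTC)."""
    return datetime.fromtimestamp(create_time, tz=timezone.utc).strftime("%Y-%m")


def group_by_month(conversations: list[dict]) -> dict[str, list[dict]]:
    """Bucket conversations by creation month."""
    buckets: dict[str, list[dict]] = defaultdict(list)
    for conv in conversations:
        # Assumption: 'create_time' is a Unix timestamp in seconds.
        created = conv.get("create_time")
        key = month_key(created) if created is not None else "unknown"
        buckets[key].append(conv)
    return dict(buckets)


def write_monthly_files(buckets: dict[str, list[dict]], out_dir: Path) -> None:
    """Write one JSON file per conversation under out_dir/<YYYY-MM>/ (hypothetical layout)."""
    for month, convs in sorted(buckets.items()):
        month_dir = out_dir / month
        month_dir.mkdir(parents=True, exist_ok=True)
        for index, conv in enumerate(convs):
            path = month_dir / f"{index:04d}.json"
            path.write_text(json.dumps(conv, ensure_ascii=False, indent=2), encoding="utf-8")
```

Bucketing in UTC keeps the split reproducible across machines; a real implementation might prefer the user's local timezone so that late-night chats land in the expected month.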

Deterministic Metadata:

{
  "total_messages": 12,
  "messages_by_role": {"user": 5, "assistant": 5, "system": 2},
  "total_tokens": 2500, // Estimated via char count
  "user_tokens": 800,
  "assistant_tokens": 1700,
  "models_used": ["gpt-4o"],
  "primary_model": "gpt-4o",
  "duration_seconds": 120.5,
  "duration_human": "2m 0s",
  "word_count": 450,
  "image_count": 0,
  "audio_count": 0,
  "is_voice_conversation": false
}

2. Infuse Metadata (`metadater/`)

The "brain" of the project. It uses Gemini 3 Flash to analyze every conversation against a custom 10-domain taxonomy defined in metadater/config.py. Each conversation is enriched with metadata according to the instructions in metadater/prompt.md:

Classification: Domain, sub-domain, conversation type, and request types.
Context: User intent, specific keywords, and entity extraction.
Quality Metrics: 8+ numerical scores measuring engagement and response quality.
Dynamics: Tone, mood, and flow patterns.

For a full explanation of the extraction logic and available fields, see metadater/prompt.md and the taxonomy in metadater/config.py.

Note: Improving this metadata layer is a hot area for future work. I am actively looking for ways to make indexing better and to help important conversations surface more effectively. Feedback and wild ideas are very welcome!

Example LLM Metadata (llm_meta):

{
  "domain": "problem_solving",
  "sub_domain": "debugging",
  "conversation_type": "troubleshooting",
  "user_intent": "Fixing a race condition in a Python script using asyncio and threading locks",
  "request_types": ["task", "explanation"],
  "keywords": ["race condition", "threading", "lock", "asyncio", "deadlock"],
  "entities_people": [],
  "entities_companies": ["OpenAI", "GitHub"],
  "entities_products": ["Visual Studio Code"],
  "entities_places": [],
  "technologies": ["Python", "httpx", "asyncio"],
  "concepts": ["Concurrency Control", "Mutual Exclusion"],
  "inferred_future_relevance_score": 85,
  "urgency_score": 40,
  "complexity_score": 70,
  "information_density": 90,
  "depth_of_engagement": 75,
  "user_satisfaction_inferred": 95,
  "user_request_quality_inferred": 80,
  "ai_response_quality_score": 90,
  "serendipity_vs_general_public": 75,
  "serendipity_vs_power_users": 65,
  "conversation_flow": "iterative",
  "user_mood": "focused",
  "conversation_tone": "technical",
  "one_line_summary": "Debugging Python asyncio race condition with threading locks",
  "outcome_type": "task_completed",
  "information_direction": "collaborative",
  "topic_tags": ["python_concurrency", "debugging_session"]
}
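As one possible use of these scores, here is a small sketch that ranks conversations so the important ones surface first. The field names come from the example above; the weights and function names are purely illustrative assumptions, not something the project ships.

```python
# Hypothetical weights: tune them to taste, they are not part of the repository.
SCORE_WEIGHTS = {
    "inferred_future_relevance_score": 0.5,
    "information_density": 0.3,
    "depth_of_engagement": 0.2,
}


def surface_score(llm_meta: dict) -> float:
    """Weighted blend of 0-100 scores; missing fields count as 0."""
    return sum(weight * llm_meta.get(field, 0) for field, weight in SCORE_WEIGHTS.items())


def top_conversations(metas: list[dict], n: int = 10) -> list[dict]:
    """Return the n records with the highest surface score."""
    return sorted(metas, key=surface_score, reverse=True)[:n]
```

A linear blend is the simplest option; a search layer could instead combine these scores with keyword or embedding relevance at query time.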

3. Generate Wrapped (`wrapped/`)

Aggregates all metadata into a unified statistics engine and produces a feature-rich dashboard. There are two ways to use it:

Static Mode: Generates a standalone, interactive wrapped.html file that you can open anywhere.
```
python wrapped/aggregate.py
cd wrapped && bun run generate
```
Live Mode: Runs a local development server for a more dynamic experience.
```
cd wrapped && bun run dev
```
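To give a feel for what the aggregation step might compute, here is a minimal sketch that reduces a list of `llm_meta` records to dashboard-style statistics. The output keys and the function name are assumptions; the actual logic lives in `wrapped/aggregate.py` and may differ.

```python
from collections import Counter
from statistics import mean


def aggregate(metas: list[dict]) -> dict:
    """Summarize llm_meta records into a few headline statistics."""
    domains = Counter(m["domain"] for m in metas if "domain" in m)
    technologies = Counter(t for m in metas for t in m.get("technologies", []))
    quality = [
        m["ai_response_quality_score"]
        for m in metas
        if "ai_response_quality_score" in m
    ]
    return {
        "total_conversations": len(metas),
        "by_domain": dict(domains),
        # most_common returns (name, count) pairs, highest count first
        "top_technologies": technologies.most_common(5),
        "avg_ai_response_quality": round(mean(quality), 1) if quality else None,
    }
```

Records with missing fields are tolerated rather than rejected, since LLM output for a few conversations may be incomplete.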

Performance & Cost

Gemini 3 Flash: Chosen for its massive 1M token context window and low cost.
Concurrency: Optimized for speed with parallel async requests. A concurrency of 10 can process approximately 100 conversations every 1-2 minutes.
Cost Estimate: Processing ~1,500 conversations typically costs between $5-7 USD via OpenRouter.
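The concurrency setting can be pictured as a cap on in-flight requests. The sketch below uses `asyncio.Semaphore` for that cap; `run_bounded` and the `worker` callable are hypothetical stand-ins for whatever `run.py` does with `--concurrency`, and no real OpenRouter call is made.

```python
import asyncio
from collections.abc import Awaitable, Callable, Iterable
from typing import Any


async def run_bounded(
    items: Iterable[Any],
    worker: Callable[[Any], Awaitable[Any]],
    concurrency: int = 10,
) -> list[Any]:
    """Run worker(item) for every item with at most `concurrency` calls in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(item: Any) -> Any:
        # Each task waits here until one of the `concurrency` slots is free.
        async with semaphore:
            return await worker(item)

    # gather preserves input order in its result list.
    return await asyncio.gather(*(guarded(item) for item in items))
```

Taking the cost figures above at face value, $5-7 for ~1,500 conversations works out to roughly a third to half a cent per conversation, so raising the concurrency changes wall-clock time, not the bill.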

🛡️ Privacy First

Local Processing: Your raw data never leaves your machine except for the metadata extraction request sent to the LLM.
No Tracking: This tool has no analytics or external reporting.
Protected: The .gitignore is pre-configured to ensure no JSON exports or .env files are ever committed.

📝 Notes & Discussion

Gemini 3 Flash seems to treat GPT-4o with slight arrogance. This is seen by weirdly lower costs for 4o. Or it's just the LLM models progress.
More to come...

📄 License

MIT
