Describe the Bug
When bookmarking a single tweet from X (Twitter) with cookies configured, the crawler's content extraction picks up reply/comment content from below the original tweet, mixing unrelated text into the reader view and the indexed content.
For example, when bookmarking a tweet by @rusino777 about chord progression, the reader view also includes replies from another user containing a meme image and many emoji images.
These are other users' replies, not the original tweet content. This pollutes:
- Reader view: shows irrelevant text
- Full-text search index: replies become searchable as if they were the bookmarked content
- AI tagging / summarization: tags and summaries are generated from reply content rather than the original tweet
Steps to Reproduce
- Configure browser cookies for x.com / twitter.com in Karakeep settings
- Bookmark any popular tweet URL (one with visible replies), e.g. https://x.com/rusino777/status/2015243512728707399
- Wait for the crawling job to complete
- Open the bookmark and switch to Reader View
- Observe that reply content from other users appears below the original tweet text
Expected Behaviour
The extracted content (reader view / indexed text) should contain only the original tweet's text and media, not replies from other users (unless the user explicitly asks for them, as in #1468).
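One possible way to get this behaviour, sketched below in TypeScript, is to keep only the block whose permalink matches the status ID in the bookmarked URL. This is not Karakeep's actual crawler code. The `TweetBlock` shape and the `statusIdOf` and `keepBookmarkedTweet` helpers are hypothetical names made up for illustration, and the sketch assumes the crawler can already split the page into per-tweet blocks that each carry a `/status/<id>` permalink.

```typescript
// Hypothetical shape: one entry per tweet-like block found on the crawled page.
interface TweetBlock {
  permalink: string; // e.g. "/rusino777/status/2015243512728707399"
  text: string;
}

// Extract the numeric status ID from a tweet URL or path, or null if there is none.
function statusIdOf(urlOrPath: string): string | null {
  const match = /\/status\/(\d+)/.exec(urlOrPath);
  return match?.[1] ?? null;
}

// Keep only the block(s) whose permalink points at the bookmarked status.
// If the bookmark is not a single-tweet URL, the blocks are returned unchanged.
function keepBookmarkedTweet(
  bookmarkUrl: string,
  blocks: TweetBlock[],
): TweetBlock[] {
  const wanted = statusIdOf(bookmarkUrl);
  if (wanted === null) return blocks;
  return blocks.filter((block) => statusIdOf(block.permalink) === wanted);
}
```

Matching on the status ID rather than on position in the page would also cover the case where the bookmarked tweet is itself a reply and parent tweets are rendered above it.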
Screenshots or Additional Context

A Tweet by ルシノ, talking about chord progression

Reader view, showing the original tweet text followed by unrelated reply content and emoji image
Device Details
No response
Exact Karakeep Version
v0.31.0
Environment Details
Docker on Mac Mini 2, 26.0.1
Debug Logs
No response
Have you checked the troubleshooting guide?