
Twitter/X: Crawler extracts unrelated reply content into reader view #2651

@NachtgeistW


Describe the Bug

When bookmarking a single tweet from X (Twitter) with cookies configured, the crawler's content extraction picks up reply/comment content from below the original tweet, mixing unrelated text into the reader view and the indexed content.
For example, when bookmarking a tweet by @rusino777 about chord progression, the reader view also includes replies from another user containing a meme image and a large number of emoji images.
These are other users' replies, not the original tweet content. This pollutes:
  • Reader view: shows irrelevant text
  • Full-text search index: replies become searchable as if they were the bookmarked content
  • AI tagging / summarization: tags and summaries are generated from reply content rather than the original tweet

Steps to Reproduce

  1. Configure browser cookies for x.com / twitter.com in Karakeep settings
  2. Bookmark any popular tweet URL (one with visible replies), e.g. https://x.com/rusino777/status/2015243512728707399
  3. Wait for the crawling job to complete
  4. Open the bookmark and switch to Reader View
  5. Observe that reply content from other users appears below the original tweet text

Expected Behaviour

The extracted content (reader view / indexed text) should contain only the original tweet's text and media, not replies from other users (unless the user explicitly asks for them, as in #1468).
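
One way the expected filtering could work, as a rough sketch only: match the status ID in the bookmarked URL against the permalink of each tweet block found on the page, and keep only the matching block. This is not Karakeep's actual crawler code. The `TweetBlock` shape and the function names `statusIdFromUrl` and `selectBookmarkedTweet` are hypothetical, and the sketch assumes the crawler can already split the page into per-tweet blocks that each carry a permalink.

```typescript
// Hypothetical shape of one tweet block found on the crawled page.
interface TweetBlock {
  permalink: string; // e.g. "/rusino777/status/2015243512728707399"
  text: string;
}

// Pull the numeric status ID out of a tweet URL or path; null if there is none.
// IDs are kept as strings because they exceed Number.MAX_SAFE_INTEGER.
function statusIdFromUrl(url: string): string | null {
  const match = /\/status\/(\d+)/.exec(url);
  return match?.[1] ?? null;
}

// Keep only the blocks whose permalink has the same status ID as the bookmark.
// If the bookmark is not a tweet URL, the blocks are returned unchanged.
function selectBookmarkedTweet(
  bookmarkUrl: string,
  blocks: TweetBlock[],
): TweetBlock[] {
  const targetId = statusIdFromUrl(bookmarkUrl);
  if (targetId === null) {
    return blocks;
  }
  return blocks.filter((block) => statusIdFromUrl(block.permalink) === targetId);
}
```

Matching on the status ID rather than the author handle also drops the author's own follow-up replies, which seems consistent with bookmarking a single tweet; including them would be a separate opt-in behaviour.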

Screenshots or Additional Context

[Image: A tweet by ルシノ, talking about chord progression]
[Image: Reader view, showing the original tweet text followed by unrelated reply content and emoji images]

Device Details

No response

Exact Karakeep Version

v0.31.0

Environment Details

Docker on Mac Mini 2, 26.0.1

Debug Logs

No response

Have you checked the troubleshooting guide?

  • I have checked the troubleshooting guide and I haven't found a solution to my problem

Metadata

Assignees

No one assigned

    Labels

    bug (Something isn't working), status/untriaged (This issue needs triaging to confirm it)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
