This repository was archived by the owner on Mar 23, 2021. It is now read-only.

Handling of mbox files #47

@grenwi

How are mbox files handled by snoop?

My collection includes a number of Thunderbird email folders. Thunderbird stores an entire folder in one large mbox file. Apparently, snoop (or Tika?) detects such a file as a generic text format and extracts all of its text without processing the individual emails it contains.

This is a problem not only because individual emails are not extracted, but also because indexing these large text files (well over 1 GB) causes Elasticsearch to crash (see this issue: liquidinvestigations/hoover-search#31).

I am not sure how to handle this. For now, I am considering converting the Thunderbird mbox files to pst or eml before giving them to hoover. Or is there a way to detect and handle mbox files directly? What would you suggest?
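
For illustration, the eml conversion mentioned above could be sketched with Python's standard `mailbox` module. This is only a rough sketch and not part of snoop or hoover: the helper names `looks_like_mbox` and `split_mbox_to_eml` and the numbered output file names are hypothetical, and the detection step is a simple heuristic that assumes the file begins with an mbox `From ` separator line.

```python
import mailbox
from pathlib import Path


def looks_like_mbox(path):
    """Heuristic (assumption): an mbox file starts with a 'From ' separator line."""
    with open(path, "rb") as fh:
        return fh.read(5) == b"From "


def split_mbox_to_eml(mbox_path, out_dir):
    """Write each message of an mbox file to its own numbered .eml file.

    Returns the number of messages written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # create=False: fail instead of silently creating an empty mailbox
    box = mailbox.mbox(mbox_path, create=False)
    count = 0
    try:
        for key in box.iterkeys():
            # Raw message bytes, without the leading "From " separator line
            raw = box.get_bytes(key)
            (out / f"{count:06d}.eml").write_bytes(raw)
            count += 1
    finally:
        box.close()
    return count
```

Called as `split_mbox_to_eml("Inbox", "Inbox_eml")`, this would leave one small file per message, so that no single multi-gigabyte text document reaches the indexer. Messages are read one at a time, but the module first scans the whole file to locate message boundaries, so very large folders may still take a while.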
