fix: InMemoryDocumentStore.load_from_disk corrupts documents with blob or sparse_embedding (loaded as raw dicts)

**Describe the bug**
`InMemoryDocumentStore.load_from_disk` reconstructs documents with the plain `Document` constructor instead of `Document.from_dict`:

```python
cls_object.write_documents(
    documents=[Document(**doc) for doc in documents], policy=DuplicatePolicy.OVERWRITE
)
```

`save_to_disk` serializes documents with `Document.to_dict(flatten=False)`, which converts nested objects to plain dicts (`blob` → `ByteStream.to_dict()`, `sparse_embedding` → `SparseEmbedding.to_dict()`). `Document.from_dict` is the inverse and restores the proper types, whereas the plain constructor performs no such conversion. As a result, any document saved with a `blob` or a `sparse_embedding` is loaded back with those fields as **raw dicts**.
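To make the asymmetry concrete without depending on Haystack internals, here is a minimal sketch using stand-in dataclasses. `FakeBlob` and `FakeDocument` are hypothetical names invented for this illustration; they are not Haystack classes and only mimic the `to_dict` / `from_dict` pattern described above.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FakeBlob:
    """Hypothetical stand-in for a nested object such as ByteStream."""

    data: bytes
    mime_type: Optional[str] = None

    def to_dict(self) -> dict[str, Any]:
        # Bytes are not JSON-serializable, so store them as a list of ints.
        return {"data": list(self.data), "mime_type": self.mime_type}

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "FakeBlob":
        return cls(data=bytes(data["data"]), mime_type=data["mime_type"])


@dataclass
class FakeDocument:
    """Hypothetical stand-in for Document; not the Haystack class."""

    content: str
    blob: Optional[FakeBlob] = None

    def to_dict(self) -> dict[str, Any]:
        # Nested objects are converted to plain dicts on the way out.
        return {"content": self.content, "blob": self.blob.to_dict() if self.blob else None}

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "FakeDocument":
        # The inverse step: rebuild nested objects from their dict form.
        blob = data.get("blob")
        return cls(content=data["content"], blob=FakeBlob.from_dict(blob) if blob else None)


original = FakeDocument(content="image doc", blob=FakeBlob(b"\x89PNG", "image/png"))
payload = original.to_dict()

# A dataclass constructor does no type checking, so the nested dict is stored as-is.
via_constructor = FakeDocument(**payload)
# from_dict restores the nested type.
via_from_dict = FakeDocument.from_dict(payload)

print(type(via_constructor.blob).__name__)  # dict
print(type(via_from_dict.blob).__name__)    # FakeBlob
print(via_from_dict == original)            # True
```

The constructor path accepts the payload silently, which is why the problem only surfaces later, when something accesses an attribute of the nested field.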

The corrupted documents then trigger a cascade of failures:

```text
blob type after load: dict (expected ByteStream)
sparse type after load: dict (expected SparseEmbedding)
repr(doc)     -> AttributeError: 'dict' object has no attribute 'data'
doc.to_dict() -> AttributeError: 'dict' object has no attribute 'to_dict'
doc == other  -> AttributeError: 'dict' object has no attribute 'to_dict'
store.save_to_disk(...) on the reloaded store -> AttributeError: 'dict' object has no attribute 'to_dict'
```

Anything that touches `document.blob.data` downstream (e.g. image pipelines via `DocumentToImageContent`) crashes too, and a save → load → save cycle is impossible.

**Error message**
`AttributeError: 'dict' object has no attribute 'data'` / `AttributeError: 'dict' object has no attribute 'to_dict'`

**Expected behavior**
`load_from_disk` restores documents exactly as they were saved: `blob` as `ByteStream`, `sparse_embedding` as `SparseEmbedding`; the loaded documents compare equal to the originals and the store can be saved again.

**To Reproduce**
```python
import tempfile
from pathlib import Path

from haystack.dataclasses import ByteStream, Document, SparseEmbedding
from haystack.document_stores.in_memory import InMemoryDocumentStore

doc = Document(
    content="image doc",
    blob=ByteStream(data=b"\x89PNG fake image bytes", mime_type="image/png"),
    sparse_embedding=SparseEmbedding(indices=[0, 5], values=[0.1, 0.9]),
)

store = InMemoryDocumentStore()
store.write_documents([doc])

path = str(Path(tempfile.gettempdir()) / "store.json")
store.save_to_disk(path)
loaded = InMemoryDocumentStore.load_from_disk(path)
ldoc = loaded.filter_documents()[0]

print(type(ldoc.blob))              # <class 'dict'>
repr(ldoc)                          # AttributeError
```

**FAQ Check**
- [x] Have you had a look at [our new FAQ page](https://docs.haystack.deepset.ai/docs/faq)?

**System:**
- OS: Windows 11
- Haystack version: main (2.x)


fix: InMemoryDocumentStore.load_from_disk corrupts documents with blob or sparse_embedding (loaded as raw dicts) #11593
