[QST] Categorifying nested lists in NVTabular and transformers4rec

# ❓ Questions & Help

## Details

Hello everyone! In my sequential recommendation dataset, every item is annotated with a list of categories (possibly with repeated values). Here is a representative example:
```python
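# Each session holds a list of items; each item carries its own list of categories.
data = [
    {"session_id": 1, "item_id-list": [101, 102, 103], "categories-list": [["A", "B"], ["C", "D"], ["E"]]},
    {"session_id": 2, "item_id-list": [201, 202], "categories-list": [["A"], ["F", "F"]]},
]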
```
Is it possible to categorify the categories present above in a nested way so that:
- the lists `[[A,B], [C,D], ..], ..` do not become separate tokens but remain lists of categorified elements (e.g. `[[1,2], [3,4], [6]]` and `[[1], [5,5]]`)
- we can then feed those into `EmbeddingBag` downstream?
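
To make the target layout concrete, here is a minimal plain-Python sketch (not NVTabular or transformers4rec code) of how one session's already-categorified nested list could be turned into the flat index sequence plus bag offsets that an embedding-bag lookup works on. The helper name `to_bag_inputs` is hypothetical and only used for illustration.

```python
def to_bag_inputs(item_categories):
    """Flatten per-item category ids and record where each item's bag starts."""
    flat, offsets = [], []
    for cats in item_categories:
        offsets.append(len(flat))  # start position of this item's bag
        flat.extend(cats)          # append this item's category ids
    return flat, offsets


# Session 1 from the example above, after categorification.
flat, offsets = to_bag_inputs([[1, 2], [3, 4], [6]])
print(flat)     # [1, 2, 3, 4, 6]
print(offsets)  # [0, 2, 4]
```

Converted to integer tensors, these two sequences correspond to the `input` and `offsets` arguments of `torch.nn.EmbeddingBag`, which pools one embedding per item.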

I've tried supplying the `Dataset` constructor with an appropriate schema, but unfortunately that failed. I could also try flattening the lists, categorifying the flat values, and fusing them back into the nested structure, but this looks like an inefficient and bad idea.
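
For reference, the flatten / categorify / fuse-back workaround can be sketched in plain Python as below. This is only an illustration of the idea under my own assumptions, not NVTabular's `Categorify`: the function name `categorify_nested` is hypothetical, and ids are assigned from 1 in order of first appearance (leaving 0 free for padding), so the exact ids differ from the ones in the example above.

```python
from itertools import chain


def categorify_nested(sessions):
    """Encode a list of sessions, each a list of per-item category lists."""
    # 1. Flatten: sessions -> items -> individual category values.
    flat = chain.from_iterable(chain.from_iterable(sessions))

    # 2. Categorify: assign ids starting at 1 in order of first appearance.
    vocab = {}
    for value in flat:
        vocab.setdefault(value, len(vocab) + 1)

    # 3. Fuse back: rebuild the original nesting with ids in place of values.
    encoded = [[[vocab[c] for c in item] for item in session] for session in sessions]
    return encoded, vocab


sessions = [
    [["A", "B"], ["C", "D"], ["E"]],
    [["A"], ["F", "F"]],
]
encoded, vocab = categorify_nested(sessions)
print(encoded)  # [[[1, 2], [3, 4], [5]], [[1], [6, 6]]]
```

This loops over every value in Python, which is exactly the inefficiency I would like to avoid on a large dataset.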



[QST] Categorifying nested lists in NVTabular and transformers4rec #792
