- [ ] allow passing a list of integers instead of tokens to the word2vec function
- [ ] see how to remove the embedding of `</s>`
- [ ] abandon file-based approach
- [ ] speed up for Xptr-based inputs such as quanteda objects, to avoid copying data?
- [ ] other speed improvements
- [ ] progress bar
- [ ] functionalities for downstream processing
    - plotting, or functionalities in https://github.com/bnosac/textplot
    - downstream topic modelling like https://github.com/bnosac/ETM or as a replacement of SVDs for semi-supervised stuff
    - embeddings on sentencepiece/tokenisers.bpe tokenised data
    - pretrained models
    - further input to torch models
    - deeper integration of the similarities like https://github.com/bnosac/doc2vec or https://koheiw.github.io/LSX