list of improvements #20

Description

@jwijffels
  • allow passing a list of integers instead of tokens to the word2vec function
  • see how to remove the embedding of </s>
  • abandon file-based approach
  • speed up for Xptrs like quanteda objects to avoid copying data?
  • other speed improvements
  • progress bar
  • functionalities for downstream processing
    - plotting or functionalities in https://github.com/bnosac/textplot
    - downstream topic modelling like https://github.com/bnosac/ETM or as a replacement of SVDs for semi-supervised stuff
    - embeddings on sentencepiece/tokenizers.bpe tokenised data
    - pretrained models
    - further input to torch models
    - deeper integration of the similarities like https://github.com/bnosac/doc2vec or https://koheiw.github.io/LSX
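
To illustrate the first item: passing integers instead of tokens amounts to building a vocabulary once and replacing every token by its index. Below is a minimal sketch of that idea, written in Python purely for illustration. The helper name `encode_tokens` is hypothetical and is not part of the word2vec package or its API.

```python
from typing import Dict, List, Tuple


def encode_tokens(docs: List[List[str]]) -> Tuple[List[List[int]], Dict[str, int]]:
    """Hypothetical helper: map each distinct token to an integer id."""
    vocab: Dict[str, int] = {}
    encoded: List[List[int]] = []
    for doc in docs:
        ids: List[int] = []
        for token in doc:
            # Ids are assigned in order of first appearance, starting at 0.
            if token not in vocab:
                vocab[token] = len(vocab)
            ids.append(vocab[token])
        encoded.append(ids)
    return encoded, vocab


docs = [["the", "cat", "sat"], ["the", "dog", "sat"]]
encoded, vocab = encode_tokens(docs)
```

With such an encoding the training routine would only need the integer lists plus the vocabulary to label the rows of the embedding matrix afterwards. Whether ids should start at 0 or 1, and how a sentence-end marker would be represented, are open design choices not settled here.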
