Thanks for your interesting package.
Do you think Clustergram could work with Top2Vec?
https://github.com/ddangelov/Top2Vec
I saw that there is an option to create a clustergram from a DataFrame.
In Top2Vec, each "document" to cluster is represented as an embedding of a fixed dimension (256, for example).
So I could generate a DataFrame like this:
| x0 | x1 | ... | x255 | topic |
|-----|-----|-----|------|-------|
| 0.5 | 0.2 | ... | -0.2 | 2 |
| 0.7 | 0.2 | ... | -0.1 | 2 |
| 0.5 | 0.2 | ... | -0.2 | 3 |
Does Clustergram assume anything about the rows of this DataFrame?
I saw that the `from_data` method takes either `"mean"` or `"median"` as the method for calculating the cluster centers.
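To make the question concrete, here is a minimal sketch of what the two options presumably amount to, assuming `from_data` computes each cluster center by aggregating, dimension by dimension, the rows that share a label (I have not checked Clustergram's internals). The helper `cluster_centers` is hypothetical and not part of Clustergram's API.

```python
from collections import defaultdict
from statistics import mean, median


def cluster_centers(rows, labels, method="mean"):
    """Hypothetical helper (not Clustergram's API): one center per label.

    Each center is the per-dimension mean or median of the rows that
    share that label, mirroring the two options offered by `from_data`.
    """
    if method not in ("mean", "median"):
        raise ValueError("method must be 'mean' or 'median'")
    aggregate = mean if method == "mean" else median
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    # zip(*members) transposes a group so each tuple holds one dimension
    return {
        label: [aggregate(dim) for dim in zip(*members)]
        for label, members in groups.items()
    }


# Toy rows: the table above truncated to three dimensions (x0, x1, x255)
rows = [[0.5, 0.2, -0.2], [0.7, 0.2, -0.1], [0.5, 0.2, -0.2]]
labels = [2, 2, 3]

print(cluster_centers(rows, labels, "mean"))
print(cluster_centers(rows, labels, "median"))
```

With only two rows per topic the two options coincide; the median starts to differ once a topic has three or more documents, and it is less sensitive to single outlying documents.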
With word vectors, we typically use cosine distance to measure how far apart two vectors are. Does this have any influence?
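One general fact that may be relevant here, independent of Clustergram: if every embedding is first scaled to unit length, then for unit vectors `a` and `b` we have `||a - b||^2 = 2 - 2 * cos(a, b)`, so Euclidean and cosine distances rank pairs of documents identically. Whether Top2Vec's document vectors already come out normalized is an assumption I would need to check, so the sketch below normalizes them explicitly.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (assumes it is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_similarity(a, b):
    """Dot product divided by the product of the two vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def squared_euclidean(a, b):
    """Sum of squared per-dimension differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Two toy embeddings (truncated to three dimensions), scaled to unit length
a = l2_normalize([0.5, 0.2, -0.2])
b = l2_normalize([0.7, 0.2, -0.1])

# For unit vectors: ||a - b||^2 == 2 - 2 * cos(a, b)
lhs = squared_euclidean(a, b)
rhs = 2 - 2 * cosine_similarity(a, b)
print(math.isclose(lhs, rhs))
```

The identity follows from expanding `||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b` and setting both lengths to 1.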
I believe Top2Vec also calculates the "topic vectors" as the mean of the "document vectors".
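If that is the case, one detail matters when mixing means with cosine geometry: the mean of unit-length document vectors is usually shorter than 1, so a mean-based center should be re-normalized before it is compared using dot products or Euclidean distances (cosine similarity itself is unaffected by the rescaling). The helper `topic_vector` below is hypothetical and only assumes "topic vector = re-normalized mean of document vectors"; it is not taken from Top2Vec's code.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (assumes it is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def topic_vector(doc_vectors):
    """Hypothetical helper: raw mean and re-normalized mean of document vectors."""
    dims = len(doc_vectors[0])
    centroid = [sum(v[i] for v in doc_vectors) / len(doc_vectors) for i in range(dims)]
    return centroid, l2_normalize(centroid)


# Two toy unit-length document vectors (truncated to three dimensions)
docs = [l2_normalize([0.5, 0.2, -0.2]), l2_normalize([0.7, 0.2, -0.1])]
raw, unit = topic_vector(docs)

print(math.sqrt(sum(x * x for x in raw)))   # below 1.0 unless the documents coincide
print(math.sqrt(sum(x * x for x in unit)))  # 1.0 up to rounding
```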