
Make the posterior() stats available  #79

Description

@aourednik

The initially wrapped package topicmodels made it possible to explore the topic mix of every document in more detail with topicmodels::posterior(my_lda)$topics. Could this be made available for the result of seededlda::textmodel_lda()?

Given the probabilistic nature of topic-document associations, it would be nice to make students and the public aware that a given topic is only the most prominent one in a given text, not the only one.
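To make the point concrete, here is a minimal base R sketch. The matrix values and the document and topic names are made up for illustration; they do not come from any fitted model.

```r
# Hypothetical document-topic proportions (each row sums to 1); values are made up
doc_topics <- matrix(
  c(0.45, 0.30, 0.25,
    0.10, 0.15, 0.75),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("doc_a", "doc_b"), c("topic1", "topic2", "topic3"))
)

# Most probable topic per document, and the share of the document it holds
top_topic <- colnames(doc_topics)[max.col(doc_topics, ties.method = "first")]
top_share <- apply(doc_topics, 1, max)

data.frame(doc_id = rownames(doc_topics), top_topic, top_share)
```

In this toy case the top topic of doc_a accounts for less than half of the document, which is exactly the information a single topic label hides.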

Example:

library(quanteda)   # convert()
library(stringr)    # str_replace(), fixed(), %>%
library(ggplot2)

# Fit a 6-topic LDA with topicmodels and extract the document-topic proportions
lda_model2 <- topicmodels::LDA(convert(my_dfm, to = "topicmodels"), k = 6)
doc_topics <- topicmodels::posterior(lda_model2)$topics
# data.frame() renames the topic columns "1".."6" to "X1".."X6"
df <- data.frame(doc_id = row.names(doc_topics) %>% str_replace(fixed(".txt"), ""), doc_topics)
df_long <- tidyr::pivot_longer(df, cols = tidyr::starts_with("X"), names_to = "topic", values_to = "importance")
# One stacked bar per document, one segment per topic
ggplot(df_long, aes(x = importance, y = doc_id, fill = factor(topic))) +
	geom_col() +
	labs(x = "Topic Importance", y = "Document ID", fill = "Topic") +
	theme_minimal() +
	theme(axis.text.y = element_text(angle = 0, hjust = 1))

[Image: mytextsplot2, the plot produced by the code above]
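For comparison, a rough sketch of what I would expect the seededlda side to look like. It assumes that the fitted object stores the document-topic proportions in an element named theta. The built-in data_corpus_inaugural corpus, the max_iter value and the object names are placeholders of mine, chosen only so the snippet runs on its own.

```r
library(quanteda)
library(seededlda)

# Small built-in corpus, used here only to have something to fit
toks <- tokens(tail(data_corpus_inaugural, 10), remove_punct = TRUE)
toks <- tokens_remove(toks, stopwords("en"))
my_dfm <- dfm(toks)

# max_iter is lowered only to keep the example quick
lda_model <- textmodel_lda(my_dfm, k = 6, max_iter = 200)

# Assumption: theta is the documents x topics matrix of topic proportions,
# i.e. the counterpart of topicmodels::posterior(my_lda)$topics
doc_topics <- lda_model$theta
round(head(doc_topics, 3), 3)
```

If that assumption holds, doc_topics could be passed to the same pivot_longer() and ggplot() steps as above, with the column selection adapted to the topic names the model assigns.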
