Questions tagged [topic-modeling]

Topic models describe the frequency of topics in documents and text. A "topic" is a group of words which tend to occur together.

44 votes
6 answers
33k views

Remove empty documents from DocumentTermMatrix in R topicmodels?

I am doing topic modelling using the topicmodels package in R. I am creating a Corpus object, doing some basic preprocessing, and then creating a DocumentTermMatrix: corpus <- Corpus(VectorSource(...
Bill M's user avatar
  • 711
44 votes
2 answers
27k views

LDA topic modeling - Training and testing

I have read about LDA and I understand the mathematics of how the topics are generated when one inputs a collection of documents. References say that LDA is an algorithm which, given a collection of ...
tan's user avatar
  • 1,579
32 votes
2 answers
4k views

Simple Python implementation of collaborative topic modeling?

I came across these 2 papers which combined collaborative filtering (Matrix factorization) and Topic modelling (LDA) to recommend users similar articles/posts based on topic terms of post/articles ...
jxn's user avatar
  • 7,883
29 votes
5 answers
31k views

Understanding LDA implementation using gensim

I am trying to understand how the gensim package in Python implements Latent Dirichlet Allocation. I am doing the following: Define the dataset documents = ["Apple is releasing a new product", ...
visakh's user avatar
  • 2,513
29 votes
2 answers
34k views

Topic models: cross validation with loglikelihood or perplexity

I'm clustering documents using topic modeling. I need to come up with the optimal topic numbers. So, I decided to do ten fold cross validation with topics 10, 20, ...60. I have divided my corpus into ...
user37874's user avatar
  • 415
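For readers comparing candidate topic counts with held-out data, here is a minimal sketch of how perplexity relates to log-likelihood. It assumes the model reports a total natural-log likelihood for the held-out fold; the function name `perplexity` and all numbers are hypothetical, not taken from the question.

```python
import math


def perplexity(log_likelihood: float, n_tokens: int) -> float:
    """Per-token perplexity from a total natural-log likelihood."""
    if n_tokens <= 0:
        raise ValueError("n_tokens must be positive")
    return math.exp(-log_likelihood / n_tokens)


# Hypothetical held-out log-likelihoods, keyed by number of topics.
held_out = {10: -7200.0, 20: -6900.0, 30: -7050.0}
n_tokens = 1000  # hypothetical token count of the held-out fold

scores = {k: perplexity(ll, n_tokens) for k, ll in held_out.items()}

# Lower perplexity means the model predicts the held-out text better.
best_k = min(scores, key=scores.get)
print(best_k, scores[best_k])
```

With cross validation, one would typically average the per-fold perplexity for each topic count before picking the minimum.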
26 votes
10 answers
48k views

How to print the LDA topics models from gensim? Python

Using gensim I was able to extract topics from a set of documents in LSA but how do I access the topics generated from the LDA models? When printing the lda.print_topics(10) the code gave the ...
alvas's user avatar
  • 119k
26 votes
2 answers
46k views

Gensim: KeyError: "word not in vocabulary"

I have a trained Word2vec model using Python's Gensim library. I have a tokenized list as below. The vocab size is 34 but I am just giving a few out of 34: b = ['let', 'know', 'buy', 'someth', '...
Krishnang K Dalal's user avatar
26 votes
2 answers
15k views

What's the disadvantage of LDA for short texts?

I am trying to understand why Latent Dirichlet Allocation (LDA) performs poorly in short text environments like Twitter. I've read the paper 'A biterm topic model for short text', however, I still do ...
Shuguang Zhu's user avatar
24 votes
1 answer
21k views

Export pyLDAvis graphs as standalone webpage

I am analysing text with topic modelling, using Gensim and pyLDAvis. I would like to share the results with distant colleagues, without a need for them to install Python and all required ...
Darius's user avatar
  • 596
21 votes
1 answer
18k views

Predicting LDA topics for new data

It looks like this question may have been asked a few times before (here and here), but it has yet to be answered. I'm hoping this is due to the previous ambiguity of the question(s) asked, as ...
David's user avatar
  • 9,335
21 votes
6 answers
10k views

Using scikit-learn vectorizers and vocabularies with gensim

I am trying to recycle scikit-learn vectorizer objects with gensim topic models. The reasons are simple: first of all, I already have a great deal of vectorized data; second, I prefer the interface ...
emiguevara's user avatar
  • 1,369
21 votes
3 answers
23k views

Using Word2Vec for topic modeling

I have read that the most common technique for topic modeling (extracting possible topics from text) is Latent Dirichlet allocation (LDA). However, I am interested whether it is a good idea to try ...
user1814735's user avatar
19 votes
4 answers
19k views

LDA model generates different topics every time I train on the same corpus

I am using Python gensim to train a Latent Dirichlet Allocation (LDA) model from a small corpus of 231 sentences. However, each time I repeat the process, it generates different topics. Why does ...
alvas's user avatar
  • 119k
19 votes
3 answers
27k views

LDA with topicmodels, how can I see which topics different documents belong to?

I am using LDA from the topicmodels package, and I have run it on about 30.000 documents, acquired 30 topics, and got the top 10 words for the topics, they look very good. But I would like to see ...
d12n's user avatar
  • 841
17 votes
2 answers
26k views

get_document_topics and get_term_topics in gensim

The ldamodel in gensim has the two methods: get_document_topics and get_term_topics. Despite their use in this gensim tutorial notebook, I do not fully understand how to interpret the output of ...
tkja's user avatar
  • 1,980
15 votes
1 answer
7k views

How to interpret LDA components (using sklearn)?

I used Latent Dirichlet Allocation (sklearn implementation) to analyse about 500 scientific article abstracts and I got topics containing the most important words (in German). My problem is to ...
LSz's user avatar
  • 161
14 votes
3 answers
34k views

Evaluation of topic modeling: How to understand a coherence value / c_v of 0.4, is it good or bad? [closed]

I need to know whether a coherence score of 0.4 is good or bad. I use LDA as the topic modelling algorithm. What is the average coherence score in this context?
User Mohamed's user avatar
14 votes
1 answer
5k views

Spark MLlib LDA, how to infer the topics distribution of a new unseen document?

I am interested in applying LDA topic modelling using Spark MLlib. I have checked the code and the explanations here, but I couldn't find how to then use the model to find the topic distribution in ...
Rami's user avatar
  • 8,204
14 votes
1 answer
4k views

R Supervised Latent Dirichlet Allocation Package

I'm using this LDA package for R. Specifically I am trying to do supervised latent dirichlet allocation (slda). In the linked package, there's an slda.em function. However what confuses me is that it ...
Alex R.'s user avatar
  • 1,427
12 votes
2 answers
13k views

__init__() got an unexpected keyword argument 'cachedir' when importing top2vec

I keep getting this error when importing top2vec. TypeError Traceback (most recent call last) Cell In [1], line 1 ----> 1 from top2vec import Top2Vec File ~\AppData\...
Redwan Hossain Arnob's user avatar
12 votes
2 answers
17k views

What is the best way to obtain the optimal number of topics for a LDA-Model using Gensim?

I am trying to obtain the optimal number of topics for an LDA-model within Gensim. One method I found is to calculate the log likelihood for each model and compare each against each other, e.g. at The ...
Akantor's user avatar
  • 151
12 votes
2 answers
3k views

Gensim LDA topic assignment

I am hoping to assign each document to one topic using LDA. Now I realise that what you get is a distribution over topics from LDA. However as you see from the last line below I assign it to the most ...
sachinruk's user avatar
  • 9,903
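As a minimal sketch of the assignment step described in this question, the snippet below picks the most probable topic from one document's topic distribution. It assumes the distribution is a list of `(topic_id, probability)` pairs, which is the shape gensim's `get_document_topics` returns; the helper name `dominant_topic` and the numbers are hypothetical.

```python
def dominant_topic(doc_topics):
    """Return the topic id with the highest probability."""
    # Each entry is a (topic_id, probability) pair; compare on probability.
    return max(doc_topics, key=lambda pair: pair[1])[0]


# Hypothetical distribution for a single document.
doc_topics = [(0, 0.12), (3, 0.71), (7, 0.17)]
print(dominant_topic(doc_topics))
```

Hard assignment discards the rest of the distribution, so it may be worth checking how close the runner-up probability is before relying on a single label.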
11 votes
1 answer
9k views

Understanding LDA / topic modelling -- too much topic overlap

I'm new to topic modelling / Latent Dirichlet Allocation and have trouble understanding how I can apply the concept to my dataset (or whether it's the correct approach). I have a small number of ...
zinfandel's user avatar
  • 428
11 votes
2 answers
12k views

Making gsub only replace entire words?

(I'm using R.) For a list of words that's called "goodwords.corpus", I am looping through the documents in a corpus, and replacing each of the words on the list "goodwords.corpus" with the word + a ...
user2303557's user avatar
11 votes
5 answers
15k views

Visualizing an LDA model, using Python

I have an LDA model with the 10 most common topics in 10K documents. Now it's just an overview of the words with corresponding probability distribution for each topic. I was wondering if there is ...
mvh's user avatar
  • 189
11 votes
3 answers
17k views

How to predict the topic of a new query using a trained LDA model using gensim?

I have trained a corpus for LDA topic modelling using gensim. Going through the tutorial on the gensim website (this is not the whole code): question = 'Changelog generation from Github issues?'; ...
Animesh Pandey's user avatar
10 votes
6 answers
9k views

How to access topic words only in gensim

I built an LDA model using Gensim and I want to get the topic words only. How can I get just the words of the topics, with no probabilities and no IDs? I tried print_topics() and show_topics() ...
Muhammed Eltabakh's user avatar
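One way to strip probabilities is sketched below. It assumes the topics are already available as `(topic_id, [(word, probability), ...])` pairs, the shape gensim's `show_topics(formatted=False)` produces; the helper `topic_words` and the sample data are hypothetical.

```python
def topic_words(topics):
    """Map each topic id to its list of words, dropping probabilities."""
    return {
        topic_id: [word for word, _prob in terms]
        for topic_id, terms in topics
    }


# Hypothetical topics in (topic_id, [(word, probability), ...]) form.
topics = [
    (0, [("apple", 0.08), ("product", 0.05), ("release", 0.04)]),
    (1, [("market", 0.07), ("price", 0.06)]),
]

words_only = topic_words(topics)
print(words_only[0])
```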
10 votes
2 answers
4k views

What is the relation between topic modeling and document clustering?

Topic modeling identifies distribution of topics in a document collection, which effectively identifies the clusters in the collection. So is it right to say that topic modeling is a technique to do ...
afs's user avatar
  • 167
10 votes
1 answer
7k views

How to get document_topics distribution of all of the document in gensim LDA?

I'm new to Python and I need to construct an LDA project. After doing some preprocessing steps, here is my code: dictionary = Dictionary(docs) corpus = [dictionary.doc2bow(doc) for doc in docs] from ...
wayne64001's user avatar
10 votes
3 answers
7k views

How to understand the output of Topic Model class in Mallet?

As I'm trying out the example code in the topic modeling developer's guide, I really want to understand the meaning of the output of that code. First, while running, it prints: Coded LDA:...
Matt's user avatar
  • 741
10 votes
1 answer
10k views

LDA Topic Model Performance - Topic Coherence Implementation for scikit-learn

I have a question around measuring/calculating topic coherence for LDA models built in scikit-learn. Topic Coherence is a useful metric for measuring the human interpretability of a given LDA topic ...
learning-new-things-guy's user avatar
9 votes
2 answers
7k views

How to get all documents per topic in bertopic modeling

I have a dataset and am trying to convert it to topics using BERTopic, but the problem is that I can't get all the documents of a topic. BERTopic only returns 3 documents per topic. topic_model = ...
Kaleem's user avatar
  • 91
9 votes
1 answer
6k views

Topic modelling - Assign a document with top 2 topics as category label - sklearn Latent Dirichlet Allocation

I am now going through the LDA (Latent Dirichlet Allocation) topic modelling method to help in extraction of topics from a set of documents. As from what I have understood from the link below, this is an ...
Bala's user avatar
  • 193
9 votes
4 answers
8k views

pyLDAvis: Validation error on trying to visualize topics

I tried generating topics using gensim for 300000 records. On trying to visualize the topics, I get a validation error. I can print the topics after model training, but it fails on using pyLDAvis # ...
Hackerds's user avatar
  • 1,195
9 votes
2 answers
12k views

How do I print lda topic model and the word cloud of each of the topics

from nltk.tokenize import RegexpTokenizer from stop_words import get_stop_words from gensim import corpora, models import gensim import os from os import path from time import sleep import matplotlib....
Raj's user avatar
  • 181
8 votes
2 answers
7k views

python scikit learn, get documents per topic in LDA

I am doing an LDA on text data, using the example here: My question is: How can I know which documents correspond to which topic? In other words, what are the documents talking about topic 1 for ...
passion's user avatar
  • 1,010
8 votes
2 answers
6k views

Gensim LDA Coherence Score Nan

I created a Gensim LDA Model as shown in this tutorial: https://www.machinelearningplus.com/nlp/topic-modeling-gensim-python/ lda_model = gensim.models.LdaMulticore(data_df['bow_corpus'], num_topics=...
Ramsha Siddiqui's user avatar
8 votes
3 answers
8k views

How to print out the full distribution of words in an LDA topic in gensim?

The lda.show_topics module from the following code only prints the distribution of the top 10 words for each topic. How do I print out the full distribution of all the words in the corpus? from ...
alvas's user avatar
  • 119k
8 votes
2 answers
48k views

How to avoid decoding to str: need a bytes-like object error in pandas?

Here is my code : data = pd.read_csv('asscsv2.csv', encoding = "ISO-8859-1", error_bad_lines=False); data_text = data[['content']] data_text['index'] = data_text.index documents = data_text It looks ...
wayne64001's user avatar
8 votes
1 answer
5k views

Why getting different results with MALLET topic inference for single and batch of documents?

I'm trying to perform LDA topic modeling with Mallet 2.0.7. I can train an LDA model and get good results, judging by the output from the training session. Also, I can use the inferencer built in ...
John Lehmann's user avatar
  • 8,095
8 votes
1 answer
2k views

Is there any way to match Gensim LDA output with topics in pyLDAvis graph?

I need to process the topics in the LDA output (lda.show_topics(num_topics=-1, num_words=100...) and then compare what I do with the pyLDAvis graph but the topic numbers are differently numbered. Is ...
m.khalil's user avatar
8 votes
2 answers
4k views

Topic modelling, but with known topics?

Okay, so usually topic models (such as LDA, pLSI, etc.) are used to infer topics that may be present in a set of documents, in an unsupervised fashion. I would like to know if anyone has any ideas as ...
user1871183's user avatar
8 votes
1 answer
11k views

Pickle AttributeError: Can't get attribute 'Wishart' on <module '__main__' from 'app.py'>

I ran my code to load a variable saved with pickle. This is my code: import pickle last_priors_file = open('simpanan/priors', 'rb') priors = pickle.load(last_priors_file) and I get an error like ...
Anugrah Dwiatmaja Putra's user avatar
7 votes
1 answer
8k views

ValueError: Stop argument for islice() must be None or an integer: 0 <= x <= sys.maxsize on topic coherence

I'm following this tutorial, https://towardsdatascience.com/evaluate-topic-model-in-python-latent-dirichlet-allocation-lda-7d57484bb5d0, and ran into a problem. My purpose with this code is to iterate it ...
adityabrillian's user avatar
7 votes
3 answers
7k views

pyLDAvis with Mallet LDA implementation : LdaMallet object has no attribute 'inference'

Is it possible to plot a pyLDAvis visualization with a Mallet implementation of LDA? I have no trouble with LDA_Model, but when I use Mallet I get: 'LdaMallet' object has no attribute 'inference' My code : ...
Saguaro's user avatar
  • 233
7 votes
1 answer
3k views

error Installing topicmodels in R Ubuntu

I am getting an error while installing the topicmodels package in R. On running install.packages("topicmodels",dependencies=TRUE), the following are the last few lines I get. Please help. My R version is ...
Mohit Mangal's user avatar
7 votes
3 answers
4k views

Meaning of bar width for pyLDAvis for lambda = 0

Not sure if this is the right forum but I was wondering if anyone understands how to interpret the width of the red vs. blue bars on the right-hand side of pyLDAvis plots when lambda = 0 (see http://...
user3490622's user avatar
7 votes
1 answer
3k views

What is the difference between LDA and NTM in Amazon Sagemaker for Topic Modeling?

I am looking for the difference between LDA and NTM. What are some use cases where you would use LDA over NTM? As per the AWS docs: LDA : The Amazon SageMaker Latent Dirichlet Allocation (LDA) algorithm is an ...
Saurabh's user avatar
  • 609
7 votes
5 answers
2k views

Mallet topic model example can not compile

I want to compile Mallet into my Java project (instead of using the command line), so I included the jar in my project and copied the code of the example from: http://mallet.cs.umass.edu/topics-devel.php, however, ...
flyingmouse's user avatar
  • 1,034
7 votes
3 answers
9k views

Text Clustering and topic extraction

I'm doing some text mining using the excellent scikit-learn module. I'm trying to cluster and classify scientific abstracts. I'm looking for a way to cluster my set of tf-idf representations, without ...
Misconstruction's user avatar
