Obtaining TF-IDF Scores Using Gensim

I am trying to find the most important words in a corpus based on their TF-IDF scores.

I am following the example at https://radimrehurek.com/gensim/tut2.html and looking at this loop:

>>> for doc in corpus_tfidf:
...     print(doc)

The TF-IDF score of a word differs from document to document. For instance,

  • Word 0 ("computer", based on https://radimrehurek.com/gensim/tut1.html ) has a TF-IDF score of 0.5773 (doc #1) and 0.4442 (doc #2).
  • Word 10 ("graph") has a TF-IDF score of 0.7071 (doc #7), 0.5080 (doc #8), and 0.4588 (doc #9).
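
To make the per-document variation concrete, here is a minimal sketch that collects every score a word receives instead of a single one. It does not call gensim: `corpus_tfidf` is a toy stand-in containing only the five weights quoted above (real documents would hold other words too), and `id2word` is a hypothetical plain dict standing in for the gensim dictionary.

```python
from collections import defaultdict

# Toy stand-in for corpus_tfidf: one list of (word_id, weight) pairs per document.
# Only the weights quoted in the question are included; everything else is left out.
corpus_tfidf = [
    [(0, 0.5773)],   # doc 1
    [(0, 0.4442)],   # doc 2
    [], [], [], [],  # docs 3 to 6 (contents omitted in this sketch)
    [(10, 0.7071)],  # doc 7
    [(10, 0.5080)],  # doc 8
    [(10, 0.4588)],  # doc 9
]

# Hypothetical id -> word mapping standing in for the gensim dictionary.
id2word = {0: "computer", 10: "graph"}

# Keep every (document number, weight) pair seen for each word.
scores_by_word = defaultdict(list)
for doc_no, doc in enumerate(corpus_tfidf, start=1):
    for word_id, weight in doc:
        scores_by_word[id2word[word_id]].append((doc_no, weight))

for word, scores in scores_by_word.items():
    print(word, scores)
```

With all scores kept per word, it is a separate decision how to reduce them to one number (maximum, mean, sum, and so on).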

So this is how I am currently getting the final TF-IDF score for each word:

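import gensim

# corpus and dictionary are built as in the tutorial linked above
tfidf = gensim.models.tfidfmodel.TfidfModel(corpus)
corpus_tfidf = tfidf[corpus]
d = {}
for doc in corpus_tfidf:
    for word_id, value in doc:  # word_id avoids shadowing the built-in id()
        word = dictionary.get(word_id)
        d[word] = value  # a later document overwrites an earlier score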

Is there a better way?

Thanks in advance.


How about a dictionary comprehension?

d = {dictionary.get(id): value for doc in corpus_tfidf for id, value in doc}
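
One caveat: like the loop in the question, this comprehension keeps only the score from the last document in which a word appears. If the goal is to rank words by importance, one possible alternative (my assumption about what "final score" should mean, not something the tutorial prescribes) is to keep each word's maximum score. A minimal sketch follows; `max_tfidf_per_word` and `id2word` are hypothetical names, and the toy data reuses the weights quoted in the question rather than calling gensim.

```python
def max_tfidf_per_word(corpus_tfidf, id2word):
    """Return {word: highest TF-IDF weight seen in any document}."""
    best = {}
    for doc in corpus_tfidf:
        for word_id, weight in doc:
            word = id2word[word_id]
            # Keep the larger of the stored weight and the new one.
            if weight > best.get(word, 0.0):
                best[word] = weight
    return best


# Toy input: each inner list is one document of (word_id, weight) pairs.
toy_corpus_tfidf = [
    [(0, 0.5773)],
    [(0, 0.4442)],
    [(10, 0.7071)],
    [(10, 0.5080)],
    [(10, 0.4588)],
]
# Hypothetical id -> word mapping standing in for the gensim dictionary.
id2word = {0: "computer", 10: "graph"}

best = max_tfidf_per_word(toy_corpus_tfidf, id2word)

# Sort words from most to least important under this "max" definition.
ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
print(ranked)
```

Swapping the comparison for a running sum or mean would give a different ranking, so the choice of aggregate is worth making explicitly.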