I created a tf-idf matrix, and now I want to get the top 2 words for each document. I want to pass a document id and get back the 2 highest-scoring words for that document.
Here is my sample data:
from sklearn.feature_extraction.text import TfidfVectorizer
d = {'doc1': "this is the first document", 'doc2': "it is a sunny day"}
test_v = TfidfVectorizer(min_df=1)
t = test_v.fit_transform(d.values())
feature_names = test_v.get_feature_names()
>>> feature_names
['day', 'document', 'first', 'is', 'it', 'sunny', 'the', 'this']
>>> t.toarray()
array([[ 0. , 0.47107781, 0.47107781, 0.33517574, 0. ,
0. , 0.47107781, 0.47107781],
[ 0.53404633, 0. , 0. , 0.37997836, 0.53404633,
0.53404633, 0. , 0. ]])
I can access the matrix by row and column index, for example:
>>> t[0,1]
0.47107781233161794
Is there a way to access this matrix by document id? In my case, "doc1" and "doc2".
Thanks.
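To make the question concrete, here is a minimal sketch of the kind of lookup being asked for. It uses plain Python and the values printed above rather than the sparse matrix itself. The names `doc_ids`, `tfidf_rows`, `row_of`, and `top_words` are hypothetical and introduced only for illustration. It assumes the rows of the matrix follow the order of the dictionary keys, which only holds when the dict preserves insertion order (Python 3.7+).

```python
# Sample data from the question.
d = {'doc1': "this is the first document", 'doc2': "it is a sunny day"}

# Assumption: the matrix rows are in the same order as the dict keys.
doc_ids = list(d)

# Feature names and dense tf-idf values copied from the output above.
feature_names = ['day', 'document', 'first', 'is', 'it', 'sunny', 'the', 'this']
tfidf_rows = [
    [0.0, 0.47107781, 0.47107781, 0.33517574, 0.0, 0.0, 0.47107781, 0.47107781],
    [0.53404633, 0.0, 0.0, 0.37997836, 0.53404633, 0.53404633, 0.0, 0.0],
]

# Map each document id to its row index in the matrix.
row_of = {doc_id: i for i, doc_id in enumerate(doc_ids)}


def top_words(doc_id, n=2):
    """Return the n (word, score) pairs with the highest tf-idf for doc_id."""
    row = tfidf_rows[row_of[doc_id]]
    # Sort by score, highest first; tied scores keep their column order.
    ranked = sorted(zip(feature_names, row), key=lambda pair: pair[1], reverse=True)
    return ranked[:n]


print(top_words('doc1'))
print(top_words('doc2'))
```

With the real sparse matrix, the dense row for a document could presumably be obtained with `t[row_of[doc_id]].toarray().ravel()` in place of the hard-coded `tfidf_rows`. Note that several words in each sample document share the same score, so which 2 come out on top depends on how ties are broken.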