I am trying to do classification in Python. I use the Naive Bayes MultinomialNB classifier on web pages (I extract the data from each web page as text and then classify that text: web page classification).
Now I am trying to apply PCA to this data, but Python raises some errors.
My classification code with Naive Bayes:
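# PCA lives in sklearn.decomposition, not in the top-level sklearn package.
# RandomizedPCA was removed in scikit-learn 0.20; newer versions use PCA(svd_solver='randomized').
from sklearn.decomposition import PCA
from sklearn.decomposition import RandomizedPCA
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

vectorizer = CountVectorizer()
classifer = MultinomialNB(alpha=.01)

# temizdata holds the text extracted from the web pages; y_train holds the labels.
x_train = vectorizer.fit_transform(temizdata)
classifer.fit(x_train, y_train)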
The Naive Bayes classification works, and x_train looks like this:
>>> x_train
<43x4429 sparse matrix of type '<class 'numpy.int64'>'
        with 6302 stored elements in Compressed Sparse Row format>
>>> print(x_train)
(0, 2966) 1
(0, 1974) 1
(0, 3296) 1
..
..
(42, 1629) 1
(42, 2833) 1
(42, 876) 1
This is how I am trying to apply PCA to my data (temizdata):
>>> v_temizdata = vectorizer.fit_transform(temizdata)
>>> pca_t = PCA.fit_transform(v_temizdata)
>>> pca_t = PCA().fit_transform(v_temizdata)
but this raises the following error:
raise TypeError('A sparse matrix was passed, but dense '
TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array.
So the sparse matrix has to be converted to a dense matrix or a NumPy array. I then tried creating a new dense matrix, but I got another error.
My main goal is to test how PCA affects the classification of the text.
Convert to a dense array:
v_temizdatatodense = v_temizdata.todense()
pca_t = PCA().fit_transform(v_temizdatatodense)
Finally, I try to classify:
classifer.fit(pca_t, y_train)
The error from the final classification step:
raise ValueError("Input X must be non-negative")
ValueError: Input X must be non-negative
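A minimal sketch of what seems to be going on here, assuming a few toy documents in place of temizdata (the names `docs`, `labels`, `projected`, `has_negative` and `fit_error` are made up for illustration). PCA subtracts the column means before projecting, so its output contains negative values even though the word counts going in are all non-negative, and MultinomialNB refuses such input:

```python
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data standing in for temizdata / y_train.
docs = [
    "buy cheap phones online",
    "cheap phones and laptops",
    "football match results",
    "latest football news",
]
labels = [0, 0, 1, 1]

counts = CountVectorizer().fit_transform(docs)  # sparse matrix of non-negative counts
dense = counts.toarray()                         # dense NumPy array, as the first error suggests
projected = PCA(n_components=2).fit_transform(dense)

# Each PCA component is centered around zero, so some projected values are negative.
has_negative = bool((projected < 0).any())

# MultinomialNB models counts and raises ValueError on negative features.
try:
    MultinomialNB(alpha=0.01).fit(projected, labels)
    fit_error = None
except ValueError as exc:
    fit_error = str(exc)

print(has_negative)
print(fit_error)
```

The exact wording of the ValueError differs between scikit-learn versions, so the sketch only records it rather than relying on a specific message.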
In one case, my data (temizdata) is fed directly into Naive Bayes; in the other case, temizdata is first passed through PCA (to reduce the number of inputs) and the result is then classified.
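The two variants of this comparison could be sketched roughly as follows. This is only an illustration under assumptions: the toy `docs` and `labels` stand in for temizdata and y_train, the `densify` helper is hypothetical, and GaussianNB is swapped in after PCA as one classifier that accepts negative features. It is not the MultinomialNB from the code above.

```python
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer

# Hypothetical toy data standing in for temizdata / y_train.
docs = [
    "buy cheap phones online",
    "cheap phones and laptops",
    "football match results",
    "latest football news",
]
labels = [0, 0, 1, 1]


def densify(X):
    """Convert the sparse count matrix into a dense NumPy array."""
    return X.toarray()


# Variant 1: word counts go straight into MultinomialNB.
nb_only = make_pipeline(CountVectorizer(), MultinomialNB(alpha=0.01))

# Variant 2: word counts -> dense array -> PCA -> GaussianNB (assumed substitute).
nb_with_pca = make_pipeline(
    CountVectorizer(),
    FunctionTransformer(densify, accept_sparse=True),
    PCA(n_components=2),
    GaussianNB(),
)

nb_only.fit(docs, labels)
nb_with_pca.fit(docs, labels)

new_docs = ["cheap laptops online", "football results"]
pred_nb_only = nb_only.predict(new_docs)
pred_nb_with_pca = nb_with_pca.predict(new_docs)

print(pred_nb_only)
print(pred_nb_with_pca)
```

Putting both variants into pipelines keeps the vectorizer and PCA fitted on the training texts only, which would make it easier to compare the two on held-out pages.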