How to set a prior probability for scikit-learn's Gaussian Naive Bayes

I am using the scikit-learn machine learning library (Python) for a machine learning project. One of the algorithms I use is the Gaussian Naive Bayes implementation. The GaussianNB() class has the following attribute:

class_prior_ : array, shape (n_classes,) 

I want to set the class prior manually, because the data I use is heavily skewed and the recall of one of the classes is very important. By assigning a high prior probability to that class, the recall should increase.

However, I cannot figure out how to set the attribute correctly. I have already read the questions below, but their answers do not work for me.

How can I set prior probabilities for a Naive Bayes clf in scikit-learn?

How do I set the prior in scikit-learn? (Naive Bayes classifiers)

This is my code:

 gnb = GaussianNB()
 gnb.class_prior_ = [0.1, 0.9]
 gnb.fit(data.XTrain, yTrain)
 yPredicted = gnb.predict(data.XTest)

I assumed this was the correct syntax, and I could figure out which class corresponds to which position in the array by playing with the values, but the results remained unchanged. No errors were raised, either.

What is the correct way to set attributes of the GaussianNB algorithm from the scikit-learn library?

Scikit GaussianNB documentation link

python syntax scikit-learn machine-learning
2 answers

GaussianNB() as implemented in scikit-learn does not let you set the class prior this way. If you read the online documentation, you will see that .class_prior_ is an attribute, not a parameter. Once you have fitted GaussianNB(), you can access the class_prior_ attribute. It is computed simply by counting the frequency of each label in your training set.

 from sklearn.datasets import make_classification
 from sklearn.naive_bayes import GaussianNB
 
 # simulate data with unbalanced weights
 X, y = make_classification(n_samples=1000, weights=[0.1, 0.9])
 
 # your GNB estimator
 gnb = GaussianNB()
 gnb.fit(X, y)
 
 gnb.class_prior_
 Out[168]: array([ 0.105,  0.895])
 
 gnb.get_params()
 Out[169]: {}

You can see that the estimator is smart enough to take the class imbalance into account, so you do not need to specify the priors manually.
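To confirm that class_prior_ is just the empirical label frequency, here is a quick check (a sketch, using a fixed random_state so the split is reproducible):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

# imbalanced toy data, as in the answer above
X, y = make_classification(n_samples=1000, weights=[0.1, 0.9], random_state=0)

gnb = GaussianNB().fit(X, y)

# class_prior_ should equal the fraction of samples in each class
empirical = np.bincount(y) / len(y)
assert np.allclose(gnb.class_prior_, empirical)
```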


@Jianxun Li: there is actually a way to set prior probabilities in GaussianNB. It is called priors and is available as a parameter. See the documentation: "Parameters: priors : array-like, shape (n_classes,). Prior probabilities of the classes. If specified, the priors are not adjusted according to the data." So let me give you an example:

 from sklearn.naive_bayes import GaussianNB
 
 # minimal dataset
 X = [[1, 0], [1, 0], [0, 1]]
 y = [0, 0, 1]
 
 # use the empirical prior, learned from y
 mn = GaussianNB()
 print(mn.fit(X, y).predict([[1, 1]]))
 print(mn.class_prior_)
 >>> [0]
 >>> [ 0.66666667  0.33333333]

But if you change the prior probabilities, you get a different answer, which is what you are looking for, I believe.

 # use a custom prior to make class 1 more likely
 mn = GaussianNB(priors=[0.1, 0.9])
 mn.fit(X, y).predict([[1, 1]])
 >>> array([1])
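To tie this back to the asker's goal: raising the prior of the important class shifts the decision boundary in its favor, so its recall can only stay the same or go up. A sketch of that effect on skewed data (the 0.05/0.95 weights, the 0.5/0.5 boosted prior, and the random_state are illustrative choices, not anything from the question):

```python
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# skewed data: class 0 is the rare, important class
X, y = make_classification(n_samples=2000, weights=[0.05, 0.95], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# default: empirical prior, learned from y_train
pred_default = GaussianNB().fit(X_train, y_train).predict(X_test)
r_default = recall_score(y_test, pred_default, pos_label=0)

# boosted prior for the rare class
pred_boosted = GaussianNB(priors=[0.5, 0.5]).fit(X_train, y_train).predict(X_test)
r_boosted = recall_score(y_test, pred_boosted, pos_label=0)

# the boosted prior never lowers recall on the rare class
print(r_default, r_boosted)
```

Note the trade-off: the extra recall on class 0 usually comes at the cost of precision, since more borderline points get pulled into the rare class.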
