As a school assignment, I need to implement the Naive Bayes algorithm, which I intend to do in Java.
In an attempt to understand how this is done, I've read the book "Data Mining: Practical Machine Learning Tools and Techniques", which has a section on this topic, but I'm still unsure about some key points that are blocking my progress.
Since I'm looking for guidance rather than a solution, I'll describe what I think I should be doing and ask for corrections or guidance in return. Please note that I am an absolute newbie in the Naive Bayes algorithm, data mining and programming in general, so you may see silly comments / calculations below:
The training data set I was given has 4 attributes/features that are numeric and normalized (to the range [0, 1]) using Weka (no missing values), plus one nominal class (yes/no).
1) The data coming from the CSV file is numeric. Hence:
- Given that the attributes are numeric, I use the PDF (probability density function) formula.
- To compute the PDF in Java, I first split the attribute values by whether they belong to class yes or class no, and hold them in separate arrays (one array for class yes and one for class no).
- Then I calculate the mean (sum of the values in a column / number of values in that column) and the standard deviation for each of the 4 attributes (columns) of each class.
- Now, to find the PDF of a given value (n), I do (n-mean)^2/(2*SD^2).
- Then, to find P(yes | E) and P(no | E), I multiply the PDF values of all 4 given attributes and compare which is larger, which indicates the class the instance belongs to.
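For reference, here is a minimal sketch of the steps above in Java. Note that the expression (n-mean)^2/(2*SD^2) is only the magnitude of the exponent in the normal density; the full density is exp(-(n-mean)^2/(2*SD^2)) / (SD*sqrt(2*pi)). Naive Bayes also multiplies the product of densities by the class prior P(class), which the steps above do not mention. The class name, method names and the numbers in `main` are made up for illustration, and the sample standard deviation (dividing by n-1) is an assumption.

```java
import java.util.List;

class GaussianNaiveBayesSketch {

    // Mean of one attribute column for one class.
    static double mean(List<Double> values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.size();
    }

    // Sample standard deviation (divides by n - 1) of one attribute column.
    static double standardDeviation(List<Double> values, double mean) {
        double sumSq = 0.0;
        for (double v : values) {
            sumSq += (v - mean) * (v - mean);
        }
        return Math.sqrt(sumSq / (values.size() - 1));
    }

    // Normal density: exp(-(x - mean)^2 / (2 * sd^2)) / (sd * sqrt(2 * pi)).
    static double gaussianPdf(double x, double mean, double sd) {
        double exponent = -((x - mean) * (x - mean)) / (2.0 * sd * sd);
        return Math.exp(exponent) / (sd * Math.sqrt(2.0 * Math.PI));
    }

    // Unnormalised score for one class: prior times the product of the
    // per-attribute densities. The class with the larger score wins.
    static double classScore(double[] instance, double[] means, double[] sds, double prior) {
        double score = prior;
        for (int i = 0; i < instance.length; i++) {
            score *= gaussianPdf(instance[i], means[i], sds[i]);
        }
        return score;
    }

    public static void main(String[] args) {
        // Made-up per-class statistics for 4 attributes (illustration only).
        double[] yesMeans = {0.2, 0.3, 0.25, 0.4};
        double[] yesSds = {0.1, 0.1, 0.1, 0.1};
        double[] noMeans = {0.7, 0.8, 0.75, 0.6};
        double[] noSds = {0.1, 0.1, 0.1, 0.1};
        double[] instance = {0.25, 0.35, 0.2, 0.45};

        double yes = classScore(instance, yesMeans, yesSds, 0.5);
        double no = classScore(instance, noMeans, noSds, 0.5);
        System.out.println(yes > no ? "yes" : "no");
    }
}
```

The two scores are not probabilities by themselves; dividing each by their sum would give P(yes | E) and P(no | E), but comparing the raw scores is enough to pick a class. A standard deviation of zero (all values in a column identical) would make the density undefined and is not handled in this sketch.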
In Java terms, I use an ArrayList of ArrayLists of Double to store the attribute values.
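One possible way to lay out such nested lists, as a sketch only: keep one list of columns per class label, so that the mean and standard deviation of each attribute can be computed directly from a single inner list. The class name, the `groupByClass` method and the assumption that each row has already been split into 4 attribute strings followed by the class label are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ClassColumnsSketch {

    static final int ATTRIBUTE_COUNT = 4;

    // Maps each class label to its attribute columns:
    // result.get("yes").get(i) holds every value of attribute i seen in class "yes".
    static Map<String, List<List<Double>>> groupByClass(List<String[]> rows) {
        Map<String, List<List<Double>>> byClass = new HashMap<String, List<List<Double>>>();
        for (String[] row : rows) {
            // Assumption: the class label is the last field of the row.
            String label = row[ATTRIBUTE_COUNT].trim();
            List<List<Double>> columns = byClass.get(label);
            if (columns == null) {
                // First row of this class: create one empty list per attribute.
                columns = new ArrayList<List<Double>>();
                for (int i = 0; i < ATTRIBUTE_COUNT; i++) {
                    columns.add(new ArrayList<Double>());
                }
                byClass.put(label, columns);
            }
            for (int i = 0; i < ATTRIBUTE_COUNT; i++) {
                columns.get(i).add(Double.parseDouble(row[i].trim()));
            }
        }
        return byClass;
    }
}
```

Storing values by column rather than by row means each per-class, per-attribute statistic needs only one pass over one list.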
Finally, I'm not sure how to take in new data. Should I ask for an input file (e.g. a CSV) or prompt on the command line for the 4 values?
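Both options can share the same parsing code. The following is a rough sketch, assuming new instances arrive either as 4 command-line arguments or as lines of 4 comma-separated values in a file whose path is the single argument. The class name, `parseInstance` and this argument convention are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

class NewInstanceSketch {

    // Parses one comma-separated line such as "0.1,0.5,0.3,0.9" into attribute values.
    static double[] parseInstance(String line, int attributeCount) {
        String[] parts = line.trim().split(",");
        if (parts.length != attributeCount) {
            throw new IllegalArgumentException(
                "Expected " + attributeCount + " values but got " + parts.length);
        }
        double[] values = new double[attributeCount];
        for (int i = 0; i < attributeCount; i++) {
            values[i] = Double.parseDouble(parts[i].trim());
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 4) {
            // Four values given directly on the command line.
            double[] instance = parseInstance(String.join(",", args), 4);
            System.out.println(Arrays.toString(instance));
        } else if (args.length == 1) {
            // One argument: treat it as the path of a CSV file with one instance per line.
            for (String line : Files.readAllLines(Paths.get(args[0]))) {
                if (!line.trim().isEmpty()) {
                    System.out.println(Arrays.toString(parseInstance(line, 4)));
                }
            }
        } else {
            System.out.println("Usage: pass 4 values, or the path of a CSV file");
        }
    }
}
```

Each parsed instance would then be handed to the classifier. Since the training attributes were normalized to [0, 1], new values would presumably need the same scaling before classification.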
I'll stop here for now (I do have more questions), but I'm worried this won't get any answers given how long it has become. I will be very grateful to anyone who takes the time to read through my problems and comments.
java algorithm data-mining
ke3pup