With Naive Bayes there is nothing stopping you from adding extra features that are not based on the bag-of-words representation. Say you have a class-conditional probability p(document | class_1) = l_1 based on your word features. You have reason to believe that some binary features b_1 and b_2 will also help the classification (to make the example concrete, these could indicate whether the document contains a date and a time, respectively).
You estimate the probability p(b_1 = 1 | class_1) = (# documents in class 1 with b_1 = 1) / (# documents in class 1), and p(b_1 = 0 | class_1) = 1 - p(b_1 = 1 | class_1). You do the same for class 2, and for feature b_2 for both classes; the counting sketch below illustrates the idea.
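A minimal sketch of that estimation step, assuming a toy list of (class label, b_1 value) pairs; the data and variable names are purely illustrative, not from the question:

```python
# Toy training data: (class label, value of binary feature b_1) per document.
train = [
    ("class_1", 1), ("class_1", 1), ("class_1", 0),
    ("class_2", 0), ("class_2", 1), ("class_2", 0),
]

def p_b1_given(cls, data):
    """p(b_1 = 1 | cls) as a relative frequency over documents of that class."""
    values = [b for c, b in data if c == cls]
    # A Laplace-style correction (+1 / +2) would avoid zero probabilities,
    # but is left out to match the formula in the text.
    return sum(values) / len(values)

p_b1_1_c1 = p_b1_given("class_1", train)   # p(b_1 = 1 | class_1) = 2/3
p_b1_0_c1 = 1.0 - p_b1_1_c1                # p(b_1 = 0 | class_1)
p_b1_1_c2 = p_b1_given("class_2", train)   # p(b_1 = 1 | class_2) = 1/3
```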
Adding these features to the classification rule is then especially simple, since Naive Bayes assumes the features are independent. So:

p(class_1 | document) ∝ p(class_1) × l_1 × p(b_1 | class_1) × p(b_2 | class_1)
where l_1 means the same as before (the probability based on the bag-of-words features), and for each term p(b_i | class_1) you plug in either p(b_i = 1 | class_1) or p(b_i = 0 | class_1), depending on the value b_i actually takes. This extends to non-binary features in the same way, and you can keep adding features to your heart's content (although you should be aware that you are assuming independence between the features; at some point you may want to switch to a classifier that does not make this assumption).
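To make the rule concrete, here is a small log-space sketch of the scoring; the function name, priors, and likelihood numbers are assumptions made up for illustration, not values from the question:

```python
import math

def class_score(prior, bow_log_likelihood, p_features_given_class, feature_values):
    """Log-space version of the rule above:
    log p(class | doc) = const + log p(class) + log l + sum_i log p(b_i = v_i | class).
    """
    score = math.log(prior) + bow_log_likelihood
    for p_one, v in zip(p_features_given_class, feature_values):
        # Use p(b_i = 1 | class) if the feature fired, otherwise its complement.
        score += math.log(p_one if v == 1 else 1.0 - p_one)
    return score

# Illustrative numbers only: the bow likelihoods l_1, l_2 are tiny, and the
# document was observed with b_1 = 1 and b_2 = 0.
score_1 = class_score(0.5, math.log(1e-6), [0.7, 0.2], [1, 0])
score_2 = class_score(0.5, math.log(3e-6), [0.3, 0.5], [1, 0])
prediction = "class_1" if score_1 > score_2 else "class_2"
```

Working in log space is just a numerical convenience: multiplying many small probabilities underflows quickly, while adding their logarithms does not change which class wins.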