WordPress has a spam-filtering plugin called Akismet, which seems able to classify any block of text as spam or not. The only caveat is that you have to go through their interface, and their database and algorithm are not open or accessible to other users.
There are also commercial providers that offer APIs over the Internet for classifying emails, comments, or any other text submitted by users of your web application.
Is there an open or freely accessible database that can be used to classify a block of text as spam or non-spam?
Edit: Here is a clearer explanation of what I want:
Basically, I hoped there was an extensive database of the probabilities of certain phrases appearing in spam. Since (I guess) spammers send the same spam to every email address, I could pre-populate my Bayesian spam filter with this database and build an application that catches most spam from the start, without any user training.
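To make the idea concrete, here is a minimal sketch of what "pre-populating" a Bayesian filter could look like, assuming the shared database is simply a table mapping each token to P(spam | token). The table contents, the `SEED_PROBABILITIES` and `UNKNOWN_TOKEN_PROB` names, the 0.4 default for unseen tokens, and the 0.01/0.99 clamping bounds are all hypothetical values chosen for illustration. Per-token probabilities are combined with the naive Bayes formula under equal priors, computed in log-odds space so long messages don't underflow or overflow.

```python
import math
import re

# Hypothetical seed table: token -> P(spam | token).
# In the scenario above this would come from a shared, pre-built
# database; these numbers are made up for illustration only.
SEED_PROBABILITIES = {
    "viagra": 0.99,
    "free": 0.85,
    "winner": 0.92,
    "meeting": 0.10,
    "invoice": 0.30,
    "tomorrow": 0.15,
}

# Assumed probability for tokens the table has never seen.
UNKNOWN_TOKEN_PROB = 0.4


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def spam_probability(text, table=SEED_PROBABILITIES):
    """Return P(spam | tokens) using naive Bayes with equal priors.

    P = prod(p) / (prod(p) + prod(1 - p)), evaluated as a sigmoid
    of the summed per-token log-odds to stay numerically stable.
    """
    log_odds = 0.0
    for token in set(tokenize(text)):
        p = table.get(token, UNKNOWN_TOKEN_PROB)
        p = min(max(p, 0.01), 0.99)  # clamp so no single token is decisive
        log_odds += math.log(p) - math.log(1.0 - p)
    # Stable sigmoid: avoid calling exp() on a large positive number.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)
```

For example, `spam_probability("You are a winner, claim your free viagra")` comes out close to 1, while `spam_probability("Meeting tomorrow about the invoice")` comes out close to 0. With such a seed table the filter can score mail on day one; per-user training would then adjust the table's values over time rather than start from empty.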