How does the Gmail spam filter work?

I am always amazed at the quality of the Gmail spam filter. Over the past year, it has filtered out 99.95% of spam and mistakenly blocked only one legitimate email. By comparison, every other email service I have used makes at least one mistake for every 50 emails.

How does Gmail actually achieve this level of quality? Is it based on user feedback (i.e. if N users mark a message as spam, it is classified as spam for everyone else)? Or is there some kind of trick? Maybe the main filter catches the most obvious spam, and the harder cases are reviewed by real people?

+4
5 answers

In short, it is based on community feedback. Here is a quote from the official explanation:

Gmail users play an important role in keeping spam messages out of millions of inboxes. When the Gmail community votes with its clicks to report an email as spam, our system quickly learns to block similar messages. The more spam the community marks, the smarter our system becomes.

You can read a little about it on the Spam Explained page.

+8

This is the million-dollar question, and if it could be answered on Stack Overflow, every spam filter would be just as effective.

+7

I really don't know exactly how Google does its spam filtering (I assume it is a trade secret). If you are interested in how spam filtering works in general, I would recommend looking at Bayesian spam filtering ( http://en.wikipedia.org/wiki/Bayesian_spam_filtering ). It is a fairly easy approach to understand.
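To make the idea concrete, here is a minimal sketch of a naive Bayes spam filter in Python. It is not how Gmail works; the class name, the whitespace tokenizer, and the tiny training set are all made up for illustration, and it assumes at least one message of each class has been seen before scoring.

```python
import math
from collections import Counter


def tokenize(text):
    # Simplistic assumption: lowercase and split on whitespace.
    return text.lower().split()


class NaiveBayesSpamFilter:
    """Hypothetical toy filter, for illustration only."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # label must be "spam" or "ham"
        self.message_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def spam_probability(self, text):
        # Requires at least one trained message per class (log of zero otherwise).
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            # Log prior: how common this class is overall.
            score = math.log(self.message_counts[label] / total_messages)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                # Add-one smoothing so unseen words do not zero out the score.
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(vocabulary)))
            log_scores[label] = score
        # Turn the two log scores into a probability without underflow.
        highest = max(log_scores.values())
        spam = math.exp(log_scores["spam"] - highest)
        ham = math.exp(log_scores["ham"] - highest)
        return spam / (spam + ham)


spam_filter = NaiveBayesSpamFilter()
spam_filter.train("buy cheap pills now", "spam")
spam_filter.train("limited offer buy now", "spam")
spam_filter.train("meeting agenda for monday", "ham")
spam_filter.train("lunch on monday", "ham")

print(spam_filter.spam_probability("buy pills now"))
print(spam_filter.spam_probability("lunch on monday"))
```

With this toy data the first message scores well above 0.5 and the second well below it. A real filter would need far more training data and a better tokenizer.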

+2

Google most likely uses a classifier system, such as logistic regression or neural networks. Modern spam detection often uses machine learning algorithms such as these.

The classification output is "Spam" or "Not Spam". The inputs, I'm sure, are top secret at Google, but I would bet that certain phrases in the email text, such as "Buy Now", "On Sale", "Viagra", or "Male Enhancement", are all factors in their model.
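As a rough illustration of that idea, here is a small logistic regression classifier written in plain Python. The four phrases are the ones mentioned above; everything else (the feature encoding, the training examples, the learning rate and epoch count) is an assumption made up for this sketch, and it says nothing about what Google's real model looks like.

```python
import math

# Phrases taken from the answer above; the real model inputs are not public.
PHRASES = ["buy now", "on sale", "viagra", "male enhancement"]


def featurize(text):
    # One binary feature per phrase: 1.0 if the phrase occurs in the text.
    lowered = text.lower()
    return [1.0 if phrase in lowered else 0.0 for phrase in PHRASES]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, features):
    # Probability that the message is spam.
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def train(examples, epochs=500, learning_rate=0.5):
    # Plain stochastic gradient descent on the log loss.
    weights = [0.0] * len(PHRASES)
    bias = 0.0
    for _ in range(epochs):
        for text, label in examples:
            features = featurize(text)
            error = predict(weights, bias, features) - label
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


# Hypothetical training data: label 1 is spam, label 0 is not spam.
EXAMPLES = [
    ("Buy now, everything is on sale", 1),
    ("Cheap viagra, buy now", 1),
    ("Male enhancement pills on sale", 1),
    ("Can we move the meeting to Tuesday?", 0),
    ("Here are the notes from the call", 0),
    ("Your invoice is attached", 0),
]

weights, bias = train(EXAMPLES)
print(predict(weights, bias, featurize("Viagra on sale")))
print(predict(weights, bias, featurize("See you at lunch")))
```

The learned weights end up positive for every phrase and the bias negative, so a message containing these phrases is pushed toward "Spam" and a message with none of them toward "Not Spam".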

0

There is no official word on this, and most of the suggestions are just observations or expert opinions.

Based on my observations of the emails we deliver, here are my findings:

1. User engagement is key: If users do not engage with your emails, your emails will be marked as spam. Here are some indicators:
- Who you email and how often you email them
- Which emails you open
- Which emails you reply to
- The keywords you usually read in emails
- Which emails you star, archive or delete

2. Sender domain reputation: What is the sending domain's track record? If user engagement with its emails was high in the past, a new email from the same domain is likely to land in the Inbox.
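The two points above could be combined into a per-domain engagement score along these lines. This is purely a sketch of my own reading: the action names, the split into positive and negative signals, and the smoothing priors are hypothetical and not anything Google has documented.

```python
from collections import defaultdict

# Hypothetical engagement signals, chosen only for illustration.
POSITIVE_ACTIONS = {"open", "reply", "star"}
NEGATIVE_ACTIONS = {"delete_unread", "report_spam"}


class DomainReputation:
    """Toy tracker of how recipients react to mail from each domain."""

    def __init__(self, prior_positive=1, prior_negative=1):
        # The priors give an unknown domain a neutral starting score.
        self.prior_positive = prior_positive
        self.prior_negative = prior_negative
        self.positive = defaultdict(int)
        self.negative = defaultdict(int)

    def record(self, domain, action):
        if action in POSITIVE_ACTIONS:
            self.positive[domain] += 1
        elif action in NEGATIVE_ACTIONS:
            self.negative[domain] += 1
        # Any other action is ignored in this sketch.

    def score(self, domain):
        # Smoothed share of positive reactions, between 0 and 1.
        positive = self.positive.get(domain, 0) + self.prior_positive
        negative = self.negative.get(domain, 0) + self.prior_negative
        return positive / (positive + negative)


reputation = DomainReputation()
for action in ("open", "open", "open", "reply"):
    reputation.record("newsletter.example", action)
for action in ("report_spam", "report_spam", "report_spam"):
    reputation.record("bulk.example", action)

print(reputation.score("newsletter.example"))
print(reputation.score("bulk.example"))
print(reputation.score("unknown.example"))
```

A domain with a history of opens and replies scores high, one with mostly spam reports scores low, and a domain with no history sits in the middle until recipients start reacting to it.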

Google uses sophisticated AI and machine learning algorithms to make this happen. You may get through for a while by changing the IP address, domain, or return path, but that is only a very short-term hack.

0

Source: https://habr.com/ru/post/1316585/

