How does Facebook do it?

Have you ever noticed how Facebook says “3 friends and 33 others liked it”? I was wondering what the best way to do this is. I don’t think that walking through the list of friends and the list of users who “liked” it, and comparing the two, is efficient at all! Do they track this in the database? That would make the database very large. What do you guys think?

Thanks!

+4
8 answers

I would guess that they outer join the likes table to the friends table, so that a single query counts both the likes from friends and the likes from everyone else.

With appropriate indexes, this will not be a slow query at all. Huge databases are not necessarily slow, so there is no reason not to store all this information in a database. The trick is to make sure that indexes and partitions (if any) are configured well.
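As a rough illustration of the outer-join idea, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and sample data are assumptions made up for this example, not Facebook's actual schema. The composite primary keys act as the indexes that keep the join cheap.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema; the primary keys double as the lookup indexes.
    CREATE TABLE likes   (post_id INTEGER, user_id INTEGER, PRIMARY KEY (post_id, user_id));
    CREATE TABLE friends (user_id INTEGER, friend_id INTEGER, PRIMARY KEY (user_id, friend_id));
""")
# 36 users (ids 10..45) like post 1; viewer 1 has four friends, three of whom liked it.
conn.executemany("INSERT INTO likes VALUES (?, ?)", [(1, u) for u in range(10, 46)])
conn.executemany("INSERT INTO friends VALUES (?, ?)", [(1, 10), (1, 11), (1, 12), (1, 99)])

def like_counts(conn, post_id, viewer_id):
    """Return (likes from the viewer's friends, likes from everyone else)."""
    return conn.execute("""
        SELECT COUNT(f.friend_id),            -- non-NULL only when the liker is a friend
               COUNT(*) - COUNT(f.friend_id)  -- the remaining likers
        FROM likes AS l
        LEFT OUTER JOIN friends AS f
               ON f.user_id = ? AND f.friend_id = l.user_id
        WHERE l.post_id = ?
    """, (viewer_id, post_id)).fetchone()

friends_n, others_n = like_counts(conn, 1, 1)
print(f"{friends_n} friends and {others_n} others liked it")
```

With this sample data the query reproduces the "3 friends and 33 others" wording from the question in a single pass over the likes for one post.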

+8

Facebook uses Cassandra, a NoSQL database, at least for some things. Here's a more detailed discussion of what some of the larger social networking sites are doing to solve these problems:

http://www.25hoursaday.com/weblog/2009/09/10/BuildingScalableDatabasesDenormalizationTheNoSQLMovementAndDigg.aspx

There is a lot of interesting reading if you follow the links from it to the Digg blog, etc.

+5

Yes, they definitely store it in their database, because they definitely have more than one server that needs to access the data.

As for scalability, I'm sure they use a lot of caching.

Here is an example:

If you need to search through 1 million rows, an index lookup costs O(log n), which is about 20 operations in the worst case, to find what you need.

For 2 million rows, you need only 21 operations (in the worst case).

Each time you double the number of users to search through, you need just one more operation (in the worst case) with an O(log n) index.
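The arithmetic above can be checked with a few lines of Python. This is a simplified model that treats the index as a binary search over sorted keys; real database indexes are usually B-trees with a much higher fan-out, so they need even fewer page reads. The function name is made up for this example.

```python
def worst_case_steps(n_rows: int) -> int:
    """Worst-case comparisons for a binary search over n_rows sorted keys.

    This equals floor(log2(n_rows)) + 1, which is the bit length of n_rows.
    """
    return n_rows.bit_length()

for n in (1_000_000, 2_000_000, 4_000_000):
    print(f"{n:>9} rows -> {worst_case_steps(n)} steps")
```

Doubling the row count adds exactly one step, which is why a well-indexed table stays fast as it grows.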

They also have a distributed architecture, i.e. a clustered database.

+4

Facebook is probably using a trigger (which fires automatically as soon as an event occurs).

For example, suppose a trigger is created to maintain the counter and the names of the people who liked a status. It will then run every time someone likes your status, and it does so implicitly (automatically).

This makes the work much easier: Facebook does not need to update the counts manually or recompute them from a huge table on every page view. This approach is also faster.
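A minimal sketch of the trigger idea, using SQLite through Python's sqlite3 module. The schema and the trigger name are assumptions for illustration only; nothing here is confirmed to be what Facebook runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema for illustration.
    CREATE TABLE posts (id INTEGER PRIMARY KEY, like_count INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE likes (post_id INTEGER, user_id INTEGER, PRIMARY KEY (post_id, user_id));

    -- Fires automatically after every new like; application code never touches the counter.
    CREATE TRIGGER bump_like_count AFTER INSERT ON likes
    BEGIN
        UPDATE posts SET like_count = like_count + 1 WHERE id = NEW.post_id;
    END;
""")
conn.execute("INSERT INTO posts (id) VALUES (1)")
conn.executemany("INSERT INTO likes VALUES (?, ?)", [(1, 10), (1, 11), (1, 12)])

count = conn.execute("SELECT like_count FROM posts WHERE id = 1").fetchone()[0]
print(count)  # the counter is read directly instead of counting rows
```

The trade-off is a little extra work on every write in exchange for a constant-time read of the total.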

+3

When developing social media software (mothsorchid.com), I found the only way to tackle this was to pre-cache notification streams. You do not query the database at page load time to figure out how many friends and others like something. When someone likes something, that is recorded on the object, and when the object is retrieved it can be compared against the current user's friends list. When a user updates their profile, makes a comment, etc., notification objects are sent to their friends and pre-cached in the friends' feeds. This drastically reduces database work at the cost of disk space, but disk space is cheap.

As for how Facebook does this, they use the Cassandra DBMS, which is probably a bit different from what you have in mind.
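The pre-caching approach described here is often called fan-out on write. Below is a small in-memory sketch of it; the names `friends`, `feeds`, `publish`, `read_feed`, and `FEED_LIMIT` are all hypothetical, and a real system would keep the feeds in a cache or on disk rather than in a Python dict.

```python
from collections import defaultdict, deque

FEED_LIMIT = 100  # hypothetical cap on stored notifications per user

# Hypothetical friendship data.
friends = {"alice": {"bob", "carol"}, "bob": {"alice"}, "carol": {"alice"}}

# One pre-built feed per user; a stand-in for a cache or on-disk store.
feeds = defaultdict(lambda: deque(maxlen=FEED_LIMIT))

def publish(actor, action, target):
    """Write time: push a notification into every friend's pre-built feed."""
    event = {"actor": actor, "action": action, "target": target}
    for friend in friends.get(actor, ()):
        feeds[friend].appendleft(event)  # newest first

def read_feed(user):
    """Read time: no joins or counting, just return what was pre-cached."""
    return list(feeds[user])

publish("alice", "liked", "post:1")
print(read_feed("bob"))
```

Each event is stored once per friend, which is the disk-space cost mentioned above, while page loads become a simple lookup.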

+2

Keep in mind that Facebook uses memcached heavily, so they keep a lot of data in memory and only update it when absolutely necessary. See this blog post for a discussion of scalability around this:

http://www.facebook.com/note.php?note_id=39391378919

+1

Each entry that someone can like probably contains a list of everyone who likes it (all of this in the database, of course). When you look at the entry, they match that list against your friends list to find out which of the likers are your friends. Voila.
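The matching step can be sketched as a set intersection. The function name and the sample ids below are made up for this example.

```python
def summarize_likes(likers, viewer_friends):
    """Split an entry's likers into the viewer's friends and everyone else."""
    friends_who_liked = likers & viewer_friends  # set intersection
    others = len(likers) - len(friends_who_liked)
    return f"{len(friends_who_liked)} friends and {others} others liked it"

likers = set(range(10, 46))        # 36 hypothetical user ids who liked the entry
viewer_friends = {10, 11, 12, 99}  # the viewer's friends; 99 did not like it
print(summarize_likes(likers, viewer_friends))
```

With hash-based sets the intersection cost is roughly proportional to the smaller of the two sets, so it stays cheap when either the friends list or the likers list is short.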

0

Much of this is explained by Facebook's Director of Engineering in this QCon presentation:

http://www.infoq.com/presentations/Facebook-Software-Stack

It's a great presentation to watch.

0
