This may be better for Server Fault, but for me it's more of a programming problem. I could be wrong.
I was thinking about how Facebook does what it does. It has more than 500 million active users. How do they manage to serve all of them? Is there one giant database containing an entry for each individual user, so that whenever someone logs in, the credentials are checked against this central machine? I am pretty ignorant about this topic, but I can see that such a solution simply does not scale. There will come a time when the central server cannot handle the load.
Instead, say that the central database is split into 100 databases, so that the load is distributed evenly across all of them. This is presumably what Facebook does, but how do they know which user account to store on which machine? Is a copy of every record stored on each machine, so that when you log in, a randomly chosen machine handles the authentication? That would mean that every time someone registers or changes their password, the change must be propagated to all 100 servers.
Another solution comes to mind. Perhaps they hash the user's email address to pick a specific user database. Then all the web servers need to know is the hash algorithm. But I think this solution creates its own problem. What if you want to add more user databases? Do you modify the hash algorithm to account for 101 user databases instead of 100? Do you then start moving user records around so that all 101 databases hold roughly the same number of records? No, that seems silly too.
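To make the hashing idea concrete, here is a rough sketch of what I have in mind. The function names, the use of SHA-256, and the `example.com` addresses are all my own assumptions for illustration; I have no idea what Facebook actually uses. It maps an email to one of N databases with a simple modulo, then measures how many users would land on a different database if N went from 100 to 101.

```python
import hashlib


def shard_for(email: str, num_shards: int) -> int:
    """Map an email address to a database index in [0, num_shards).

    Hypothetical helper: uses SHA-256 so the result is stable across
    processes (Python's built-in hash() is randomized per process).
    """
    normalized = email.strip().lower().encode("utf-8")
    digest = hashlib.sha256(normalized).digest()
    # Interpret the first 8 bytes as an integer, then reduce modulo N.
    return int.from_bytes(digest[:8], "big") % num_shards


def fraction_moved(emails, old_shards: int, new_shards: int) -> float:
    """Fraction of users whose database changes when the shard count changes."""
    moved = sum(
        1 for e in emails
        if shard_for(e, old_shards) != shard_for(e, new_shards)
    )
    return moved / len(emails)


if __name__ == "__main__":
    # Made-up addresses, purely for illustration.
    emails = [f"user{i}@example.com" for i in range(10_000)]
    print("shard for user0:", shard_for("user0@example.com", 100))
    print("fraction moved 100 -> 101:", fraction_moved(emails, 100, 101))
```

With plain modulo hashing, a user stays on the same database only when the hash gives the same remainder for both 100 and 101, which happens for about 1 in 101 users. So adding a single database would relocate nearly every record, which is exactly the problem I am worried about.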
In any case, as you can see, I don't know much about how to solve this problem. Can anyone recommend some reading on this topic?