While dealing with some larger systems, I saw a custom internal application that consolidated queries across the servers for use in common company-wide applications.
e.g. select * from t1 would get converted into:
select * from db1.t1 union select * from db2.t1
and so on.
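As a side note on that rewrite: when the shards are disjoint, plain union also forces a duplicate-elimination pass, so a variant of that consolidated query (same hypothetical db1/db2 shards) that skips it would be:

select * from db1.t1
union all
select * from db2.t1;
-- union all skips the de-duplication step that plain union performs,
-- which is worth avoiding at millions of rows if the shards can't overlap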
The main problem is that once you have cross-server queries, on large systems with millions of rows they can hit the network hard and take a long time to process.
Say, for example, that you are doing network analysis and you need a join between tables to work out the "links" between user attributes.
You can end up with some odd queries that look something like this (forgive the syntax):
select db1.user.boss, db1.user.name, db2.user.name, db2.user.boss from db1.user inner join db2.user on db1.user.name = db2.user.name
(for example, to get a person's boss, their boss's boss, a friend of a friend, etc.)
It can be an awesome PITA when you want good data for those chain-type queries, but for simple statistics like sums, averages, etc., what worked best for these guys was a nightly job that aggregated the statistics into a table on each server (for example, nightlystats), something like: select countif(user.datecreated>yesterday,1,0) as dailyregistered, sumif(user.quitdate)... into (the new nightly record).
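countif/sumif there are pseudocode, so here is a minimal sketch of that nightly rollup in plain SQL, assuming a hypothetical user table with datecreated and quitdate columns and a nightlystats summary table:

-- per-server nightly rollup (hypothetical schema)
insert into nightlystats (statdate, dailyregistered, dailyquit, totalusers)
select current_date,
       sum(case when datecreated >= current_date - interval 1 day then 1 else 0 end),
       sum(case when quitdate >= current_date - interval 1 day then 1 else 0 end),
       count(*)
from user;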
This made the daily statistics pretty trivial: for totals you just sum the column across servers, and for averages you multiply each server's value by that server's row count, sum those, and divide by the total row count, and you end up with a pretty fast high-level dashboard.
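To make the average step concrete, you weight each server's average by its row count; a sketch, assuming each server contributed one row with hypothetical avg_value and row_count columns to a combined all_nightlystats table:

-- fold per-server averages into one global average
select sum(avg_value * row_count) / sum(row_count) as global_avg
from all_nightlystats;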
We ended up doing a lot of indexing and optimization, and tricks like keeping small local tables of commonly used information around were useful for speeding up queries.
For larger queries, the db guy would just dump a full copy of the system onto the backup server, and we would run against that locally throughout the day so as not to hammer the network.
There are several tricks that can reduce the pain, for example replicating common tables everywhere (e.g. base user tables and other rarely-changing data) so that you do not need to spend time gathering them across servers.
Another one that is really useful in practice is rolling the counts and totals for simple queries up into those nightly tables.
The last interesting bit is that the workaround for the bandwidth problem was a configurable "delay" built into the in-house "query aggregator": it watched the response time for each batch of records, and if responses started to lag it requested fewer records per batch and added latency between requests (since this was reporting, not real-time work, that was fine).
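That aggregator was application code, but the idea is simple enough to sketch as a MySQL stored procedure; the table names src/dst, the 2-second threshold, and the batch/pause numbers below are all made up for illustration:

delimiter //
create procedure copy_with_backoff()
begin
  declare batch int default 10000;   -- rows requested per round trip
  declare pause double default 0;    -- seconds of added latency
  declare last_id bigint default 0;
  declare copied int default 1;
  declare t0 datetime(6);
  while copied > 0 do
    set t0 = now(6);
    -- pull the next batch by primary key (id assumed auto-increment)
    insert into dst
      select * from src where id > last_id order by id limit batch;
    set copied = row_count();
    select coalesce(max(id), last_id) into last_id from dst;
    -- if the batch took more than ~2 seconds, shrink the batch
    -- and back off a little before asking again
    if timestampdiff(microsecond, t0, now(6)) > 2000000 then
      set batch = greatest(1000, batch div 2);
      set pause = least(5, pause + 0.5);
    end if;
    do sleep(pause);
  end while;
end //
delimiter ;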
There are a few SQL setups out there that handle this kind of splitting automatically, and I recently read an article about tools (not PHP though) that will do some of it for you; I think they were associated with cloud VM providers.
There are also some tools and ideas in this thread: MySQL sharding approaches?
If NoSQL is an option, you might want to survey the db systems out there before going down this route.
The NoSQL approach can be easier to scale, depending on what you are looking for.