How to structure an extremely large table

This is more of a conceptual question. It was inspired by dealing with an extremely large table where even a simple query takes a long time (even with correct indexes). I was wondering whether there is a better structure than simply letting the table grow, continually.

By extremely large, I mean 10,000,000+ records, growing by roughly 10,000 per day; at that rate the table gains another 10,000,000 entries about every 2.7 years. Assume the most recent entries are accessed the most, but the older ones must remain accessible. I have two conceptual ideas to speed this up:

1) Keep a main table holding all of the data, indexed by date in reverse order, and create a separate view for each year that exposes only that year's data. Then, when querying, if a query is expected to pull only a few records spanning three years, I could union the three views together and select from the result (both ideas are sketched below).

2) The other option is to create a separate physical table for each year, then union them back together at query time.
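A rough sketch of both ideas in MySQL-flavored SQL, assuming a hypothetical `status_updates` table with a `created_at` column (all names here are illustrative, not from the question). Idea 1 keeps one physical table and defines a view per year:

```sql
-- Idea 1: one physical table, one filtering view per year.
CREATE VIEW status_updates_2022 AS
    SELECT * FROM status_updates
    WHERE created_at >= '2022-01-01' AND created_at < '2023-01-01';

CREATE VIEW status_updates_2023 AS
    SELECT * FROM status_updates
    WHERE created_at >= '2023-01-01' AND created_at < '2024-01-01';

-- A query spanning two years unions the relevant views.
SELECT * FROM status_updates_2022 WHERE user_id = 42
UNION ALL
SELECT * FROM status_updates_2023 WHERE user_id = 42;
```

Note that the views here are just filters over the same table and the same indexes, so this mostly changes how queries are written, not how much work the engine does. Idea 2 makes each year a physical table instead, with a view hiding the union from readers (writers must then route inserts to the correct year's table):

```sql
-- Idea 2: one physical table per year, reassembled by a view.
-- (An alternative to idea 1, not meant to coexist with the views
-- above, since the names overlap.)
CREATE TABLE status_updates_2022 (
    id         BIGINT   NOT NULL,
    user_id    BIGINT   NOT NULL,
    body       TEXT,
    created_at DATETIME NOT NULL,
    PRIMARY KEY (id)
);
CREATE TABLE status_updates_2023 LIKE status_updates_2022;

CREATE VIEW status_updates_all AS
    SELECT * FROM status_updates_2022
    UNION ALL
    SELECT * FROM status_updates_2023;
```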

Does anyone have any other ideas, or any thoughts on these? I know this is a problem Facebook has faced, so how do you suppose they handled it? I doubt they have a single table (status_updates) containing 100,000,000,000 records.

+7
5 answers

The major RDBMS vendors all offer similar concepts in the form of partitioned tables and partitioned views (and combinations of the two).

One immediate advantage is that the data is now divided across several conceptual tables, so any query that includes the partition key can automatically ignore any partition the key cannot fall into (partition elimination, or pruning).
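As a sketch of what that looks like (MySQL-flavored syntax; the table and column names are hypothetical, carried over from the question): range-partitioning by year means a query constrained to one year only ever touches that year's partition.

```sql
CREATE TABLE status_updates (
    id         BIGINT   NOT NULL,
    user_id    BIGINT   NOT NULL,
    body       TEXT,
    created_at DATETIME NOT NULL,
    PRIMARY KEY (id, created_at)  -- MySQL requires the partition key
                                  -- to appear in every unique key
)
PARTITION BY RANGE (YEAR(created_at)) (
    PARTITION p2021 VALUES LESS THAN (2022),
    PARTITION p2022 VALUES LESS THAN (2023),
    PARTITION p2023 VALUES LESS THAN (2024),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);

-- Only partition p2023 is scanned here; EXPLAIN's "partitions"
-- column shows which partitions the optimizer kept.
EXPLAIN SELECT * FROM status_updates
WHERE created_at >= '2023-01-01' AND created_at < '2024-01-01';
```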

From an RDBMS management point of view, data divided into separate partitions lets you perform operations at the partition level (backup / restore / reindex, etc.). This helps reduce downtime, and it also makes archiving much faster, since you can simply drop an entire partition at a time.
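For example, with the hypothetical table above, retiring a whole year of data is a near-instant metadata operation rather than a long row-by-row delete:

```sql
-- Drops every row in p2021 at once; far cheaper than
-- DELETE FROM status_updates WHERE created_at < '2022-01-01';
ALTER TABLE status_updates DROP PARTITION p2021;
```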

There are also non-relational storage mechanisms, such as NoSQL stores, map/reduce, and so on, but in the end how the data is used, loaded, and archived becomes the driving factor in deciding which structure to use.

10 million rows is not that large on the scale of big systems; partitioned systems can and do hold billions of rows.

+3

Your second idea sounds like partitioning.

I don't know how well it performs, but there is support for partitioning in MySQL; see its manual: Chapter 17. Partitioning.

+2

There is a good approach to making tables like this scale. Union is on the right track, but there is a better way.

If your database engine supports "semantic partitioning", you can split a single table into partitions, each covering a sub-range of the data (for example, one partition per year). Nothing in your SQL changes except the DDL, and the engine transparently runs the hidden union logic and partitioned index scans across all the parallel hardware (CPU, I/O, storage).

For example, Sybase allows up to 255 partitions, since that is a union limit, but you will never need the "union" keyword in your queries.
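To illustrate the point (using MySQL-flavored syntax and the hypothetical range-partitioned table sketched in the first answer, rather than exact Sybase syntax): the query is written as if the partitions did not exist.

```sql
-- No UNION needed: the engine scans only the partitions whose
-- ranges overlap the predicate and merges the results itself.
SELECT COUNT(*) FROM status_updates
WHERE created_at BETWEEN '2021-06-01' AND '2023-06-01';
```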

+2

Often the best plan is to have one table and then use database partitioning.

Or you can archive data and create a view over the archived and current data combined, keeping only the active data in the table that most functions reference. You must have a good (automated) archiving strategy, though, or you can lose data or fail to move it around efficiently. This is usually more complicated to maintain.
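A minimal sketch of that pattern (MySQL-flavored; the table names are hypothetical, and a transactional engine such as InnoDB is assumed). A scheduled job would run the move step:

```sql
-- Archive table with the same structure as the active table.
CREATE TABLE status_updates_archive LIKE status_updates;

-- A view combines both for the few queries that need full history.
CREATE VIEW status_updates_all AS
    SELECT * FROM status_updates
    UNION ALL
    SELECT * FROM status_updates_archive;

-- The automated archive step: compute the cutoff once, then move
-- old rows inside one transaction so none are lost in between.
SET @cutoff = NOW() - INTERVAL 1 YEAR;
START TRANSACTION;
INSERT INTO status_updates_archive
    SELECT * FROM status_updates WHERE created_at < @cutoff;
DELETE FROM status_updates WHERE created_at < @cutoff;
COMMIT;
```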

+1

What you are describing is horizontal partitioning, or sharding.

+1
