Just what is a "big database"?

OK, dumb question, I know, but I keep seeing hazy remarks about a "big database", as well as small and medium ones, and I wonder what that means. Can anyone define what small, medium, and large databases are for us SQL neophytes?

+54
database
Mar 15 '09 at 3:18
8 answers

There is no threshold at which a small database becomes medium or a medium database becomes large. As a rule of thumb, when I hear these terms, I think of specific orders of magnitude in terms of total record storage.

  • Small: 10^5 or fewer records.
  • Medium: 10^5 to 10^7 records.
  • Large: 10^7 to 10^9 records.
  • Very large: 10^9 or more records.
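
As a rough illustration, the rule of thumb above can be written as a small Python function. The function name is made up for this sketch, and the way exact boundary values (such as precisely 10^7 records) are assigned is an arbitrary choice, since the ranges above overlap at their edges.

```python
def classify_by_record_count(records: int) -> str:
    """Map a total record count onto the rough size labels listed above."""
    if records < 0:
        raise ValueError("record count cannot be negative")
    if records <= 10**5:
        return "small"
    if records <= 10**7:
        return "medium"
    if records <= 10**9:
        return "large"
    return "very large"


# Try a few sample counts, one per category.
for count in (20_000, 3_000_000, 500_000_000, 40_000_000_000):
    print(f"{count:>14,} records -> {classify_by_record_count(count)}")
```

The thresholds are subjective, so treat them as defaults to adjust rather than fixed limits.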

As the poster le dorfier suggested, you can also think about it in terms of the properties that each kind of database has. Categorized that way, I would say:

  • Small: Performance is not a concern. Your queries run fine without any special optimizations. You see only a slight performance difference when using first-line improvements such as indexes.

  • Medium: There are probably one or more employees assigned part-time to managing the database. These people pay attention to the health of the database; their primary administrative responsibility is to prevent unacceptable performance issues and minimize downtime.

  • Large: There are probably one or more dedicated employees whose job is to work on the database, improve its performance, and ensure that changes to the application do not break the schema over the life of the database. Database health and status are monitored. Understanding and performing optimizations requires considerable expertise.

  • Very large: The database stores a huge amount of information that must be readily accessible. Performance optimization is absolutely necessary to extract every last ounce of speed from each query; without it, the database would be much less usable or even impossible to use. The database may use sophisticated or innovative replication or clustering techniques, pushing the boundaries of current technology.

Please note that these are completely subjective and that someone else may well have a perfectly legitimate alternative definition of "large."

+84
Mar 15 '09 at 3:23

One way to figure this out is to observe your test queries.

A small database is one where indexes don't matter.

A medium database is one where queries take more than one second if you do not have an appropriate index.

A large database is one where queries often take considerable time to optimize, using a combination of query design, index modification, and many testing cycles.
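
To observe whether an index matters for a given query, you can ask the database for its query plan before and after adding the index. The following is a minimal sketch using Python's built-in sqlite3 module; the `orders` table, its columns, and the index name are invented for the example, and other database systems expose the same idea through their own EXPLAIN syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    ((i % 1000, i * 0.5) for i in range(50_000)),
)

QUERY = "SELECT total FROM orders WHERE customer_id = ?"


def query_plan(connection: sqlite3.Connection) -> str:
    """Return SQLite's plan for QUERY as a single string."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY, (42,)).fetchall()
    # The last column of each row holds the human-readable plan step.
    return "; ".join(row[-1] for row in rows)


plan_without_index = query_plan(conn)
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_with_index = query_plan(conn)

print("without index:", plan_without_index)
print("with index:   ", plan_with_index)
```

Without the index, the plan should report a full scan of the table; with it, the plan should mention `idx_orders_customer`. The exact wording of the plan text varies between SQLite versions.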

+26
Mar 15 '09 at 3:23

The best answer, hands-down: a large database is one that makes you stop using relational databases.

In other words, it is a normalized relational database where all the indexes in the world cannot help you meet your response time requirements because of massive JOINs.

If you have ever had to give up relational databases for something else, you either are a weak database developer, are not an expert database administrator, or have a very large database.

+4
Mar 15 '09 at 4:27

"Big database" is a really vague concept. The answers to this question already offer very different definitions and opinions. Some approaches to defining "small", "medium" and "large" databases may make more sense than others, but to some degree I believe that each definition is correct and valid.

Some definitions make more sense than others because they focus on different aspects that matter for the design, programming, use, maintenance, and administration of the database, and those aspects are what really matter for the database in use. It just so happens that all of these aspects are affected by the foggy concept of "database size".

So, does this mean it doesn't matter whether you can determine if a particular database is large or not?

Of course not. It means that you will apply the concept in different ways when evaluating the different design, operational, and administrative aspects of your database. It also means that the concept will be foggy every time.

As an example: the indexing strategy (an aspect of database design) depends on the number of records in each table (one measure of "size"), the record size multiplied by the number of records (another measure of "size"), and the ratio of queries to create/update/delete operations (an aspect of database usage).

Query response time is better when indexes are used on tables with a large number of records. Depending on the nature of your WHERE and ORDER BY clauses and your record aggregation, you may need several indexes on specific tables.

Create, update, and delete operations are negatively affected by an increase in the number of indexes on the affected table(s). More indexes on the affected table mean more changes that the RDBMS needs to make, which takes more time and more resources.

In addition, if your RDBMS spends more time applying these changes, locks are also held longer, which affects the response time of other queries sent to the system at the same time.

So, how do you balance the number and design of your indexes? How do you know whether you need an additional index, and whether adding it will have a big negative impact on query response time? Answer: you test and profile your database under the target load, according to your load and performance requirements, and analyze the profiling data to see whether further optimizations, redesigns, or indexes are needed.

Different index strategies are required for different ratios of queries to create/update/delete operations. If your database is under a heavy query load but rarely updated, the performance of the entire application will be better if you add every index that improves query response time. On the other hand, if your database is constantly being updated but sees no heavy query activity, performance will be better if you use fewer indexes.
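
The "test and profile" advice can be tried out on a toy scale. Below is a minimal sketch, assuming an in-memory SQLite database and a made-up `events` table; the function name, column names, and workload sizes are all hypothetical. It times a batch of inserts and a batch of lookups with zero to three indexes so you can see how the balance shifts on your own machine. Real profiling should use your actual schema, data volume, and query mix.

```python
import sqlite3
import time


def profile_workload(index_count: int, writes: int, reads: int) -> dict:
    """Time `writes` inserts, then `reads` lookups, on a table with `index_count` indexes."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, c INTEGER)")
    # Create the requested number of single-column indexes (0 to 3).
    for column in ("a", "b", "c")[:index_count]:
        conn.execute(f"CREATE INDEX idx_events_{column} ON events ({column})")

    start = time.perf_counter()
    with conn:  # commit all inserts as one transaction
        conn.executemany(
            "INSERT INTO events (a, b, c) VALUES (?, ?, ?)",
            ((i % 97, i % 1009, i % 10007) for i in range(writes)),
        )
    write_seconds = time.perf_counter() - start

    start = time.perf_counter()
    for i in range(reads):
        # This lookup can use the index on column `a` when it exists.
        conn.execute("SELECT COUNT(*) FROM events WHERE a = ?", (i % 97,)).fetchone()
    read_seconds = time.perf_counter() - start

    conn.close()
    return {"indexes": index_count, "write_s": write_seconds, "read_s": read_seconds}


for n in range(4):
    result = profile_workload(index_count=n, writes=20_000, reads=100)
    print(f"{result['indexes']} indexes: "
          f"writes {result['write_s']:.3f}s, reads {result['read_s']:.3f}s")
```

Timings depend on the machine and SQLite version, so no particular numbers are claimed here. The point is only the method: vary the index set, run a representative workload, and compare.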

There are, of course, other aspects: database schema design, storage strategy, network design, backup strategy, programming of stored procedures and triggers, application programming (against the database), etc. All of these aspects are influenced by different notions of "size" (record size, number of records, index size, number of indexes, schema, storage size, etc.).

I wish I had more time, because this topic is fascinating. I hope this small contribution serves as a starting point for you in the fascinating world of SQL.

+3
Mar 15 '09 at 4:28

You have to take advances in hardware into account for this definition:

  • Small database: the working set fits into the physical memory of a single commodity server (about 16 GB now)

  • Medium database: fits on one or more (via RAID) commodity hard drives in a single machine (up to several TB now)

  • Large database: the data must be distributed across several commodity servers to fit (up to several PB now)

+3
Mar 15 '09 at 18:40

According to the Wikipedia article on Very Large Databases:

A very large database, or VLDB, is a database that contains an extremely large number of tuples (database rows) or occupies an extremely large amount of physical file system storage space. The most common definition of a VLDB is a database that occupies more than 1 terabyte or contains several billion rows, although naturally this definition changes over time.

+2
Mar 15 '09 at 3:23

I think something like Wikipedia, or the US Census, is a "big" database. My personal address list or to-do list is a small database. A medium-sized database falls somewhere between them.

You could also try to gauge the size by how many servers you need. A small database is a component of an application that you run on your desktop, a medium-sized database would be a single MySQL server (located anywhere), and a large database would require several servers with some support for replication / crash recovery.

0
Mar 15 '09 at 3:27

If you have a database large enough that you cannot simply restore a backup of it onto a development or test box, you probably have a "large database".

0
Mar 16 '09 at 18:02


