Practical size limits for RDBMS

I am working on a project that is supposed to store very large data sets and related reference data. I never came across a project that required tables to be so large. I have proven that at least one development environment cannot handle the database processing required by the complex view queries generated by the application level (views with multiple internal and external joins, grouping, summing, and averaging over tables with 90 million rows )

The RDBMS environment with which I tested is DB2 on AIX. Developer environment crashed from 1/20 of the volume that will be processed during production. I'm sure manufacturing equipment is superior to dev hardware and intermediate hardware, but I just don't think it can handle the sheer volume of data and query complexity.

Before the dev environment failed, it took more than 5 minutes to return a small data set (several hundred rows) that was created by a complex query (many joins, a lot of grouping, summing and averaging) against large tables.

I get the feeling that the db architecture should change so that the aggregations currently provided by the views are performed as part of the batch process outside the peak.

Now for my question. I am sure that people who claim to have experience with these kinds of things (which I don’t know) that my fears are unfounded. They are? Can modern RDBMS (SQL Server 2008, Oracle, DB2) cope with the volume and complexity that I described (given the appropriate amount of hardware), or are we in the field of technologies such as Google BigTable?

I hope to receive answers from people who actually had to work with this volume at a non-theoretical level.

The nature of the data is financial transactions (dates, amounts, geographic locations, enterprises), so almost all types of data are presented. All reference data are normalized, hence multiple connections.

+5
source share
5

SQL Server 2008, . , , , .. ( ) , , < 1 , 15-30 , ..

, , .

, , 9 10, SQL, / .

, , , - . /, .., , , . ( ), , , . , , .

, : , , - /, / script , , .

+5

, . - SQL , .., OLAP Cube . , . , .

+2

90 90 , , - . , .

, , ( ).

, N , , ( , ) - . , .

( , , ), , , , // .

+2

1/20 , , Google Big Table. NoSQL

, MongoDB - NoSQL RDMS. , , .

+1

( ) SQL Server 2005 .

, , , .

The same models do not work well on Teradata, but I understand that if we remake the model in 3NF, then parallelizing Teradata will work much better. Installing Teradata is many times more expensive than installing SQL Server, so it just shows how important the difference is in modeling and comparing data and processes with a basic set of functions.

Without knowing more about your data and how it is currently being modeled, and what indexing options you found it difficult to say more.

+1
source

All Articles