Which DB should I use?

I am building an application that needs to store and process large amounts of data, and I am struggling with the question of which database to use.

My requirements:

  • Process up to ~100,000 insert commands per second (sometimes issued concurrently from different threads). 100,000 is the peak; most of the time the rate will be from a few hundred to several thousand.
  • Keep millions of records.
  • Query data as quickly as possible.
  • Some properties vary from object to object, which looks like a fit for a non-relational rather than a relational database. However, the set of possible properties is small, so it could be represented as columns in a relational database if that is much faster (see the sketch after this list).
  • Update commands are rare.
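
For illustration, a rough sketch of that variable-properties point as a relational schema, using Python's built-in SQLite (every table and column name here is a made-up example):

    import sqlite3

    conn = sqlite3.connect(":memory:")

    # A small, known set of optional properties can be modeled as
    # nullable columns instead of a schemaless document.
    conn.execute("""
        CREATE TABLE objects (
            id         INTEGER PRIMARY KEY,
            created_at TEXT    NOT NULL,
            payload    TEXT    NOT NULL,
            -- optional per-object properties, NULL when absent:
            color      TEXT,
            weight     REAL,
            priority   INTEGER
        )
    """)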

Which database would you recommend?

Thanks!

Update: my OS is not Windows. I thought that if SQL Server turned out to be the most recommended database I could switch, but from your answers it seems it is not.

As for budget: I will start with the cheapest option and expect that to change once the company has more money and more users.

No one has recommended a NoSQL database. Is NoSQL really so unsuitable for these requirements?

+6
database
5 answers

The answer depends on some additional questions: how much you want to spend, what OS you use, and what experience you have.

The databases I know of that can handle this kind of massive scale are DB2, Oracle, Teradata, and SQL Server. MySQL may also be an option, although I am not sure about its performance at that level.

There are surely others designed for the kind of mass-scale data processing you describe, and you may want to look into them as well.

So, if your OS is not Windows, you can exclude SQL Server.

If you are aiming to spend less, MySQL might be an option.

DB2 and Oracle are mature database systems. If your system is a mainframe (IBM 370), I would recommend DB2, but it may also be an option on Unix-based systems.

I don't know much about Teradata, but I know it is designed specifically for huge volumes of data, so it may be closest to what you are looking for.

A more complete list of options can be found here: http://en.wikipedia.org/wiki/List_of_relational_database_management_systems

A decent database comparison is here: http://en.wikipedia.org/wiki/Comparison_of_relational_database_management_systems

100,000+ inserts per second is a huge amount; no matter what you choose, expect to spend a fortune on hardware to handle it.

+3

It is not a question of which database to choose; it is a question of your skills and experience.

If you think this is possible on one physical machine, you are mistaken. And if you already know you will need several machines, why ask about the database? The DB is not as important as how you work with it.

Start with a single write server and scale it vertically. Use multiple read-only servers and scale them horizontally (a document database is almost always a safe choice here). The CQRS concept will answer the questions you are about to run into.
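
A minimal sketch of that split, assuming sqlite3-style connection objects where `conn.execute` works directly (one writable master, several read replicas; all names here are illustrative):

    import itertools

    class CqrsRouter:
        """Send commands to one write server, queries to read replicas."""

        def __init__(self, write_conn, read_conns):
            self.write_conn = write_conn                  # scaled vertically
            self.read_pool = itertools.cycle(read_conns)  # scaled horizontally

        def command(self, sql, params=()):
            # Writes (inserts/updates) always hit the single writable server.
            self.write_conn.execute(sql, params)
            self.write_conn.commit()

        def query(self, sql, params=()):
            # Reads are spread round-robin across the read-only servers.
            return next(self.read_pool).execute(sql, params).fetchall()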

+2

"Process up to 100,000 insert instructions of the second" - is it a peak or normal operation? If normal operation, your “millions of recorded records” are likely to be billions ...

With questions like this, I think it is worth understanding the underlying business problem, since these are non-trivial requirements. The question is whether the problem justifies a brute-force approach, or whether there are alternative ways of looking at it that achieve the same goal.

If necessary, consider whether there are data aggregation or transformation techniques (bulk loading, discarding multiple updates for the same record, loading into tiered databases and then merging downstream as a combined ETL step, perhaps) that would make this volume easier to manage.
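
As one hedged illustration of discarding multiple updates for the same record: collapse the stream per key in memory, then load only the final state in a single bulk statement (Python/SQLite; the `records` table is hypothetical):

    import sqlite3

    def collapse_and_load(conn, incoming):
        """incoming: iterable of (record_id, value) updates, oldest first."""
        latest = {}
        for record_id, value in incoming:
            latest[record_id] = value  # later updates overwrite earlier ones

        # One bulk statement instead of one round trip per update.
        conn.executemany(
            "INSERT OR REPLACE INTO records (id, value) VALUES (?, ?)",
            latest.items(),
        )
        conn.commit()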

0

The first thing I would worry about is your disk layout. You have a mixed workload (OLTP and OLAP), so it is extremely important that your disks are sized and laid out correctly to achieve this throughput; if your I/O subsystem cannot handle the load, it does not matter which database you use.

Also, consider whether those 100,000 inserts per second can be bulk loaded. Think about it: 100,000 rows per second is 72,000,000 rows in just 12 minutes, so presumably you want to store billions of rows?
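
For a quick sanity check of those numbers:

    rate = 100_000              # rows per second
    print(rate * 12 * 60)       # 72,000,000 rows in 12 minutes
    print(rate * 24 * 60 * 60)  # 8,640,000,000 rows per day - billions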

0

You probably cannot handle 100,000 individual insert operations per second; you will almost certainly need to batch them down to a more manageable number of statements.

A single thread could not execute that many commands anyway, so I would expect 100-1000 threads to be making these inserts.
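
A minimal sketch of that thread layout, assuming a hypothetical `insert_batch(rows)` callable that performs one bulk insert (Python standard library only):

    import queue

    pending = queue.Queue(maxsize=100_000)

    def producer(make_row):
        # One of the 100-1000 application threads generating inserts.
        while True:
            pending.put(make_row())

    def writer(insert_batch, batch_size=1_000):
        # A single writer thread drains the queue and turns the flood of
        # individual inserts into a manageable number of bulk statements.
        while True:
            batch = [pending.get()]
            while len(batch) < batch_size:
                try:
                    batch.append(pending.get_nowait())
                except queue.Empty:
                    break
            insert_batch(batch)  # hypothetical bulk-insert callable

Each producer would run in its own thread, while the single writer thread owns the database connection.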

Depending on your application, you probably also need some kind of high availability, unless you are building something like a scientific application.

My advice is to hire someone who has a reliable answer for you - ideally someone who has done this before. If you do not know, you will not be able to build the application. Hire a senior developer who can answer this question; ask it in the interview if you like.

0