Is Cassandra much slower than MySQL for simple operations?

I see many statements like: "Cassandra writes very fast," and "Cassandra reads much slower than it writes, but still much faster than MySQL."

On my Windows 7 system I installed MySQL, PHP 5, and Cassandra, all with their default configurations.

Performing a simple MySQL write test, "INSERT INTO wp_test (id, title) VALUES ('id01', 'test')", gives me 0.0002 s for one insert and 0.1106 s for 1000 inserts.

Performing a simple write test on Cassandra, $column_family->insert('id01', array('title' => 'test')), gives me 0.005 s for one insert and 1.047 s for 1000 inserts.

In the read tests, Cassandra was also much slower than MySQL.

So the question is: does 5 ms for one write operation on Cassandra sound right? Or is something wrong, and it should be closer to 0.5 ms?

+7
4 answers

When people say "Cassandra is faster than MySQL", they mean when you are dealing with terabytes of data and many concurrent users. Cassandra (like many common NoSQL databases) is optimized for hundreds of simultaneous readers and writers spread over many nodes. MySQL (like other relational databases) is optimized to be fast on a single node, but it usually struggles when you try to scale it across multiple nodes. There is a generalization of this trade-off, by the way: the absolute fastest disk I/O is plain old UNIX flat files, and many latency-sensitive financial applications use them for that reason.

If you are building the next Facebook, you want something like Cassandra, because one MySQL box will never withstand the punishment of thousands of simultaneous reads and writes, while with Cassandra you can scale out to hundreds of data nodes and cope with that load easily. See scaling up versus scaling out.

Another use case is when you need to apply a lot of batch processing power to terabytes or petabytes of data. Cassandra and HBase are great for this because they integrate with MapReduce, which lets you run the processing on the data nodes themselves. With MySQL, you would need to extract the data and spread it across a grid of processing nodes, which consumes a lot of network bandwidth and adds many unnecessary complications.

+11

Cassandra benefits greatly from parallelization and batching. Try doing 1 million inserts on each of 100 threads (each with its own connection, in batches of 100) and see which database is faster.
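
A minimal sketch of that kind of test harness, in Python. FakeStore and its insert_batch method are hypothetical stand-ins, not a real Cassandra or MySQL client; in a real run each worker would open its own connection and insert_batch would send one request per batch. The thread and row counts are scaled down so it finishes instantly.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class FakeStore:
    """Hypothetical in-memory stand-in for a database server."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def insert_batch(self, rows):
        # A real client would send one network request per batch here.
        with self._lock:
            self._rows.update(rows)

    def count(self):
        with self._lock:
            return len(self._rows)


def worker(store, worker_id, total, batch_size):
    """Insert `total` rows in batches of `batch_size`; return rows written."""
    written = 0
    for start in range(0, total, batch_size):
        stop = min(start + batch_size, total)
        batch = {f"w{worker_id}-{i}": {"title": "test"} for i in range(start, stop)}
        store.insert_batch(batch)
        written += len(batch)
    return written


def run_benchmark(threads=4, per_thread=1000, batch_size=100):
    """Run the workers concurrently; return (written, stored, seconds)."""
    store = FakeStore()
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [
            pool.submit(worker, store, t, per_thread, batch_size)
            for t in range(threads)
        ]
        written = sum(f.result() for f in futures)
    elapsed = time.perf_counter() - started
    return written, store.count(), elapsed


if __name__ == "__main__":
    written, stored, elapsed = run_benchmark()
    print(f"{written} rows written, {stored} stored, {elapsed:.4f} s")
```

Swapping FakeStore for a real client and raising the thread and row counts would give the comparison described above; the numbers from the fake store itself say nothing about either database.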

Finally, Cassandra insert performance should be relatively stable (maintaining high throughput for a very long time). With MySQL, you will find that throughput drops off dramatically once the B-trees used for indexes grow too large to fit in memory.

+2

It is likely that the maturity of the MySQL drivers, especially the improved MySQL drivers in PHP 5.3, has some effect on the tests. It is also possible that the simplicity of the data in your query affects the results; perhaps with inserts of 100 values, Cassandra becomes faster.

Try the same test from the command line and see what the timings are, then try it with different numbers of values. You cannot run a single test and base your decision on it.
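
One way to avoid relying on a single measurement is to repeat the test at several sizes and take the median. The sketch below assumes a hypothetical insert_one(key, columns) callable that wraps whichever client is being measured; a plain dict stands in for it here, so the printed numbers are meaningless for either database.

```python
import statistics
import time


def time_inserts(insert_one, n, repeats=5):
    """Return the median wall-clock seconds to call insert_one n times."""
    samples = []
    for _ in range(repeats):
        started = time.perf_counter()
        for i in range(n):
            insert_one(f"id{i:06d}", {"title": "test"})
        samples.append(time.perf_counter() - started)
    return statistics.median(samples)


def sweep(insert_one, sizes=(1, 10, 100, 1000), repeats=5):
    """Map each size to (median total seconds, median seconds per insert)."""
    results = {}
    for n in sizes:
        total = time_inserts(insert_one, n, repeats)
        results[n] = (total, total / n)
    return results


if __name__ == "__main__":
    table = {}  # hypothetical stand-in; replace with a real client call
    for n, (total, per_op) in sweep(table.__setitem__).items():
        print(f"{n:>5} inserts: {total:.6f} s total, {per_op * 1e6:.1f} us each")
```

Comparing the per-insert time at size 1 with the per-insert time at size 1000 also shows how much of a single-operation figure is one-off setup cost rather than steady-state latency.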

0

Many user-space factors can affect write performance, such as:

  • Dozens of settings in each database server configuration.
  • Table structure and settings.
  • Connection settings.
  • Request settings.

Are you swallowing warnings or exceptions? Taken at face value, the MySQL sample would be expected to cause a duplicate key error, so it could be failing without doing anything at all. What Cassandra would do in the same case, I do not know.
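
The duplicate key concern can be illustrated with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and it assumes id is the primary key of wp_test, which the question does not show. If the benchmark loop swallows the error, all but the first "insert" is a rejected statement, not a write.

```python
import sqlite3


def insert_same_row(conn, times, swallow_errors):
    """Insert ('id01', 'test') repeatedly; return (succeeded, failed)."""
    succeeded = failed = 0
    for _ in range(times):
        try:
            conn.execute("INSERT INTO wp_test (id, title) VALUES ('id01', 'test')")
            succeeded += 1
        except sqlite3.IntegrityError:
            if not swallow_errors:
                raise
            failed += 1
    return succeeded, failed


conn = sqlite3.connect(":memory:")
# Assumption: id is the primary key; the original post does not show the schema.
conn.execute("CREATE TABLE wp_test (id TEXT PRIMARY KEY, title TEXT)")
print(insert_same_row(conn, 1000, swallow_errors=True))  # prints (1, 999)
```

Counting successes and failures separately, as here, makes it obvious whether the timed loop actually wrote 1000 rows.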

My limited experience with Cassandra tells me one thing about inserts: while the performance of everything else degrades as the data grows, inserts seem to hold the same speed. How that speed compares with MySQL, I have not tested.

It may be less that the inserts are fast and more that they are designed not to get slow. If you want a more meaningful test, you need to include concurrency and more scenarios, such as large data sets, different batch sizes, and so on. More sophisticated tests could check the latency until data becomes available after an insert, and read speed over time.

It would not surprise me if Cassandra's first port of call on an insert is to put the data on a queue or simply append it. This is configurable through the consistency level. MySQL also lets you balance performance against reliability and availability, although each database has its own options for what it does and does not allow.

Beyond that, unless you dig into the internals, it can be hard to say why one performs better than the other.

I ran some tests for a Cassandra use case I had some time ago. The test would first insert tens of thousands of rows. I had to make the script sleep for a few seconds, because otherwise the queries run afterwards would not see the data, and the results would be inconsistent between the implementations I tested.

If you really need fast inserts, append to a file on a ramdisk.

0