Apache Drill Performance

Are there any genuine performance tests that compare Stinger, Impala, and Drill? Also, which is preferable? My use case is mainly interactive queries on top of Hive. Thanks.

+5
2 answers

Some performance numbers are published at http://allegro.tech/fast-data-hackathon.html.

In general, we see that Drill and Impala are comparable in performance for interactive queries. What differentiates Drill is its ability to query data without defining metadata up front and its ease of use with JSON data.

Note that these tests were run on much earlier versions of Drill, such as 0.8 / 0.9, and were not properly configured for data locality. Drill is now at 1.1, with many improvements in SQL support (window functions, etc.) and in performance.

+3

A generic comparison like this does not make sense, and you should never trust such a benchmark.

It all depends on your own data. Do you have JSON files? Prefer Drill. Do you want to query more than 1 TB? Prefer Hive, and so on.
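Since the answer depends on your own data, one way to get numbers you can trust is to time the same query against each engine yourself. Below is a minimal sketch of such a harness. The names `time_query` and `compare_engines` are made up for this example, and each engine is represented by a plain callable that you would write yourself with whatever client you use (JDBC, ODBC, REST); nothing here is part of Drill, Impala, or Hive.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_query(run_query: Callable[[], object],
               warmups: int = 1,
               repeats: int = 5) -> Dict[str, float]:
    """Run a query callable several times and summarize latency in seconds.

    `run_query` is a hypothetical zero-argument function that submits one
    query to one engine and blocks until the full result has been fetched.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")

    # Warm-up runs are executed but not measured, so one-off costs
    # (connection setup, cold caches) do not dominate the result.
    for _ in range(warmups):
        run_query()

    samples: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)

    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


def compare_engines(engines: Dict[str, Callable[[], object]],
                    warmups: int = 1,
                    repeats: int = 5) -> Dict[str, Dict[str, float]]:
    """Time the same logical query on several engines, keyed by engine name."""
    return {name: time_query(fn, warmups, repeats)
            for name, fn in engines.items()}
```

Run it on your real data and your real queries, on the hardware you intend to use; the median is usually a more stable figure to compare than a single run.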

In addition, consider the storage format: JSON, Parquet, or ORC files, or a storage engine such as Kudu.

Then comes optimization: Hive + Tez seems better for parallel requests but is very slow for a single request, while Impala is the opposite (MapReduce versus massively parallel processing, MPP).

In addition, consider your hardware resources: SSD disks or not, and so on.

I recommend starting with Apache Drill on a JSON file, then trying Apache Drill with Parquet or ORC.
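As a rough sketch of that path, the snippet below builds requests for Drill's REST endpoint (`POST /query.json`): first a query straight against a JSON file, then a `CREATE TABLE ... AS` that rewrites the same data into a table in the `dfs.tmp` workspace. Several things are assumptions here: a Drillbit reachable at `http://localhost:8047` without authentication, the default `dfs` storage plugin with its writable `tmp` workspace, `store.format` left at its default of Parquet, and the example path and table name. The helper function names are invented for this example.

```python
import json
import urllib.request

# Assumed address of a local Drillbit web endpoint; change to match your setup.
DRILL_URL = "http://localhost:8047"


def build_drill_request(sql: str, base_url: str = DRILL_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for Drill's REST query endpoint."""
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/query.json",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_drill_query(sql: str, base_url: str = DRILL_URL) -> dict:
    """Send the query and return the decoded JSON response."""
    with urllib.request.urlopen(build_drill_request(sql, base_url)) as resp:
        return json.loads(resp.read().decode("utf-8"))


def sample_json_sql(json_path: str, limit: int = 10) -> str:
    """Query a JSON file in place; no table definition is needed beforehand."""
    return f"SELECT * FROM dfs.`{json_path}` LIMIT {int(limit)}"


def json_to_parquet_sql(json_path: str, table: str) -> str:
    """CTAS into dfs.tmp; written as Parquet if store.format is at its default."""
    return f"CREATE TABLE dfs.tmp.`{table}` AS SELECT * FROM dfs.`{json_path}`"


if __name__ == "__main__":
    # Hypothetical file path and table name, for illustration only.
    path = "/data/events.json"
    print(run_drill_query(sample_json_sql(path)))
    print(run_drill_query(json_to_parquet_sql(path, "events_parquet")))
```

After the second statement you can run the same query against ``dfs.tmp.`events_parquet` `` and compare timings with the JSON version.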

If you need help, describe exactly what you have (data + hardware) and what you want.

+2
