Hadoop Hive Slow Queries

I am new to Hadoop and Hive, and I am developing a reporting solution. The problem is that query performance is very slow (Hive 0.10, HBase 0.94, Hadoop 1.1.1). Here is one of the queries:

select a.*, b.country, b.city from p_country_town_hotel b inner join p_hotel_rev_agg_period a on (a.key.hotel = b.hotel) where b.hotel = 'AdriaPraha' and a.min_date < '20130701' order by a.min_date desc limit 10; 

which takes quite a while (50 seconds). I know the join is on a string field rather than an integer, but the data sets are not large (about 3,300 and 100,000 records). I tried adding hints to this query, but it did not run any faster. The same query on MS SQL Server takes 1 second. Even a simple count(*) on the table takes 7-8 seconds, which is shocking (the table has 3,300 rows). I really don't know what the problem is. Any ideas, or am I misunderstanding Hadoop?

+7
4 answers

Yes, you have misunderstood Hadoop. Neither Hadoop nor Hive is designed for real-time work. Both are best suited to offline batch processing, and they are not a substitute for an RDBMS. You can tune them, but "absolute real time" is not possible. A lot happens under the hood when you run a Hive query, which I think you are not aware of. First the Hive query is converted into the corresponding MapReduce (MR) job, and then several other steps follow, such as split creation, record generation, and mapper creation. I would never suggest Hadoop (or Hive) if you have real-time needs.
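One way to see the stages described above is to prefix the query with EXPLAIN, which makes Hive print the plan it compiles instead of running the query. This is only a sketch that reuses the query from the question. The plan output depends on the Hive version and on the tables, so none is shown here.

```sql
-- Print the execution plan (the stages Hive will run) without executing anything.
-- The tables and columns are the ones from the question.
EXPLAIN
select a.*, b.country, b.city
from p_country_town_hotel b
inner join p_hotel_rev_agg_period a on (a.key.hotel = b.hotel)
where b.hotel = 'AdriaPraha' and a.min_date < '20130701'
order by a.min_date desc
limit 10;
```

Each map/reduce stage listed in the plan is launched as a separate job, which is where most of the fixed startup cost comes from on tables this small.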

You can take a look at Impala for your real-time needs.

+14

Hive is not a suitable tool for real-time work, but if you want to use the Hadoop infrastructure for real-time or fast data access, look at HBase. Fast access is its added value. I'm not sure why you chose Hadoop for your solution, but HBase sits on top of HDFS, which some people like because of the inherent redundancy of HDFS (you copy a file there once and it is replicated automatically), and that may be one of the reasons you are looking at Hadoop.

For more information: read this question

+4

I'm not sure how new you are to Hadoop. Hive doesn't give you results at interactive speeds, even for small tables. If you already knew this and are trying to tune the query, you can try:

 select a.*, b.country, b.city from (select * from p_country_town_hotel where hotel= 'AdriaPraha') b inner join (select * from p_hotel_rev_agg_period where min_date < '20130701') a on a.key.hotel = b.hotel order by a.min_date desc limit 10; 

If you know that one of the tables is small enough to fit into memory, you can try a map-side join.
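A map-side join can be requested in two ways, sketched below for the tables in the question. The sketch assumes that p_country_town_hotel (about 3,300 rows) is the table small enough to hold in memory. Whether it actually shortens the run time on this cluster is not something the thread confirms.

```sql
-- Option 1: let Hive convert the join automatically when one side is small.
-- hive.auto.convert.join is off by default in Hive 0.10.
set hive.auto.convert.join=true;

-- Option 2: name the small table explicitly with a MAPJOIN hint.
-- Alias b (p_country_town_hotel) is assumed to fit in memory.
select /*+ MAPJOIN(b) */ a.*, b.country, b.city
from p_country_town_hotel b
inner join p_hotel_rev_agg_period a on (a.key.hotel = b.hotel)
where b.hotel = 'AdriaPraha' and a.min_date < '20130701'
order by a.min_date desc
limit 10;
```

Either way, the small table is loaded into a hash table on each mapper, so the join itself needs no reduce phase. The order by still needs one, and the job startup overhead remains.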

+1

Use Apache Phoenix (http://phoenix.apache.org/) for real-time queries such as

+1
