I am new to Hadoop and Hive, and I am developing a reporting solution. The problem is that query performance is very slow (Hive 0.10, HBase 0.94, Hadoop 1.1.1). One of the queries is:
select a.*, b.country, b.city from p_country_town_hotel b inner join p_hotel_rev_agg_period a on (a.key.hotel = b.hotel) where b.hotel = 'AdriaPraha' and a.min_date < '20130701' order by a.min_date desc limit 10;
which takes quite a while (50 seconds). I know the join is on a string field rather than an integer, but the data sets are not large (about 3300 and 100000 records). I tried adding hints to this SQL, but it did not run any faster. The same query on MS SQL Server takes 1 s. Also, a simple count(*) on the table takes 7-8 seconds, which is shocking (the table has 3300 entries). I really don't know what the problem is. Any ideas, or am I misunderstanding Hadoop?
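For reference, a map-join hint on this query would look roughly like the sketch below. It only illustrates the kind of hint meant, assuming the smaller table `b` is the one held in memory; it is not necessarily the exact statement that was tried. Prefixing the query with `EXPLAIN` shows the plan Hive generates without running it.

```sql
-- Sketch only: same query with a map-join hint on the smaller table (alias b).
-- Assumption: b (p_country_town_hotel) is small enough to fit in memory.
EXPLAIN
SELECT /*+ MAPJOIN(b) */ a.*, b.country, b.city
FROM p_country_town_hotel b
INNER JOIN p_hotel_rev_agg_period a ON (a.key.hotel = b.hotel)
WHERE b.hotel = 'AdriaPraha'
  AND a.min_date < '20130701'
ORDER BY a.min_date DESC
LIMIT 10;
```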
user2346868