How does Hive decide when to use MapReduce and when not to?

Question

How does Hive decide when to use MapReduce and when not to?

As a simple example,

select * from tablename;

DOES NOT use MapReduce, but

select count(*) from tablename;

DOES. What is the general principle used to decide when to use MapReduce (in Hive)?

+5

mapreduce hadoop hive

Lazer Sep 19 '11 at 4:27


4 answers

select * from tablename;

This just reads the raw data from files in HDFS, so it does not need MapReduce.

+1

wlk 20 . '11 17:47

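For a query like select * from tablename, Hive reads the data file and returns all rows without any aggregation (min/max/count ..). It runs a FetchTask rather than a mapreduce job.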

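This is an optimization in Hive. The hive.fetch.task.conversion property controls it (when the conversion applies, the query runs as a FETCH task).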

It is similar to running the hadoop command: hadoop fs -cat _

Note: select colNames from tablename can still require a mapreduce job, because the "column" has to be extracted from each row.

+1

Pardeep Sharma 11 . '18 17:31


This is an optimization: the hive.fetch.task.conversion property can minimize mapreduce latency overhead by running simple queries as a FETCH task.

For simple SELECT, FILTER, and LIMIT queries, this property lets Hive skip mapreduce and use a FETCH task instead.

This property can have 3 values - none, minimal (the default in Hive 0.10.0 through 0.13.1) and more (the default from Hive 0.14.0).
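A rough sketch of how the three values could be tried out in a Hive session. Here tablename is the placeholder table from the question and some_column is a hypothetical column name. Whether a given query is actually converted also depends on the Hive version and on related settings such as hive.fetch.task.conversion.threshold, so treat the comments as expectations rather than guarantees.

```sql
-- none: the conversion is disabled, so even a plain SELECT * should be
-- planned as a regular job.
SET hive.fetch.task.conversion=none;
SELECT * FROM tablename LIMIT 10;

-- minimal: SELECT *, filters on partition columns, and LIMIT
-- are candidates for a fetch task.
SET hive.fetch.task.conversion=minimal;
SELECT * FROM tablename LIMIT 10;

-- more: column projections and general filters are candidates as well.
-- some_column is a hypothetical column used only for illustration.
SET hive.fetch.task.conversion=more;
SELECT some_column FROM tablename WHERE some_column IS NOT NULL LIMIT 10;

-- Aggregations such as count(*) are not covered by this conversion.
SELECT count(*) FROM tablename;
```

Running SET hive.fetch.task.conversion; with no value prints the current setting, which is a quick way to check what a given installation defaults to.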

-1

user6260103 Apr 27 '16 at 7:17


Donald Miner · Accepted Answer · 2011-09-19T04:41:15+0000

In general, any kind of aggregation, such as min / max / count, will require a MapReduce job. This probably won't explain everything to you.

As in many RDBMSs, there is an EXPLAIN keyword that describes how your query is translated into MapReduce jobs. Try running EXPLAIN on both of your sample queries and see what Hive does behind the scenes.
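As a minimal sketch, the two queries from the question can be prefixed with EXPLAIN as shown here. The exact plan text varies with the Hive version, execution engine, and configuration, so the comments only describe what one would typically expect to see.

```sql
-- Plan for the plain scan: typically a single stage with a Fetch Operator
-- and no Map Reduce stage.
EXPLAIN SELECT * FROM tablename;

-- Plan for the aggregation: typically includes a Map Reduce stage
-- ahead of the final fetch.
EXPLAIN SELECT count(*) FROM tablename;
```

Comparing the stage lists in the two plans shows directly whether Hive intends to launch a job for a given query.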
