Two main questions that bother me:
- How can I be sure that each of the 32 bucket files that Hive uses to store my tables ends up on its own unique machine?
- If that is the case, how can I be sure that when Hive launches 32 mappers, each of them will work on its local data? Does Hadoop/HDFS provide this magic, or does Hive, as a smart client, guarantee that this happens?
Background: I have a Hive cluster of 32 machines, and:
- All my tables are created with "CLUSTERED BY(MY_KEY) INTO 32 BUCKETS"
- I use hive.enforce.bucketing = true;
- I checked, and each table is indeed stored as 32 files under /user/hive/warehouse
- I use an HDFS replication factor of 2
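For anyone wanting to inspect this directly: one way to see which DataNodes actually hold each bucket file is HDFS's fsck tool with location reporting. This is a sketch assuming a table named my_table in the default warehouse path; adjust the path to your setup.

```shell
# List every file of the table, its blocks, and the DataNodes holding
# each replica. With replication factor 2, each block should show two
# host locations. (Path and table name are assumptions - adjust them.)
hdfs fsck /user/hive/warehouse/my_table -files -blocks -locations
```

If several bucket files report the same DataNode in their block locations, the buckets are not spread one-per-machine, since HDFS places blocks by its own placement policy and does not try to give each file a distinct node.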
Thanks!