How do Hive / Hadoop ensure that every mapper works on data that is local to it?

Two main questions bother me:

  • How can I be sure that each of the 32 files Hive uses to store my tables is located on its own machine?
  • If that is the case, how can I be sure that when Hive creates 32 mappers, each of them will work on its local data? Does Hadoop / HDFS provide this magic, or does Hive, as a smart application, guarantee that it happens?

Background: I have a Hive cluster of 32 machines, and:

  • All my tables are created using "CLUSTERED BY(MY_KEY) INTO 32 BUCKETS"
  • I use hive.enforce.bucketing = true;
  • I checked, and each table is indeed stored as 32 files under /user/hive/warehouse
  • I use an HDFS replication factor of 2
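For reference, a setup like the one described could be declared roughly as follows. This is a minimal sketch, not the asker's actual DDL: the table name, column names, and the staging table are made-up placeholders, with `my_key` standing in for MY_KEY above.

```sql
-- Must be set in the session so Hive produces one output file per bucket
SET hive.enforce.bucketing = true;

-- Hypothetical bucketed table: rows are hashed on my_key into 32 buckets
CREATE TABLE my_table (
  my_key  BIGINT,
  payload STRING
)
CLUSTERED BY (my_key) INTO 32 BUCKETS;

-- Loading through INSERT ... SELECT is what materializes the 32 bucket
-- files under the table's warehouse directory (e.g. /user/hive/warehouse/my_table)
INSERT OVERWRITE TABLE my_table
SELECT my_key, payload
FROM staging_table;
```

Note that bucketing only controls how rows are partitioned into files; it says nothing about which datanodes end up holding the blocks of those files.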

Thanks!

+5
2 answers
  • HDFS decides where the blocks of a file are placed, not Hive. With replication, each bucket file's blocks live on more than one machine, so a file does not belong to any single node.
  • HDFS places the blocks, and when a job runs, Hadoop's scheduler tries to assign each map task to a machine that holds a replica of its input split, preferring "node-local" over "rack-local" assignments. It is Hadoop that provides this, not Hive.
+5

Hadoop schedules each Map task based on where the input data blocks are stored (data locality). The scheduler tries to launch a map task on a node holding a replica of the task's input split, and falls back to rack-local or remote nodes when it cannot, so locality is best-effort rather than guaranteed. This happens at the Hadoop level, not in Hive: Hive simply submits MapReduce jobs, and Hive's bucketing is mainly useful for features such as sampling and bucketed map joins, not for controlling block placement. See also: http://www.facebook.com/note.php?note_id=470667928919

+1
