The main difference between dynamic and static partitioning in Hive

What is the main difference between static and dynamic partitions in Hive? Using a separate insert per partition means static partitioning, and a single insert into the partitioned table means dynamic partitioning. Are there any other advantages?

+5
4 answers

Partitioning in Hive is very useful for pruning data during a query, which reduces query time.
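To illustrate why partitioning cuts query time, here is a toy Python model (not Hive code): Hive stores each partition as a subdirectory named `column=value` under the table location, so a filter on the partition column lets it skip whole directories. The table path, partition values and file names below are hypothetical.

```python
from typing import Dict, List

# Hypothetical layout of table t1: partition directory -> data files in it.
# Hive names each partition directory "<column>=<value>" under the table location.
PARTITIONS: Dict[str, List[str]] = {
    "/warehouse/t1/country=US": ["part-00000", "part-00001"],
    "/warehouse/t1/country=UK": ["part-00000"],
    "/warehouse/t1/country=IN": ["part-00000", "part-00001", "part-00002"],
}


def partition_value(path: str, column: str) -> str:
    """Return the value of `column` from a path segment such as 'country=US'."""
    for segment in path.split("/"):
        key, sep, value = segment.partition("=")
        if sep and key == column:
            return value
    raise ValueError(f"{column!r} not found in {path!r}")


def files_to_scan(column: str, wanted: str) -> List[str]:
    """Model of partition pruning for WHERE <column> = '<wanted>':
    only files under matching partition directories are read."""
    return [
        f"{path}/{name}"
        for path, names in PARTITIONS.items()
        if partition_value(path, column) == wanted
        for name in names
    ]


total = sum(len(names) for names in PARTITIONS.values())
pruned = files_to_scan("country", "UK")
print(f"without a partition filter: {total} files; with country = 'UK': {len(pruned)} file(s)")
```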

Partitions are created when data is inserted into a table. Which kind of partitioning you need depends on how you load the data. Typically, when loading files (large files) into Hive tables, static partitions are preferred, because they save time when loading data compared to dynamic partitioning. You "statically" add a partition to the table and move the file into that partition. Because the files are large, they are usually generated in HDFS. You can get the value of the partition column from the file name, the date, etc., without reading the entire large file.
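A minimal sketch of that workflow, assuming a hypothetical naming convention where each file is called `events_YYYY-MM-DD.csv` and a hypothetical table `events` partitioned by a string column `day`. The partition value comes from the file name alone, and the generated HiveQL adds the partition and moves the file into it with `LOAD DATA INPATH ... PARTITION (...)`; nothing inside the file is read.

```python
import re
from typing import List

# Hypothetical naming convention: files are named events_YYYY-MM-DD.csv.
FILE_NAME = re.compile(r"events_(\d{4}-\d{2}-\d{2})\.csv$")


def static_load_statements(hdfs_path: str, table: str = "events") -> List[str]:
    """Build HiveQL for a static-partition load of one file.

    The partition value is taken from the file name only; the file's
    contents are never opened, which is what keeps static loads cheap.
    """
    match = FILE_NAME.search(hdfs_path)
    if match is None:
        raise ValueError(f"cannot derive a partition value from {hdfs_path!r}")
    day = match.group(1)
    return [
        f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION (day='{day}');",
        f"LOAD DATA INPATH '{hdfs_path}' INTO TABLE {table} PARTITION (day='{day}');",
    ]


for stmt in static_load_statements("/landing/events_2017-03-01.csv"):
    print(stmt)
```

The `ALTER TABLE` step makes the "add the partition, then move the file" sequence explicit; a `LOAD DATA ... PARTITION` into a partition that does not exist yet also creates it.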

Dynamic partitioning requires reading the entire large file, i.e. every row of data is read, and MR jobs distribute the data into the partitions of the destination table based on a particular field in the file. Therefore, dynamic partitioning is usually useful when you are running an ETL flow in a data pipeline. For example, you load a huge file into table X with a move command (LOAD DATA). Then you run an insert query into table Y and partition the data based on fields of table X, say day and country. You might then run another ETL step that takes the data in one country partition of table Y and writes it into table Z, where the data for that specific country is partitioned only by city, and so on.
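The following Python model (an illustration, not how Hive actually executes the job) mimics the X to Y to Z flow described above. Every source row has to be read, and each row is routed to a partition directory built from its trailing partition-column values, the way `INSERT ... PARTITION (day, country) SELECT ..., day, country FROM x` does. The rows and column names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Tuple[str, ...]


def dynamic_partition(rows: Sequence[Row], partition_cols: Sequence[str]) -> Dict[str, List[Row]]:
    """Group rows into partition directories such as 'day=2017-03-01/country=US'.

    As in a Hive dynamic-partition insert, the last len(partition_cols) values of
    each row are its partition values (matched by position); the rest is row data.
    """
    n = len(partition_cols)
    if n == 0:
        raise ValueError("at least one partition column is required")
    out: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:  # every source row has to be read
        data, values = row[:-n], row[-n:]
        path = "/".join(f"{col}={val}" for col, val in zip(partition_cols, values))
        out[path].append(data)
    return dict(out)


# Hypothetical table X: (user, city, day, country), loaded from one big file.
x_rows = [
    ("alice", "NYC", "2017-03-01", "US"),
    ("bob", "London", "2017-03-01", "UK"),
    ("carol", "Boston", "2017-03-02", "US"),
    ("dave", "NYC", "2017-03-02", "US"),
]

# X -> Y, partitioned by (day, country).
y = dynamic_partition(x_rows, ["day", "country"])

# Y -> Z: US rows only, re-partitioned by city (city is now the last column).
us_rows = [row for path, rows in y.items() if path.endswith("country=US") for row in rows]
z = dynamic_partition(us_rows, ["city"])

for path, rows in z.items():
    print(path, rows)
```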

So, depending on the requirements of your target table or data and on the form in which the data arrives from the source, you can choose static or dynamic partitioning.

+10

With static partitioning, we need to specify the value of the partition column in each LOAD statement.

Suppose table t1 (username, name, profession, country) is partitioned on the column country; then every time we load data we need to specify the value of country:

hive> LOAD DATA INPATH '/hdfs path of the file' INTO TABLE t1 PARTITION(country="US");
hive> LOAD DATA INPATH '/hdfs path of the file' INTO TABLE t1 PARTITION(country="UK");

Dynamic partitioning lets us avoid specifying the value of the partition column each time. The steps are:

  • Create a non-partitioned table t2 and insert data into it.
  • Now create table t1, partitioned on the corresponding column (for example, country).
  • Load data into t1 from t2 as shown below:

     hive> INSERT INTO TABLE t1 PARTITION(country) SELECT * FROM t2;
  • Make sure the partition column is always the last column of the non-partitioned table (here, the country column of t2), so that SELECT * returns it last.
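Why the column order matters: Hive takes the dynamic partition values from the last columns of the SELECT list by position, not by name. This toy Python model (hypothetical rows for t2, and a hypothetical helper name) shows what happens when country is not the last column of t2: whatever value comes last becomes the partition value.

```python
from typing import List, Sequence, Tuple


def split_for_insert(select_row: Sequence[str], n_partition_cols: int) -> Tuple[List[str], List[str]]:
    """Model of how INSERT ... PARTITION (<dynamic cols>) SELECT ... splits a row:
    the last n values are partition values, taken purely by position."""
    return list(select_row[:-n_partition_cols]), list(select_row[-n_partition_cols:])


# t2 with country as its last column: (username, name, profession, country)
good = split_for_insert(["jdoe", "John Doe", "engineer", "US"], 1)

# t2 with country NOT last: (username, country, name, profession)
bad = split_for_insert(["jdoe", "US", "John Doe", "engineer"], 1)

print("country last     ->", good)  # partition value 'US'
print("country not last ->", bad)   # partition value 'engineer' (wrong)
```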

+13

Static partitioning in Hive

Inserting input files individually into a partitioned table is static partitioning. Typically, when loading files (large files) into Hive tables, static partitions are preferred.

Static partitioning saves time when loading data compared to dynamic partitioning. You "statically" add a partition to the table and move the file into that partition.

With static partitioning, we can alter the partition.

You can get the value of the partition column from the file name, the date, etc., without reading the entire large file. Static partitioning also works when Hive runs in strict mode:

set hive.mapred.mode = strict
This property may already be set in hive-site.xml. In strict mode, a query on a partitioned table must filter on a partition column in its WHERE clause, and ORDER BY must be used together with LIMIT. You can use static partitioning with both Hive managed tables and external tables.
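A simplified Python model (not Hive's implementation) of two rules Hive enforces when `hive.mapred.mode=strict`: a query on a partitioned table must filter on a partition column, and ORDER BY must have a LIMIT. The function name and its inputs are hypothetical stand-ins for what Hive derives from the parsed query.

```python
from typing import List, Set


def strict_mode_violations(partition_cols: Set[str], where_cols: Set[str],
                           has_order_by: bool, has_limit: bool) -> List[str]:
    """Simplified model of two hive.mapred.mode=strict rules (not Hive's code).

    partition_cols: partition columns of the queried table (empty if unpartitioned)
    where_cols: columns the WHERE clause filters on
    """
    problems: List[str] = []
    if partition_cols and not (partition_cols & where_cols):
        problems.append("no partition filter on a partitioned table")
    if has_order_by and not has_limit:
        problems.append("ORDER BY without LIMIT")
    return problems


# SELECT * FROM t1                         -> rejected in strict mode
print(strict_mode_violations({"country"}, set(), False, False))
# SELECT * FROM t1 WHERE country = 'US'    -> allowed
print(strict_mode_violations({"country"}, {"country"}, False, False))
```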

Dynamic partitioning in Hive

A single insert into the partitioned table is called dynamic partitioning.

Dynamic partitioning usually loads data from a non-partitioned table.

Dynamic partitioning takes longer to load data than static partitioning.

When you have a large amount of data stored in a table, dynamic partitioning is suitable.

If you want to partition by several columns but do not know in advance how many partitions there will be, dynamic partitioning is also suitable.

With dynamic partitioning, no WHERE clause is required in order to use LIMIT, and we cannot alter the partitions.

You can use dynamic partitioning with both Hive external tables and managed tables. To use dynamic partitioning in Hive with all partition columns dynamic, the dynamic partition mode must be nonstrict. These are the dynamic partitioning properties you must enable:

SET hive.exec.dynamic.partition = true;

SET hive.exec.dynamic.partition.mode = nonstrict;
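These settings matter because of checks Hive applies to the PARTITION clause of an insert. The sketch below is a simplified Python model, not Hive's code: dynamic partitioning must be enabled, with `hive.exec.dynamic.partition.mode=strict` at least one partition column must be given a static value, and a static partition column may not come after a dynamic one. The function name and the error wording are hypothetical.

```python
from typing import List, Optional, Tuple

# A PARTITION clause in clause order: (column, static value) or (column, None) if dynamic.
# e.g. PARTITION (day='2017-03-01', country) -> [("day", "2017-03-01"), ("country", None)]
Spec = List[Tuple[str, Optional[str]]]


def check_partition_spec(spec: Spec, dynamic_enabled: bool, mode: str) -> List[str]:
    """Simplified model of Hive's checks on INSERT ... PARTITION (...)."""
    errors: List[str] = []
    has_dynamic = any(value is None for _, value in spec)
    if not has_dynamic:
        return errors  # a fully static insert is unaffected by either setting
    if not dynamic_enabled:
        errors.append("dynamic partitioning is disabled (hive.exec.dynamic.partition)")
    if mode == "strict" and all(value is None for _, value in spec):
        errors.append("strict mode needs at least one static partition column")
    seen_dynamic = False
    for col, value in spec:
        if value is None:
            seen_dynamic = True
        elif seen_dynamic:
            errors.append(f"dynamic partition cannot be the parent of static partition {col!r}")
    return errors


# INSERT INTO TABLE t1 PARTITION(country) SELECT * FROM t2;
print(check_partition_spec([("country", None)], True, "strict"))     # rejected
print(check_partition_spec([("country", None)], True, "nonstrict"))  # accepted: []
```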

-1

Dynamic partitioning in Hive:

 CREATE TABLE temps_orc_partition_date (statecode STRING, countrycode STRING, sitenum STRING, paramcode STRING, poc STRING, latitude STRING, longitude STRING, datum STRING, param STRING, timelocal STRING, dategmt STRING, timegmt STRING, degrees double, uom STRING, mdl STRING, uncert STRING, qual STRING, method STRING, methodname STRING, state STRING, county STRING, dateoflastchange STRING) PARTITIONED BY (datelocal STRING) STORED AS ORC; 

Move the "datelocal" column to the end of the SELECT list; dynamic partitioning in Hive requires the partition column to come last.

 INSERT INTO TABLE temps_orc_partition_date PARTITION (datelocal) SELECT statecode, countrycode, sitenum, paramcode, poc, latitude, longitude, datum, param, timelocal, dategmt, timegmt, degrees, uom, mdl, uncert, qual, method, methodname, state, county, dateoflastchange, datelocal FROM temps_txt; 
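With 22 data columns, reordering the SELECT list by hand is error-prone. A small hypothetical helper can generate it: it keeps the data columns in the target table's order and appends the partition columns last, in the order of the PARTITION clause. The helper name is illustrative, and the usage passes a shortened column list for brevity; in practice you would pass all 22 data columns.

```python
from typing import Sequence


def dynamic_insert_sql(target: str, source: str, data_cols: Sequence[str],
                       partition_cols: Sequence[str]) -> str:
    """Build INSERT ... PARTITION (...) SELECT ... with the partition columns last."""
    overlap = set(data_cols) & set(partition_cols)
    if overlap:
        raise ValueError(f"columns listed as both data and partition columns: {sorted(overlap)}")
    select_list = ", ".join(list(data_cols) + list(partition_cols))
    return (f"INSERT INTO TABLE {target} PARTITION ({', '.join(partition_cols)}) "
            f"SELECT {select_list} FROM {source};")


# Shortened column list for illustration only.
sql = dynamic_insert_sql("temps_orc_partition_date", "temps_txt",
                         ["statecode", "countrycode", "degrees"], ["datelocal"])
print(sql)
```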
-2
