Failed to rebuild index in Hive on Azure HDInsight with Tez

I am trying to create Hive indexes on Azure HDInsight with Tez enabled. I can create the indexes successfully, but I cannot rebuild them. The rebuild job fails with this output:

    Map 1: -/-    Reducer 2: 0/1
    Status: Failed
    Vertex failed, vertexName=Map 1, vertexId=vertex_1421234198072_0091_1_01, diagnostics=[Vertex Input: measures initializer failed.]
    Vertex killed, vertexName=Reducer 2, vertexId=vertex_1421234198072_0091_1_00, diagnostics=[Vertex received Kill in INITED state.]
    DAG failed due to vertex failure. failedVertices:1 killedVertices:1
    FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask

I created the table and the indexes with the following script:

    DROP TABLE IF EXISTS Measures;

    CREATE TABLE Measures(
      topology string,
      val double,
      date timestamp
    )
    ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
    STORED AS TEXTFILE
    LOCATION 'wasb://<mycontainer>@<mystorage>.blob.core.windows.net/';

    CREATE INDEX measures_index_topology ON TABLE Measures (topology) AS 'COMPACT' WITH DEFERRED REBUILD;
    CREATE INDEX measures_index_date ON TABLE Measures (date) AS 'COMPACT' WITH DEFERRED REBUILD;

    ALTER INDEX measures_index_topology ON Measures REBUILD;
    ALTER INDEX measures_index_date ON Measures REBUILD;

What am I doing wrong, and why does the index rebuild fail? Best regards

1 answer

It looks like Tez may have a problem rebuilding an index on an empty table. I was able to reproduce the same error (without using the JSON SerDe), and if you look at the application logs for the failed DAG, you will see something like:

    java.lang.NullPointerException
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:254)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:299)
        at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getSplits(TezGroupedSplitsInputFormat.java:68)
        at org.apache.tez.mapreduce.hadoop.MRHelpers.generateOldSplits(MRHelpers.java:263)
        at org.apache.tez.mapreduce.common.MRInputAMSplitGenerator.initialize(MRInputAMSplitGenerator.java:139)
        at org.apache.tez.dag.app.dag.RootInputInitializerRunner$InputInitializerCallable$1.run(RootInputInitializerRunner.java:154)
        at org.apache.tez.dag.app.dag.RootInputInitializerRunner$InputInitializerCallable$1.run(RootInputInitializerRunner.java:146)
        ...

If you populate the table with a single dummy row, the rebuild works fine. I used:

 INSERT INTO TABLE Measures SELECT market,0,0 FROM hivesampletable limit 1; 

After that, the index rebuild ran without errors.
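Putting the workaround together, a minimal sketch might look like the following. It assumes the table and index names from the question; the row-count check is only an illustration added here and was not part of the steps actually run above.

```sql
-- Sketch only: confirm the table is non-empty before rebuilding.
-- A result of 0 means the rebuild is likely to hit the failure shown above.
SELECT COUNT(*) FROM Measures;

-- If the count is 0, load real data or insert a placeholder row first
-- (for example with the INSERT statement shown earlier), then rebuild:
ALTER INDEX measures_index_topology ON Measures REBUILD;
ALTER INDEX measures_index_date ON Measures REBUILD;
```

If a placeholder row is used only to get past the rebuild, remember that it remains in the table and will show up in query results until it is removed or overwritten.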
