I am new to Hive and Elastic MapReduce and am currently facing a specific issue. When I run a Hive statement on a table with billions of rows of JSON objects, the MapReduce job fails as soon as a single one of these rows contains invalid / malformed JSON.
An exception:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable {"ip":"39488130","cdate":"2012-08-09","cdate_ts":"2012-08-09 17:06:41","country":"SA","city":"Riyadh","mid":"6666276582211270592","osversion":"5.1.1
    at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:161)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:441)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:377)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable {"ip":"39488130","cdate":"2012-08-09","cdate_ts":"2012-08-09 17:06:41","country":"SA","city":"Riyadh","mid":"6666276582211270592","osversion":"5.1.1
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:524)
    at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
    ... 8 more
Caused by: com.google.gson.JsonSyntaxException: com.google.gson.stream.MalformedJsonException: Unterminated string near
    at com.google.gson.Streams.parse(Streams.java:51)
    at com.google.gson.JsonParser.parse(JsonParser.java:83)
    at com.google.gson.JsonParser.parse(JsonParser.java:58)
    at com.google.gson.JsonParser.parse(JsonParser.java:44)
    at com.amazon.elasticmapreduce.JsonSerde.deserialize(Unknown Source)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:510)
    ... 9 more
Caused by: com.google.gson.stream.MalformedJsonException: Unterminated string near
    at com.google.gson.stream.JsonReader.syntaxError(JsonReader.java:1110)
    at com.google.gson.stream.JsonReader.nextString(JsonReader.java:967)
    at com.google.gson.stream.JsonReader.nextValue(JsonReader.java:802)
    at com.google.gson.stream.JsonReader.objectValue(JsonReader.java:782)
    at com.google.gson.stream.JsonReader.quickPeek(JsonReader.java:377)
    at com.google.gson.stream.JsonReader.peek(JsonReader.java:340)
    at com.google.gson.Streams.parseRecursive(Streams.java:60)
    at com.google.gson.Streams.parseRecursive(Streams.java:83)
    at com.google.gson.Streams.parse(Streams.java:40)
    ... 14 more
I create my tables as follows:
CREATE EXTERNAL TABLE IF NOT EXISTS table1 (
    column1 string,
    column2 string
)
PARTITIONED BY (year string, month string)
ROW FORMAT SERDE 'com.amazon.elasticmapreduce.JsonSerde'
WITH SERDEPROPERTIES ('paths'='c1, c2')
LOCATION 's3://mybucket/table1';
What can I do to prevent the crash? Simply ignoring invalid JSON objects / lines would be fine, since it is only one row out of billions that is garbled.
Thanks for your help in advance. Best, Sasha