Hive partitioned Parquet table is corrupted - expected magic number at tail [80, 65, 82, 49] but found [1, 92, 78, 10]

Distribution: CDH-4.6.0-1.cdh4.6.0.p0.26, Hive version: 0.10.0, Parquet version: 1.2.5

I have two large date-partitioned tables filled with log files that I recently converted to Parquet to take advantage of compression and columnar storage. So far I have been very pleased with the performance.

Our development team recently added a field to the logs, so I was tasked with adding a column to both log tables. It worked great for one, but the other seems to have become corrupted. I reverted the change, but I still cannot query the table.

I'm sure the data is fine (since it hasn't changed), but something is wrong in the metastore. Running msck repair table re-populates the partitions after I drop/create the table, but it does not clear the error below. There are two things that would fix this, but neither of them makes me happy:

  • Re-insert the data.
  • Copy the data back into the table from the production cluster.
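For reference, the drop / re-create / msck repair cycle mentioned above looks roughly like this. The column list is illustrative (the real schema has more fields), and the serde and output-format class names are assumptions chosen to match the parquet.hive.DeprecatedParquetInputFormat that appears in the error:

```sql
-- Sketch only: illustrative columns, not the real schema.
DROP TABLE IF EXISTS upload_metrics_hist;

CREATE EXTERNAL TABLE upload_metrics_hist (
  upload_id STRING,
  bytes BIGINT
)
PARTITIONED BY (dt STRING)
ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
STORED AS
  INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
  OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat'
LOCATION 'hdfs://hdfs-dev/data/prod/upload-metrics/upload_metrics_hist';

-- Re-register the existing dt=... partition directories in the metastore.
MSCK REPAIR TABLE upload_metrics_hist;
```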

I am really hoping there is some command I don't know about that will fix the table without resorting to the two options above. As I said, the data is fine. I searched for the error and got some results, but they all refer to Impala, which is NOT being used here.

select * from upload_metrics_hist where dt = '2014-07-01' limit 5; 

The problem is this:

Caused by: java.lang.RuntimeException: hdfs://hdfs-dev/data/prod/upload-metrics/upload_metrics_hist/dt=2014-07-01/000005_0 is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [1, 92, 78, 10]

Full error:

2014-07-17 02:00:48,835 WARN org.apache.hadoop.mapred.Child: Error running child
java.io.IOException: java.lang.reflect.InvocationTargetException
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:372)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.<init>(HadoopShimsSecure.java:319)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getRecordReader(HadoopShimsSecure.java:433)
    at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:540)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:394)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:358)
    ... 10 more
Caused by: java.lang.RuntimeException: hdfs://hdfs-dev/data/prod/upload-metrics/upload_metrics_hist/dt=2014-07-01/000005_0 is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [1, 92, 78, 10]
    at parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:263)
    at parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:229)
    at parquet.hive.DeprecatedParquetInputFormat$RecordReaderWrapper.getSplit(DeprecatedParquetInputFormat.java:327)
    at parquet.hive.DeprecatedParquetInputFormat$RecordReaderWrapper.<init>(DeprecatedParquetInputFormat.java:204)
    at parquet.hive.DeprecatedParquetInputFormat.getRecordReader(DeprecatedParquetInputFormat.java:108)
    at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:65)
    ... 15 more
hive hdfs parquet

No one has answered this question yet.
