Writing Parquet format to HDFS using the Java API, without Avro or MR

What is an easy way to write Parquet files to HDFS (using the Java API) by creating the Parquet schema directly from a POJO, without using Avro or MR?

The samples I found are outdated and use deprecated methods, and they all rely on one of Avro, Spark or MR.

1 answer

In fact, there are not many samples available for reading or writing Apache Parquet files without an external framework.

The core library of Parquet is parquet-column, where you can find tests that read and write files directly: https://github.com/apache/parquet-mr/blob/master/parquet-column/src/test/java/org/apache/parquet/io/TestColumnIO.java
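Since the question asks about going from a POJO to a Parquet schema, here is a minimal sketch of one way to derive the schema text by reflection, using only the JDK. The class PojoSchemaSketch, the example Person POJO and the Java-to-Parquet type mapping are all assumptions made for illustration; none of them come from parquet-mr. The output follows Parquet's textual message type syntax, which MessageTypeParser.parseMessageType in parquet-column should be able to parse, but that step is not shown or tested here.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical helper, not part of parquet-mr: builds a Parquet
// message type definition (as text) from the fields of a POJO.
class PojoSchemaSketch {

    // Example POJO used only for illustration.
    static class Person {
        int age;
        long id;
        double score;
        boolean active;
        String name;
    }

    // Assumed mapping from Java field types to Parquet primitive types.
    // Only a handful of types are covered; anything else is rejected.
    static String parquetType(Class<?> t) {
        if (t == int.class || t == Integer.class) return "int32";
        if (t == long.class || t == Long.class) return "int64";
        if (t == float.class || t == Float.class) return "float";
        if (t == double.class || t == Double.class) return "double";
        if (t == boolean.class || t == Boolean.class) return "boolean";
        if (t == String.class) return "binary";
        throw new IllegalArgumentException("Unsupported field type: " + t.getName());
    }

    // Emits one schema line per instance field. Primitives cannot be null,
    // so they are marked "required"; reference types are marked "optional".
    static String toMessageType(Class<?> pojo) {
        StringBuilder sb = new StringBuilder("message " + pojo.getSimpleName() + " {\n");
        for (Field f : pojo.getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) continue;
            Class<?> t = f.getType();
            String repetition = t.isPrimitive() ? "required" : "optional";
            sb.append("  ").append(repetition).append(' ')
              .append(parquetType(t)).append(' ').append(f.getName());
            // Strings are stored as binary annotated as UTF8 text.
            if (t == String.class) sb.append(" (UTF8)");
            sb.append(";\n");
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        System.out.println(toMessageType(Person.class));
    }
}
```

The field order in the output follows whatever getDeclaredFields() returns, which the JDK does not guarantee to be declaration order, so sort the fields explicitly if a stable column order matters.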

Then you just need to use the same functions with an HDFS file. For that part, you can refer to this SO question: Accessing files in HDFS using Java

UPDATE: regarding the deprecated parts of the API: AvroWriteSupport should be replaced with AvroParquetWriter, and I checked that ParquetWriter is not deprecated and can be used safely.

Regards,

Loic
