Streaming JSON Data as Parquet in S3

I have a Kinesis stream producing JSON, and I want to use Storm to write it to S3 in Parquet format. This approach requires converting JSON β†’ Avro β†’ Parquet during stream processing. It also means I have to handle schema evolution myself, continually updating the Avro schemas and the Java classes generated from the .avsc files.

Another option is to write the JSON to S3 directly and use Spark to convert the saved files to Parquet. Spark can take care of schema evolution in this case.

I would like to hear the pros and cons of both approaches. Also, is there a better approach that handles schema evolution in the JSON -> Avro -> Parquet conversion pipeline?
