Streaming JSON Data as Parquet in S3

I have a Kinesis stream producing JSON, and I want to use Storm to write it to S3 in Parquet format. This approach requires converting JSON β†’ Avro β†’ Parquet during stream processing. It also means I have to handle schema evolution myself, continually updating the Avro schemas and the Java classes generated from the .avsc files.

Another option is to write the JSON to S3 directly and use Spark to convert the saved files to Parquet. Spark can take care of schema evolution in this case.

I would like to hear the pros and cons of both approaches. Also, is there a better approach that handles schema evolution in the JSON -> Avro -> Parquet conversion pipeline?
