Best Practices for Specific Data Types in Avro

I'd like to understand the best practices for encoding two specific data types in Avro: timestamps and IP addresses.

I came across an open JIRA ticket for timestamps (https://issues.apache.org/jira/browse/AVRO-739), but the discussion seems to have been quiet for some time. So what are the best ways to encode timestamps in Avro, preferably for use with MapReduce, Pig, Hive, and Streaming?

I would also be interested to know how other people encode IP addresses in Avro.

1 answer

I have some experience with encoding these types in Avro. In my case, being able to access the data through Hive is a major requirement.

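  • For timestamps, I would recommend storing Unix timestamps in a numeric field. Be aware that Avro's float is a 32-bit type and cannot represent current epoch values to the second, so double (or long, for whole seconds or milliseconds) is the safer choice. Numeric Unix timestamps are supported by most other libraries and are easy to work with in Hive, where you can cast the value to a timestamp.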

  • For IP addresses, I would use a string encoding. Strings stay readable when you inspect or query the data, which I think makes them the better type. If you have other requirements, such as keeping the data small, a binary encoding may suit you better.
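
To make the timestamp advice concrete, here is a minimal sketch in Python. It assumes a long field holding milliseconds since the Unix epoch; the record name Event, the field name ts, and the helper functions are hypothetical and only for illustration. The timestamp-millis logical type used in the schema is defined by the Avro specification from version 1.8 onward; with older releases a plain long works the same way. The last helper shows why a 32-bit float is a poor fit for epoch seconds.

```python
import struct
from datetime import datetime, timedelta, timezone

# Hypothetical record schema (names are illustrative, not from the question).
EVENT_SCHEMA = {
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "ts", "type": {"type": "long", "logicalType": "timestamp-millis"}},
    ],
}

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(dt):
    """Convert a timezone-aware datetime to whole milliseconds since the epoch."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    # timedelta // timedelta yields an exact integer, avoiding float rounding.
    return (dt - EPOCH) // timedelta(milliseconds=1)


def from_epoch_millis(ms):
    """Convert milliseconds since the epoch back to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)


def float32_roundtrip(value):
    """Return the value after being squeezed through a 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]
```

A record for this schema would then carry something like {"ts": to_epoch_millis(some_datetime)}. Whether Hive interprets a numeric value as seconds or milliseconds when casting depends on the Hive version and configuration, so check that before settling on a unit.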

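The trade-off between string and binary encodings for IP addresses can be sketched with Python's standard ipaddress module. The field names ip and ip_raw and the helper functions are hypothetical; the sketch only assumes that the address ends up in either an Avro string field or an Avro bytes field.

```python
import ipaddress

# Two hypothetical field definitions: readable string vs. compact bytes.
IP_STRING_FIELD = {"name": "ip", "type": "string"}
IP_BYTES_FIELD = {"name": "ip_raw", "type": "bytes"}


def ip_to_string(addr):
    """Validate an address and return its canonical text form."""
    return str(ipaddress.ip_address(addr))


def ip_to_bytes(addr):
    """Return the packed form: 4 bytes for IPv4, 16 bytes for IPv6."""
    return ipaddress.ip_address(addr).packed


def ip_from_bytes(raw):
    """Turn a 4- or 16-byte packed address back into text."""
    return str(ipaddress.ip_address(raw))
```

The string form can be read directly in a Hive query result, while the packed form is at most 16 bytes but has to be decoded before a person can read it.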