I want to create a Parquet file containing data whose format is not known at compile time. I receive the schema as text later, and I know that some columns contain a date with time. I want to do this using Spark and Java, so I followed http://spark.apache.org/docs/1.2.1/sql-programming-guide.html#programmatically-specifying-the-schema and created a schema with the appropriate types.

I tried using Spark's DataType.TimestampType and DataType.DateType for those date columns, but neither of them works. When I try to save the file with JavaSchemaRDD.saveAsParquetFile, I get the error Unsupported datatype followed by whichever type I tried for the date column. I tried this with an emptyRDD, so no data conversion problems are involved.
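For reference, this is roughly what I am doing, following the Java API from the linked guide (a simplified sketch; the column names and the output path are made up):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.api.java.DataType;
import org.apache.spark.sql.api.java.JavaSQLContext;
import org.apache.spark.sql.api.java.JavaSchemaRDD;
import org.apache.spark.sql.api.java.Row;
import org.apache.spark.sql.api.java.StructField;
import org.apache.spark.sql.api.java.StructType;

public class ParquetDateTest {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext("local", "parquet-date-test");
        JavaSQLContext sqlContext = new JavaSQLContext(sc);

        // Schema built at runtime; one column is a date with time.
        List<StructField> fields = Arrays.asList(
            DataType.createStructField("id", DataType.LongType, false),
            DataType.createStructField("created", DataType.TimestampType, true));
        StructType schema = DataType.createStructType(fields);

        // Empty RDD, so no data conversion is involved.
        JavaRDD<Row> rows = sc.parallelize(Collections.<Row>emptyList());

        JavaSchemaRDD schemaRDD = sqlContext.applySchema(rows, schema);
        // This is where the "Unsupported datatype" error occurs in Spark 1.2.
        schemaRDD.saveAsParquetFile("/tmp/test.parquet");
    }
}
```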
From studying http://parquet.incubator.apache.org/documentation/latest/ and https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md, I think I need to convert the data to some integer/long type and annotate that it represents a Date. If so, how can I do this in Spark? Or do I need to do something else entirely?
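The workaround I have in mind would look something like the sketch below: declare the date-with-time column as LongType and store epoch milliseconds, accepting that the logical date annotation is lost (the column names and helper class are hypothetical):

```java
import java.sql.Timestamp;

import org.apache.spark.sql.api.java.DataType;
import org.apache.spark.sql.api.java.Row;
import org.apache.spark.sql.api.java.StructField;

public class EpochMillisWorkaround {
    // The date-with-time column declared as a plain long (epoch milliseconds).
    static StructField createdField =
        DataType.createStructField("created", DataType.LongType, true);

    // Convert each timestamp value before building the Row.
    static Row toRow(long id, Timestamp created) {
        return Row.create(id, created == null ? null : created.getTime());
    }
}
```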
As it turns out, DateType and Timestamp support for Parquet was only added in Spark 1.3 (see https://github.com/apache/spark/pull/3820 and https://issues.apache.org/jira/browse/SPARK-4709).

Spark stores Timestamps in Parquet as INT96 (the same representation that Impala uses).
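If you can upgrade, a minimal sketch of the same schema under the Spark 1.3 Java API (where JavaSchemaRDD was replaced by DataFrame and the type constants moved to DataTypes) would be something like this; the column names and output path are again made up:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class ParquetTimestampSpark13 {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext("local", "parquet-timestamp");
        SQLContext sqlContext = new SQLContext(sc.sc());

        List<StructField> fields = Arrays.asList(
            DataTypes.createStructField("id", DataTypes.LongType, false),
            DataTypes.createStructField("created", DataTypes.TimestampType, true));
        StructType schema = DataTypes.createStructType(fields);

        JavaRDD<Row> rows = sc.parallelize(Collections.<Row>emptyList());

        // In Spark 1.3 the Timestamp column is written to Parquet as INT96.
        DataFrame df = sqlContext.createDataFrame(rows, schema);
        df.saveAsParquetFile("/tmp/test-1.3.parquet");
    }
}
```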