I want to create a Parquet file containing data whose format is not known at compile time. I receive the schema as text later, and I know that some columns contain a date with time. I want to do this using Spark and Java, so I followed http://spark.apache.org/docs/1.2.1/sql-programming-guide.html#programmatically-specifying-the-schema and created a schema with the appropriate types.

I tried using Spark's DataType.TimestampType and DataType.DateType for those date columns, but neither of them works. When I try to save the file with JavaSchemaRDD.saveAsParquetFile, I get the error Unsupported datatype followed by whichever type I tried for the date column. I tried this with an emptyRDD, so no data conversion problems are involved.
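For reference, this is roughly what I am doing, following the Java API from the linked guide (a simplified sketch; the column names and the output path are made up):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.api.java.DataType;
import org.apache.spark.sql.api.java.JavaSQLContext;
import org.apache.spark.sql.api.java.JavaSchemaRDD;
import org.apache.spark.sql.api.java.Row;
import org.apache.spark.sql.api.java.StructField;
import org.apache.spark.sql.api.java.StructType;

public class ParquetDateTest {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext("local", "parquet-date-test");
        JavaSQLContext sqlContext = new JavaSQLContext(sc);

        // Schema built at runtime; one column is a date with time.
        List<StructField> fields = Arrays.asList(
            DataType.createStructField("id", DataType.LongType, false),
            DataType.createStructField("created", DataType.TimestampType, true));
        StructType schema = DataType.createStructType(fields);

        // Empty RDD, so no data conversion is involved.
        JavaRDD<Row> rows = sc.parallelize(Collections.<Row>emptyList());

        JavaSchemaRDD schemaRDD = sqlContext.applySchema(rows, schema);
        // This is where the "Unsupported datatype" error occurs in Spark 1.2.
        schemaRDD.saveAsParquetFile("/tmp/test.parquet");
    }
}
```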
From studying http://parquet.incubator.apache.org/documentation/latest/ and https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md, I think I need to convert the data to some integer/long type and annotate that it represents a Date. If so, how can I do this in Spark? Or do I need to do something else entirely?
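The workaround I have in mind would look something like the sketch below: declare the date-with-time column as LongType and store epoch milliseconds, accepting that the logical date annotation is lost (the column names and helper class are hypothetical):

```java
import java.sql.Timestamp;

import org.apache.spark.sql.api.java.DataType;
import org.apache.spark.sql.api.java.Row;
import org.apache.spark.sql.api.java.StructField;

public class EpochMillisWorkaround {
    // The date-with-time column declared as a plain long (epoch milliseconds).
    static StructField createdField =
        DataType.createStructField("created", DataType.LongType, true);

    // Convert each timestamp value before building the Row.
    static Row toRow(long id, Timestamp created) {
        return Row.create(id, created == null ? null : created.getTime());
    }
}
```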
As it turns out, DateType and Timestamp support for Parquet was only added in Spark 1.3 (see https://github.com/apache/spark/pull/3820 and https://issues.apache.org/jira/browse/SPARK-4709).

Spark stores Timestamps in Parquet as INT96 (the same representation that Impala uses).
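If you can upgrade, a minimal sketch of the same schema under the Spark 1.3 Java API (where JavaSchemaRDD was replaced by DataFrame and the type constants moved to DataTypes) would be something like this; the column names and output path are again made up:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class ParquetTimestampSpark13 {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext("local", "parquet-timestamp");
        SQLContext sqlContext = new SQLContext(sc.sc());

        List<StructField> fields = Arrays.asList(
            DataTypes.createStructField("id", DataTypes.LongType, false),
            DataTypes.createStructField("created", DataTypes.TimestampType, true));
        StructType schema = DataTypes.createStructType(fields);

        JavaRDD<Row> rows = sc.parallelize(Collections.<Row>emptyList());

        // In Spark 1.3 the Timestamp column is written to Parquet as INT96.
        DataFrame df = sqlContext.createDataFrame(rows, schema);
        df.saveAsParquetFile("/tmp/test-1.3.parquet");
    }
}
```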