How do I read a partitioned Parquet file with a condition into a DataFrame?
This works fine:
val dataframe = sqlContext.read.parquet("file:///home/msoproj/dev_data/dev_output/aln/partitions/data=jDD/year=2015/month=10/day=25/*")
There is a partition for each of day=1 to day=30. Is it possible to read only a subset, something like day=5 to day=6, or day=5 and day=6?
val dataframe = sqlContext.read.parquet("file:///home/msoproj/dev_data/dev_output/aln/partitions/data=jDD/year=2015/month=10/day=??/*")
If I put *, it gives me all 30 days of data, and that is too big.
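One possible workaround (not from the question itself, so treat it as a sketch): Spark resolves path strings through Hadoop's glob syntax, so a pattern like `day={5,6}` can select specific partitions; alternatively, you can build the partition paths explicitly and hand them to the varargs overload of `read.parquet`. A minimal sketch of the second approach, reusing the base path from the question:

```scala
// Sketch: build explicit partition paths for the days we want.
// The base path is taken from the question; the day range (5 to 6) is an example.
val base = "file:///home/msoproj/dev_data/dev_output/aln/partitions/data=jDD/year=2015/month=10"

val paths = (5 to 6).map(d => s"$base/day=$d/*")

// With a SQLContext in scope (as in the question), this would read only those days:
// val dataframe = sqlContext.read.parquet(paths: _*)

paths.foreach(println)
```

This avoids scanning all 30 day partitions, since only the listed paths are touched by the reader.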
scala apache-spark spark-dataframe parquet
Woodhophopper