Flattening a nested value in a Parquet file

I have a Parquet file that I loaded using Spark. One of the values is a nested key holding a pair of values. How can I flatten it?

    df.printSchema
    root
     |-- location: string (nullable = true)
     |-- properties: string (nullable = true)

    texas,{"key":{"key1":"value1","key2":"value2"}}

thanks,

1 answer

You can use explode on your DataFrame and pass it a function that parses the JSON column using json4s. json4s has a simple extraction interface; for your case it would look like this:

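    import org.json4s._
    import org.json4s.jackson.JsonMethods._  // or org.json4s.native.JsonMethods._

    // json is the raw string taken from the properties column
    val list = for {
      JArray(keys) <- parse(json) \\ "key"
      json @ JObject(key) <- keys
      JField("key1", JString(key1)) <- key
      JField("key2", JString(key2)) <- key
    } yield {
      Seq(key1, key2)
    }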

This flattens your DataFrame.
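
In the sample row from the question, the value under key is an object rather than an array, so a JArray pattern would not match it. Below is a minimal sketch of the same json4s for-comprehension adapted to that shape. It assumes Scala 2 with the json4s Jackson backend on the classpath; FlattenSketch and flattenProperties are hypothetical names used only for illustration.

```scala
import org.json4s._
import org.json4s.jackson.JsonMethods._

object FlattenSketch {
  // Hypothetical helper: pulls key1 and key2 out of a properties string
  // shaped like {"key":{"key1":"...","key2":"..."}}.
  def flattenProperties(json: String): List[Seq[String]] =
    for {
      JObject(fields) <- parse(json) \ "key"     // the nested object under "key"
      JField("key1", JString(key1)) <- fields    // first nested string value
      JField("key2", JString(key2)) <- fields    // second nested string value
    } yield Seq(key1, key2)
}
```

Applied to the sample string, flattenProperties should yield a single Seq holding value1 and value2. An input without a key field, or without both nested fields, produces an empty list.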

If you also want a column for the key, you can use withColumn after the explode, so that the key is stored in a new column as well.
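
The withColumn step can also be sketched without json4s by using Spark's built-in get_json_object function. This is a different technique from the explode approach described in this answer. The sketch assumes the properties column holds a JSON string shaped like the sample row; WithColumnSketch, addKeyColumns, and the output column names key1 and key2 are hypothetical.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, get_json_object}

object WithColumnSketch {
  // Hypothetical helper: adds one string column per nested value.
  // get_json_object returns null when the path is absent or the JSON is invalid.
  def addKeyColumns(df: DataFrame): DataFrame =
    df.withColumn("key1", get_json_object(col("properties"), "$.key.key1"))
      .withColumn("key2", get_json_object(col("properties"), "$.key.key2"))
}
```

Calling WithColumnSketch.addKeyColumns(df) keeps location and properties and appends the two extracted columns. The original properties column can then be dropped if it is no longer needed.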
