Spark SQL provides various DataFrame functions such as avg, mean, sum, etc.
You just need to apply them to a DataFrame column as a SQL Column expression.
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._
import org.apache.spark.sql.Column
Create a private method for the standard deviation:
// Population standard deviation: sqrt(E[x^2] - (E[x])^2)
private def stddev(col: Column): Column = sqrt(avg(col * col) - avg(col) * avg(col))
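The helper uses the identity that the population standard deviation equals sqrt(E[x^2] - (E[x])^2). As a quick sanity check (a sketch assuming Spark 1.6+ and a DataFrame df with a numeric "value" column), it should agree with the built-in stddev_pop:

df.agg(
  stddev(df.col("value")).as("custom_sd"),      // the private helper defined above
  stddev_pop(df.col("value")).as("builtin_sd")  // Spark's built-in population stddev
).show()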
Now you can create SQL Columns for the mean and the standard deviation:
val value_sd: org.apache.spark.sql.Column = stddev(df.col("value")).as("value_sd")
val value_mean: org.apache.spark.sql.Column = avg(df.col("value")).as("value_mean")
Filter your DataFrame to the last 10 days, or whatever range you want.
val filterDF = df.filter("")  // put your filter condition here
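For instance, a "last 10 days" filter could look like the following. This is a hypothetical sketch: it assumes the DataFrame has a date column named event_date, which is not part of the original example.

val filterDF = df.filter(col("event_date") >= date_sub(current_date(), 10))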
You can now apply the aggregate functions to your filterDF:
filterDF.agg(value_sd, value_mean).show()
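For reference, here is a self-contained sketch that ties the steps together. The sample data, object name, app name, and master setting are assumptions for illustration, not part of the original answer.

import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions._

object ValueStats {
  // Population standard deviation: sqrt(E[x^2] - (E[x])^2)
  private def stddev(col: Column): Column = sqrt(avg(col * col) - avg(col) * avg(col))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("ValueStats").master("local[*]").getOrCreate()
    import spark.implicits._

    // Hypothetical sample data with a single numeric "value" column
    val df = Seq(1.0, 2.0, 3.0, 4.0, 5.0).toDF("value")

    val value_sd: Column = stddev(df.col("value")).as("value_sd")
    val value_mean: Column = avg(df.col("value")).as("value_mean")

    df.agg(value_sd, value_mean).show()

    spark.stop()
  }
}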