Spark >= 2.4
Spark 2.4 provides format-agnostic writer interfaces, and some models already implement them. For example, LinearRegressionModel:
val lrm: org.apache.spark.ml.regression.LinearRegressionModel = ???
val path: String = ???
lrm.write.format("pmml").save(path)
will create a directory with a single file containing the PMML representation.
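For a version of the snippet above that runs end to end, here is a minimal sketch. The object name PmmlExportSketch, the helper exportToyModel, the toy training data, and the temporary output directory are all assumptions made for illustration. Only write.format("pmml").save(path) comes from the answer itself.

```scala
import java.nio.file.Files

import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.sql.SparkSession

// Hypothetical helper object; names and data are illustrative only.
object PmmlExportSketch {

  // Fits a toy linear regression and writes it as PMML; returns the output path.
  def exportToyModel(spark: SparkSession): String = {
    import spark.implicits._

    // Toy training data lying on the line y = 2x + 1.
    val training = Seq(
      (1.0, Vectors.dense(0.0)),
      (3.0, Vectors.dense(1.0)),
      (5.0, Vectors.dense(2.0)),
      (7.0, Vectors.dense(3.0))
    ).toDF("label", "features")

    val lrm = new LinearRegression().fit(training)

    // save() refuses to write to an existing path unless overwrite() is used,
    // so point it at a child of a fresh temporary directory.
    val path = Files.createTempDirectory("lrm-pmml").resolve("model").toString
    lrm.write.format("pmml").save(path)
    path
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("pmml-export-sketch")
      .getOrCreate()

    val path = exportToyModel(spark)
    // List what was written to the output directory.
    new java.io.File(path).listFiles().foreach(f => println(f.getName))

    spark.stop()
  }
}
```

The output directory should contain the PMML document in a part-* file, typically alongside Hadoop marker and checksum files.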
Spark < 2.4
What is the format of these files?
data/*.parquet files are in the Apache Parquet columnar storage format
metadata/part-* looks like JSON
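One way to check both formats yourself is to save a model with the default writer and read the two subdirectories back. The sketch below assumes a toy LinearRegressionModel and a temporary directory. The object name InspectSavedModel and the helper saveAndInspect are hypothetical.

```scala
import java.nio.file.Files

import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper object; names and data are illustrative only.
object InspectSavedModel {

  // Saves a toy model in the default format and returns
  // (first line of the metadata file, model data as a DataFrame).
  def saveAndInspect(spark: SparkSession): (String, DataFrame) = {
    import spark.implicits._

    val training = Seq(
      (1.0, Vectors.dense(0.0)),
      (3.0, Vectors.dense(1.0)),
      (5.0, Vectors.dense(2.0)),
      (7.0, Vectors.dense(3.0))
    ).toDF("label", "features")

    val lrm = new LinearRegression().fit(training)

    val path = Files.createTempDirectory("lrm-native").resolve("model").toString
    lrm.write.save(path)

    // metadata/part-* is plain text holding a JSON document.
    val metadataJson = spark.read.textFile(s"$path/metadata").first()

    // data/*.parquet holds the fitted parameters.
    val modelDf = spark.read.parquet(s"$path/data")

    (metadataJson, modelDf)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("inspect-saved-model")
      .getOrCreate()

    val (metadataJson, modelDf) = saveAndInspect(spark)
    println(metadataJson)
    modelDf.printSchema()

    spark.stop()
  }
}
```

The exact columns in the data directory depend on the model class and Spark version, so treat the printed schema as specific to this toy model.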
Which file(s) contain the actual model?
Is it possible to save the model somewhere else, for example in a database?
I am not aware of any direct method, but you can load the model data as a DataFrame and then write it to a database:
val modelDf = spark.read.parquet("/path/to/data/")
modelDf.write.jdbc(...)