You can simply `foldLeft` over the `Array` of columns:
```scala
val transformed: DataFrame = df.columns.foldLeft(df)(
  (df, arg) => str(arg, df)
)
```
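For completeness, the `str` helper referenced here comes from the question, not from the Spark API; it is assumed to look roughly like this, fitting a `StringIndexer` on one column and returning only the transformed frame:

```scala
import org.apache.spark.ml.feature.StringIndexer
import org.apache.spark.sql.DataFrame

// Hypothetical helper matching the str(arg, df) call above:
// it fits a StringIndexer on a single column and returns the
// transformed DataFrame, discarding the fitted StringIndexerModel.
def str(cname: String, df: DataFrame): DataFrame = {
  val indexer = new StringIndexer()
    .setInputCol(cname)
    .setOutputCol(s"${cname}_index")
  indexer.fit(df).transform(df)
}
```

Because the model is dropped inside the helper, nothing is left to apply the same mapping to unseen data, which is exactly the problem described below.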
However, I would argue that this is not a very good approach. Since `str` discards the `StringIndexerModel`, it cannot be applied when you get new data. Because of this, I would recommend using a `Pipeline`:
```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.StringIndexer

val transformers: Array[org.apache.spark.ml.PipelineStage] = df.columns.map(
  cname => new StringIndexer()
    .setInputCol(cname)
    .setOutputCol(s"${cname}_index")
)
```
If you also want to combine the indexed columns into a single feature vector, you can append a `VectorAssembler` as follows:
```scala
import org.apache.spark.ml.feature.VectorAssembler

val assembler = new VectorAssembler()
  .setInputCols(df.columns.map(cname => s"${cname}_index"))
  .setOutputCol("features")

val stages = transformers :+ assembler
```
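The stages can then be combined into a single `Pipeline` and fitted once; the resulting `PipelineModel` retains every fitted `StringIndexerModel`, so the same mapping can be reused on new data (`df` here stands for your training DataFrame):

```scala
import org.apache.spark.ml.Pipeline

// Combine all StringIndexers and the VectorAssembler into one pipeline.
val pipeline = new Pipeline().setStages(stages)

// Fit once on the training data; the PipelineModel keeps the fitted
// StringIndexerModels and can be applied to unseen DataFrames later.
val model = pipeline.fit(df)
val indexed = model.transform(df)
```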
You can also use `RFormula`, which is less customizable but much more concise:
```scala
import org.apache.spark.ml.feature.RFormula

val rf = new RFormula().setFormula(" ~ uuid + url + browser - 1")
val rfModel = rf.fit(dataset)
rfModel.transform(dataset)
```