First get the column names with df.columns, then filter down to the names you want with .filter(_.startsWith("colF")). This gives you an Array of Strings. However, the String-based overload of select has the signature select(String, String*), so the array cannot be passed to it directly. Fortunately, there is also a select(Column*) overload, so convert the Strings to Columns with .map(df(_)) and finally expand the Array of Columns into varargs with : _*.
df.select(df.columns.filter(_.startsWith("colF")).map(df(_)) : _*).show
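To see why the conversion and the `: _*` are needed, here is a minimal sketch in plain Scala, with no Spark dependency. `Col` and `Frame` are hypothetical stand-ins that only mimic the shape of the two `select` overloads described above; they are not Spark's classes.

```scala
// Stand-ins for illustration only; these are NOT Spark's Column/DataFrame.
final case class Col(name: String)

object Frame {
  val columns: Array[String] = Array("colA", "colB", "colF1", "colF2")

  // Same shape as select(col: String, cols: String*)
  def select(col: String, cols: String*): Seq[String] = col +: cols

  // Same shape as select(cols: Column*)
  def select(cols: Col*): Seq[String] = cols.map(_.name)
}

object VarargsDemo {
  def main(args: Array[String]): Unit = {
    val wanted: Array[String] = Frame.columns.filter(_.startsWith("colF"))

    // Frame.select(wanted: _*) does not compile: the first parameter of the
    // String overload is a plain String, and ": _*" can only fill a
    // repeated (vararg) parameter.

    // Route 1: convert each name to a Col, then expand the array as varargs.
    val viaColumns = Frame.select(wanted.map(Col(_)): _*)
    println(viaColumns.mkString(", ")) // prints "colF1, colF2"

    // Route 2: split into head and tail to fit the String overload.
    // Note that wanted.head throws if the filter matched nothing.
    val viaStrings = Frame.select(wanted.head, wanted.tail: _*)
    println(viaStrings.mkString(", ")) // prints "colF1, colF2"
  }
}
```

Route 1 corresponds to the `.map(df(_)) : _*` approach used in this answer; Route 2 is an alternative that avoids the conversion but needs at least one matching column.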
The filter predicate can be made more complex (much like in Pandas), although the result is a rather ugly solution (IMO):
df.select(df.columns.filter(x => x == "colA" || x.startsWith("colF")).map(df(_)) : _*).show
If the other columns form a fixed list, you can also concatenate a fixed array of column names with the filtered array:
df.select((Array("colA", "colB") ++ df.columns.filter(_.startsWith("colF"))).map(df(_)) : _*).show
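One thing to watch with the concatenation: if a fixed name also matches the prefix, it ends up in the combined array twice. The sketch below builds the name list in plain Scala and removes duplicates before anything is handed to `select`. The helper `fixedPlusPrefixed` and its parameter names are hypothetical, introduced here only for illustration.

```scala
object ColumnNames {
  // Hypothetical helper: fixed names first, then every name starting with
  // `prefix`, keeping only the first occurrence of each name.
  def fixedPlusPrefixed(all: Array[String],
                        fixed: Array[String],
                        prefix: String): Array[String] =
    (fixed ++ all.filter(_.startsWith(prefix))).distinct
}

object ColumnNamesDemo {
  def main(args: Array[String]): Unit = {
    val all = Array("colA", "colB", "colF1", "colF2")

    // "colF1" is both a fixed name and a prefix match; it is listed once.
    val names = ColumnNames.fixedPlusPrefixed(all, Array("colA", "colF1"), "colF")
    println(names.mkString(", ")) // prints "colA, colF1, colF2"
  }
}
```

The resulting array can then be passed on in the same way as above, for example with names.map(df(_)) : _*.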