But is there a way to prevent the Row object from arranging them?
No. If you pass keyword arguments (kwargs), they are sorted by name. Sorting is necessary for deterministic behavior, since Python prior to 3.6 does not preserve the order of keyword arguments.
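To make the effect of that sorting concrete, here is a minimal pure-Python sketch. The helper `row_like_field_order` is hypothetical and only mimics the sort-by-name step described above; it is not PySpark's actual implementation.

```python
def row_like_field_order(**kwargs):
    """Return the keyword names sorted alphabetically."""
    # Hypothetical helper for illustration only; not PySpark code.
    return sorted(kwargs)


# The caller writes foo first, but the sorted order puts bar first.
stored_order = row_like_field_order(foo=1, bar=2)
print(stored_order)  # ['bar', 'foo']
```

Because the names are sorted, the order in which the caller writes the arguments has no influence on the resulting column order.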
Just use simple tuples:
rdd = sc.parallelize([(1, 2)])
and pass the schema as an argument to RDD.toDF (not to be confused with DataFrame.toDF):
rdd.toDF(["foo", "bar"])
or createDataFrame:
from pyspark.sql.types import *
spark.createDataFrame(rdd, ["foo", "bar"])
You can also use namedtuples:
from collections import namedtuple
FooBar = namedtuple("FooBar", ["foo", "bar"])
spark.createDataFrame([FooBar(foo=1, bar=2)])
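The reason a namedtuple helps is that its field order is fixed when the type is declared, not when an instance is built. A small sketch without Spark, reusing the FooBar name from the snippet above:

```python
from collections import namedtuple

# Field order is set once, by the declaration.
FooBar = namedtuple("FooBar", ["foo", "bar"])

# Keyword order at the call site does not change the field order.
record = FooBar(bar=2, foo=1)

field_order = list(record._fields)
values = tuple(record)
print(field_order)  # ['foo', 'bar']
print(values)       # (1, 2)
```

The declared order (foo, then bar) is what a consumer of the tuple sees, regardless of how the instance was constructed.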
Finally, you can reorder the columns with select:
from pyspark.sql import Row
sc.parallelize([Row(foo=1, bar=2)]).toDF().select("foo", "bar")