I work with data extracted from SFDC using the simple-salesforce package. I am using Python 3 for scripting and Spark 1.5.2.
I created an RDD containing the following data:
[('Id', 'a0w1a0000003xB1A'), ('PackSize', 1.0), ('Name', 'A')]
[('Id', 'a0w1a0000003xAAI'), ('PackSize', 1.0), ('Name', 'B')]
[('Id', 'a0w1a00000xB3AAI'), ('PackSize', 30.0), ('Name', 'C')]
...
This data is in an RDD called v_rdd.
My schema looks like this:
StructType(List(StructField(Id,StringType,true),StructField(PackSize,StringType,true),StructField(Name,StringType,true)))
I am trying to create a DataFrame from this RDD:
sqlDataFrame = sqlContext.createDataFrame(v_rdd, schema)
I display my DataFrame:
sqlDataFrame.show()
And get the following:
+--------------------+--------------------+--------------------+
|                  Id|            PackSize|                Name|
+--------------------+--------------------+--------------------+
|[Ljava.lang.Objec...|[Ljava.lang.Objec...|[Ljava.lang.Objec...|
|[Ljava.lang.Objec...|[Ljava.lang.Objec...|[Ljava.lang.Objec...|
|[Ljava.lang.Objec...|[Ljava.lang.Objec...|[Ljava.lang.Objec...|
I expect to see actual data, for example:
+----------------+--------+----+
|              Id|PackSize|Name|
+----------------+--------+----+
|a0w1a0000003xB1A|     1.0|   A|
|a0w1a0000003xAAI|     1.0|   B|
|a0w1a00000xB3AAI|    30.0|   C|
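To make the shapes concrete, here is a minimal plain-Python sketch (no Spark required) comparing a record as it sits in v_rdd with the flat row that the expected output implies. The helper name to_row is hypothetical and only for illustration. With an RDD, the same idea would presumably be written as v_rdd.map(to_row). Whether that alone yields the expected table under Spark 1.5.2 is an assumption, not something verified here.

```python
# Records shaped like the elements of v_rdd (values copied from the sample above).
records = [
    [('Id', 'a0w1a0000003xB1A'), ('PackSize', 1.0), ('Name', 'A')],
    [('Id', 'a0w1a0000003xAAI'), ('PackSize', 1.0), ('Name', 'B')],
    [('Id', 'a0w1a00000xB3AAI'), ('PackSize', 30.0), ('Name', 'C')],
]


def to_row(pairs):
    """Hypothetical helper: drop the field names and keep the values in order."""
    return tuple(value for _, value in pairs)


# Each record is a list of (field, value) pairs; each row is a flat tuple of values.
rows = [to_row(record) for record in records]

for record, row in zip(records, rows):
    print(record, '->', row)
```

Note that PackSize is a float in the data while the schema declares it as StringType, so the values may also need converting (for example with str) before they match the schema.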
Could you help me determine what I am doing wrong here?
My Python script is long, and I'm not sure it would be convenient for people to sift through it, so I posted only the parts I have a problem with.
Thanks a ton in advance!