How to convert a JSON string to a Spark DataFrame

I want to convert the string variable below into a Spark DataFrame.

 val jsonStr = """{ "metadata": { "key": 84896, "value": 54 }}"""

I know how to create a dataframe from a json file.

 sqlContext.read.json("file.json") 

but I don't know how to create a dataframe from a string variable.

How do I convert a JSON string variable to a DataFrame?

json scala dataframe apache-spark
6 answers

For Spark 2.2+ :

 import spark.implicits._

 val jsonStr = """{ "metadata": { "key": 84896, "value": 54 }}"""
 val df = spark.read.json(Seq(jsonStr).toDS)

For Spark 2.1.x :

 val events = sc.parallelize(
   """{"action":"create","timestamp":"2016-01-07T00:01:17Z"}""" :: Nil)
 val df = sqlContext.read.json(events)

Hint: this uses sqlContext.read.json(jsonRDD: RDD[String]). There is also sqlContext.read.json(path: String), which reads a JSON file directly.

For older versions :

 val jsonStr = """{ "metadata": { "key": 84896, "value": 54 }}"""
 val rdd = sc.parallelize(Seq(jsonStr))
 val df = sqlContext.read.json(rdd)

Since the JSON reader that takes an RDD is deprecated in Spark 2.2, here is another option:

 val jsonStr = """{ "metadata": { "key": 84896, "value": 54 }}"""
 import spark.implicits._ // spark is your SparkSession object
 val df = spark.read.json(Seq(jsonStr).toDS)

To convert a list of JSON strings to a DataFrame in Spark 2.2:

 val spark = SparkSession
   .builder()
   .master("local")
   .appName("Test")
   .getOrCreate()

 var strList = List.empty[String]
 val jsonString1 = """{"ID" : "111","NAME":"Arkay","LOC":"Pune"}"""
 val jsonString2 = """{"ID" : "222","NAME":"DineshS","LOC":"PCMC"}"""
 strList = strList :+ jsonString1
 strList = strList :+ jsonString2

 val rddData = spark.sparkContext.parallelize(strList)
 val resultDF = spark.read.json(rddData)
 resultDF.show()

Result:

 +---+----+-------+
 | ID| LOC|   NAME|
 +---+----+-------+
 |111|Pune|  Arkay|
 |222|PCMC|DineshS|
 +---+----+-------+

Here is an example of how to convert a JSON string to a DataFrame in Java (Spark 2.2+):

 String str1 = "{\"_id\":\"123\",\"ITEM\":\"Item 1\",\"CUSTOMER\":\"Billy\",\"AMOUNT\":285.2}";
 String str2 = "{\"_id\":\"124\",\"ITEM\":\"Item 2\",\"CUSTOMER\":\"Sam\",\"AMOUNT\":245.85}";

 List<String> jsonList = new ArrayList<>();
 jsonList.add(str1);
 jsonList.add(str2);

 SparkContext sparkContext = new SparkContext(new SparkConf()
     .setAppName("myApp").setMaster("local"));
 JavaSparkContext javaSparkContext = new JavaSparkContext(sparkContext);
 SQLContext sqlContext = new SQLContext(sparkContext);

 JavaRDD<String> javaRdd = javaSparkContext.parallelize(jsonList);
 Dataset<Row> data = sqlContext.read().json(javaRdd);
 data.show();

Here is the result:

 +------+--------+------+---+
 |AMOUNT|CUSTOMER|  ITEM|_id|
 +------+--------+------+---+
 | 285.2|   Billy|Item 1|123|
 |245.85|     Sam|Item 2|124|
 +------+--------+------+---+
 simple_json = '{"results":[{"a":1,"b":2,"c":"name"},{"a":2,"b":5,"c":"foo"}]}'
 rddjson = sc.parallelize([simple_json])
 df = sqlContext.read.json(rddjson)

Link to answer: stack overflow


Now you can read JSON directly from a Dataset[String]: https://spark.apache.org/docs/latest/sql-data-sources-json.html

 val otherPeopleDataset = spark.createDataset(
   """{"name":"Yin","address":{"city":"Columbus","state":"Ohio"}}""" :: Nil)
 val otherPeople = spark.read.json(otherPeopleDataset)
 otherPeople.show()
 // +---------------+----+
 // |        address|name|
 // +---------------+----+
 // |[Columbus,Ohio]| Yin|
 // +---------------+----+
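Tying this back to the question's jsonStr: the nested "metadata" object is inferred as a struct column, and its fields can be selected with dot notation. A minimal sketch, assuming you are in spark-shell where `spark` (a SparkSession) is predefined:

```scala
// In spark-shell, `spark` is predefined; in a standalone app you would
// build one with SparkSession.builder().
import spark.implicits._

val jsonStr = """{ "metadata": { "key": 84896, "value": 54 }}"""
val df = spark.read.json(Seq(jsonStr).toDS)

// "metadata" is inferred as a struct; nested fields are reachable
// with dot notation:
df.select($"metadata.key", $"metadata.value").show()
```

Note that Spark infers the numbers as LongType, since no explicit schema is supplied.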
