Json exception fix

I am trying to catch or ignore parsing errors when I read a JSON file:

val DF = sqlContext.jsonFile("file") 

Several lines are not valid JSON objects, and the data is too large (~1 TB) to go through by hand.

I came across exception handling for mapping with import scala.util.Try and in.map(a => Try(a.toInt)) in this question: how to handle the Exception function in spark map ()?

How can I catch an exception while reading a JSON file with the sqlContext.jsonFile function?

Thanks!

1 answer

Unfortunately, you are out of luck here. DataFrameReader.json, which is used under the hood, is pretty much all or nothing. If your input contains invalid lines, you need to filter them out manually. A basic solution might look like this:

 import scala.util.parsing.json._

 val df = sqlContext.read.json(
   sc.textFile("file").filter(JSON.parseFull(_).isDefined)
 )

Since the validation above is quite expensive (every line is parsed twice, once by the filter and once by read.json), you may prefer to skip jsonFile / read.json altogether and work with the parsed JSON values directly.
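A minimal sketch of that second approach, assuming a spark-shell session where sc is already defined and scala.util.parsing.json is on the classpath. The helper names parseLine and parseLines are made up for illustration and are not part of Spark. Each line is parsed exactly once, and anything that is not a JSON object is dropped:

```scala
import scala.util.parsing.json.JSON
import org.apache.spark.rdd.RDD

// Hypothetical helper: parse one line and keep it only if it is a JSON object.
// JSON.parseFull returns None for invalid input; arrays and scalars are dropped too.
def parseLine(line: String): Option[Map[String, Any]] =
  JSON.parseFull(line) match {
    case Some(m: Map[_, _]) => Some(m.asInstanceOf[Map[String, Any]])
    case _                  => None
  }

// Hypothetical helper: parse every line once and discard the invalid ones.
def parseLines(lines: RDD[String]): RDD[Map[String, Any]] =
  lines.flatMap(line => parseLine(line).toList)
```

You could then call parseLines(sc.textFile("file")) and work with the resulting RDD of maps. Keep in mind that this parser returns untyped values (numbers come back as Double by default), so you would have to extract and cast the fields you need yourself.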
