Unfortunately, you are out of luck here. DataFrameReader.json, which is used under the hood, is pretty much all or nothing: if your input contains invalid lines, you have to filter them out manually. A basic solution might look like this:
    import scala.util.parsing.json._

    // Keep only the lines that parse as JSON, then let Spark infer the schema.
    val df = sqlContext.read.json(
      sc.textFile("file").filter(JSON.parseFull(_).isDefined)
    )
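Since the validation above is quite expensive (each line is parsed once by the filter, and every valid line is then parsed again by read.json), you may prefer to abandon jsonFile / read.json completely and work with the parsed JSON values directly.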
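A minimal sketch of that second approach, assuming the same scala.util.parsing.json parser as above is available and that every valid line holds a single JSON object. The helper names parseObject and parseLines are hypothetical, not part of Spark:

```scala
import org.apache.spark.rdd.RDD
import scala.util.parsing.json.JSON

// Hypothetical helper: parse one line and keep it only if it is a JSON object.
// JSON.parseFull returns None for invalid input, so bad lines are dropped here.
def parseObject(line: String): Option[Map[String, Any]] =
  JSON.parseFull(line) match {
    case Some(m: Map[_, _]) => Some(m.asInstanceOf[Map[String, Any]])
    case _                  => None // invalid JSON, or valid JSON that is not an object
  }

// Hypothetical helper: parse every line exactly once and skip the ones that fail.
def parseLines(lines: RDD[String]): RDD[Map[String, Any]] =
  lines.flatMap(line => parseObject(line).iterator)
```

With this, `parseLines(sc.textFile("file"))` gives you an RDD of plain Scala maps that you can process without going through read.json at all. The trade-off is that you lose schema inference, so fields have to be extracted and cast by hand.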