I need to extract a table from Teradata (read-only access) to Parquet using Scala (2.11) / Spark (2.1.0). I am creating a DataFrame, which loads successfully:
val df = spark.read.format("jdbc").options(options).load()
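For reference, the options map is along these lines (the host, database, table and credentials here are placeholders, not my real values):

// Placeholder Teradata JDBC options -- connection details are illustrative only
val options = Map(
  "url"      -> "jdbc:teradata://<host>/DATABASE=<database>",
  "driver"   -> "com.teradata.jdbc.TeraDriver",
  "dbtable"  -> "<table>",
  "user"     -> "<user>",
  "password" -> "<password>"
)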
But df.show gives me a NullPointerException:
java.lang.NullPointerException at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter.write(UnsafeRowWriter.java:210)
I did a df.printSchema and found out that the reason for this NPE is that the dataset contains null values in columns declared (nullable = false) (it looks like Teradata is giving Spark the wrong metadata). Indeed, df.show succeeds if I omit the problematic columns.
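As a quick check, this is roughly how the problematic columns can be listed:

// List the columns whose JDBC metadata claims NOT NULL -- these are the ones
// that trigger the NPE when they actually contain nulls
val nonNullableCols = df.schema.fields.filter(f => !f.nullable).map(_.name)
nonNullableCols.foreach(println)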
So, I tried to specify a new schema with all columns set to (nullable = true):
import org.apache.spark.sql.types.{StructField, StructType}

val new_schema = StructType(df.schema.map { case StructField(n, d, nu, m) => StructField(n, d, true, m) })
val new_df = spark.read.format("jdbc").schema(new_schema).options(options).load()
But then I got:
org.apache.spark.sql.AnalysisException: JDBC does not allow user-specified schemas.;
I also tried to create a new DataFrame from the previous one, specifying the desired schema:
val new_df = df.sqlContext.createDataFrame(df.rdd, new_schema)
But I still get the NPE as soon as an action is performed on the data (presumably because df.rdd still reads lazily through the original JDBC relation with the non-nullable schema).
Any idea on how I can fix this?
scala dataframe teradata apache-spark apache-spark-sql
Raphdg