Spark SQL - Problems with PostgreSQL JDBC Classes

I have a problem connecting Spark SQL to a PostgreSQL data source. I've downloaded the PostgreSQL JDBC jar and included it in the uber jar that I build with sbt-assembly.
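
For reference, the assembly setup looks roughly like this (the plugin version and merge strategy shown are illustrative, not necessarily exactly what is in my build):

    // project/assembly.sbt
    addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.13.0")

    // build.sbt additions: discard conflicting META-INF entries when merging jars
    assemblyMergeStrategy in assembly := {
      case PathList("META-INF", _*) => MergeStrategy.discard
      case _                        => MergeStrategy.first
    }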

My (unsuccessful) source code is in this gist: https://gist.github.com/geowa4/a9bc238ca7c372b95267

I've also tried using sqlContext.jdbc(), preceded by classOf[org.postgresql.Driver]. Since that class reference resolves, the application can apparently access the driver class just fine.
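
Roughly, that attempt looked like this (a sketch from memory, using Spark 1.3's two-argument SQLContext.jdbc):

    // Referencing the class forces it to load, which shows it is on the
    // application classpath; the failure happens later inside Spark.
    classOf[org.postgresql.Driver]
    val commits = sqlContext.jdbc(
      "jdbc:postgresql://192.168.59.103:5432/postgres", "commits")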

Any help would be greatly appreciated. Thanks.

SimpleApp.scala:

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._
    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SQLContext

    object SimpleApp {
      def main(args: Array[String]) {
        val conf = new SparkConf().setAppName("Simple Application")
        val sc = new SparkContext(conf)
        val sqlContext = new SQLContext(sc)
        import sqlContext.implicits._

        val commits = sqlContext.load("jdbc", Map(
          "url" -> "jdbc:postgresql://192.168.59.103:5432/postgres",
          "dbtable" -> "commits",
          "driver" -> "org.postgresql.Driver"))
        commits.select("message").show(1)
      }
    }

simple.sbt:

 name := "simple-project" version := "1.0" scalaVersion := "2.11.6" libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.1" % "provided" libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.3.1" % "provided" libraryDependencies += "org.postgresql" % "postgresql" % "9.4-1201-jdbc41" 

output (Edited):

    Exception in thread "main" java.lang.ClassNotFoundException: org.postgresql.Driver
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:264)
        at org.apache.spark.sql.jdbc.DefaultSource.createRelation(JDBCRelation.scala:102)
        at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:219)
        at org.apache.spark.sql.SQLContext.load(SQLContext.scala:697)
        at SimpleApp$.main(SimpleApp.scala:17)
        at SimpleApp.main(SimpleApp.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

EDIT: I changed the Scala version to 2.10.5, and the result changed to the output shown above. I feel like I'm making progress.

2 answers

This is a common problem with JDBC: the primordial class loader must know about the driver jar. In Spark 1.3, it can be worked around with the SPARK_CLASSPATH option, as described here: https://spark.apache.org/docs/1.3.0/sql-programming-guide.html#jdbc-to-other-databases
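
Following those docs, something like this before launching (adjust the jar name/path to wherever your PostgreSQL jar actually lives):

    SPARK_CLASSPATH=postgresql-9.4-1201-jdbc41.jar bin/spark-shell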

In Spark 1.4, this should be fixed by #5782.


1) Copy the driver jar to a location on your machine (e.g. /usr/share/java/postgresql-jdbc.jar).

2) Add the jar to spark-submit's --jars path as follows:

    spark-submit --jars /usr/share/java/postgresql-jdbc.jar --class com.examples.WordCount .. .. ..
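
Note: on Spark 1.3, --jars alone may still not make the driver class visible to the class loader that DriverManager uses (the issue from the other answer). If you hit that, also pass --driver-class-path, e.g. (same illustrative paths):

    spark-submit --driver-class-path /usr/share/java/postgresql-jdbc.jar \
      --jars /usr/share/java/postgresql-jdbc.jar \
      --class com.examples.WordCount .. .. ..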