I installed Spark on a Windows machine and want to use it through Spyder. After some troubleshooting, it seems to work:
    import os
    os.environ["SPARK_HOME"] = "D:\\Analytics\\Spark\\spark-1.4.0-bin-hadoop2.6"
    from pyspark import SparkContext, SparkConf
    from pyspark.sql import SQLContext

    spark_config = SparkConf().setMaster("local[8]")
    sc = SparkContext(conf=spark_config)
    sqlContext = SQLContext(sc)

    textFile = sc.textFile("D:\\Analytics\\Spark\\spark-1.4.0-bin-hadoop2.6\\README.md")
    textFile.count()
    textFile.filter(lambda line: "Spark" in line).count()
    sc.stop()
This works as expected. Now I want to connect to a Postgres 9.3 database running on the same server. I downloaded the JDBC driver and put it in the D:\Analytics\Spark\spark_jars folder. Then I created a new file D:\Analytics\Spark\spark-1.4.0-bin-hadoop2.6\conf\spark-defaults.conf containing this line:
    spark.driver.extraClassPath 'D:\\Analytics\\Spark\\spark_jars\\postgresql-9.3-1103.jdbc41.jar'
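As an aside on the config format: entries in spark-defaults.conf are plain whitespace-separated key/value pairs, and as far as I understand Spark takes the value verbatim, so literal quote characters can end up as part of the classpath entry. A minimal sketch of how such a line splits (a hypothetical helper for illustration, not Spark's actual parser):

```python
def parse_conf_line(line):
    """Split a spark-defaults.conf entry into (key, value).

    The first run of whitespace separates the key from the value;
    everything after it is taken verbatim (quotes are NOT stripped).
    """
    key, value = line.strip().split(None, 1)
    return key, value

key, value = parse_conf_line(
    "spark.driver.extraClassPath 'D:\\\\Analytics\\\\Spark\\\\spark_jars\\\\postgresql-9.3-1103.jdbc41.jar'"
)
# If the quotes survive into `value`, the path Spark puts on the
# classpath would not match the real file location on disk.
```

If that reading is right, the quoted path in my file might itself be a problem, since the quotes would become part of the path.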
I ran the following code to verify the connection:
    import os
    os.environ["SPARK_HOME"] = "D:\\Analytics\\Spark\\spark-1.4.0-bin-hadoop2.6"
    from pyspark import SparkContext, SparkConf
    from pyspark.sql import SQLContext

    spark_config = SparkConf().setMaster("local[8]")
    sc = SparkContext(conf=spark_config)
    sqlContext = SQLContext(sc)

    df = (sqlContext
          .load(source="jdbc",
                url="jdbc:postgresql://[hostname]/[database]?user=[username]&password=[password]",
                dbtable="pubs")
          )
    sc.stop()
But I get the following error:
    Py4JJavaError: An error occurred while calling o22.load.
    : java.sql.SQLException: No suitable driver found for jdbc:postgresql://uklonana01/stonegate?user=analytics&password=pMOe8jyd
        at java.sql.DriverManager.getConnection(Unknown Source)
        at java.sql.DriverManager.getConnection(Unknown Source)
        at org.apache.spark.sql.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:118)
        at org.apache.spark.sql.jdbc.JDBCRelation.<init>(JDBCRelation.scala:128)
        at org.apache.spark.sql.jdbc.DefaultSource.createRelation(JDBCRelation.scala:113)
        at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:265)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
        at py4j.Gateway.invoke(Gateway.java:259)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:207)
        at java.lang.Thread.run(Unknown Source)
How can I check whether I downloaded the correct .jar file, and where else could the error be coming from?
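One check I can do independently of Spark: a jar is just a zip archive, so Python's stdlib can confirm that the file actually contains the PostgreSQL driver class, which I believe is org/postgresql/Driver.class. A rough sketch (the jar path is the one from my setup):

```python
import zipfile

def jar_contains_class(jar_path, class_entry):
    """Return True if the jar (a plain zip archive) contains the given
    class-file entry, e.g. 'org/postgresql/Driver.class'."""
    with zipfile.ZipFile(jar_path) as jar:
        return class_entry in jar.namelist()

# For the PostgreSQL JDBC driver from my config:
# jar_contains_class(r"D:\Analytics\Spark\spark_jars\postgresql-9.3-1103.jdbc41.jar",
#                    "org/postgresql/Driver.class")
```

If this returns False, the download is wrong or corrupted; if True, the jar is fine and the problem is presumably in how the classpath reaches the driver JVM.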