I asked this question earlier but did not receive an answer (I could not connect to Postgres using JDBC in the pyspark shell).
I successfully installed Spark 1.3.0 on my local Windows machine and ran test programs in the pyspark shell.
Now I want to run Correlations from MLlib on data stored in PostgreSQL, but I cannot connect to PostgreSQL.
I added the required JAR (which I have tested separately) to the classpath by running
pyspark --jars "C:\path\to\jar\postgresql-9.2-1002.jdbc3.jar"
I can see in the Environment tab of the Spark UI that the JAR has been added.
When I run the following in the pyspark shell:
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
df = sqlContext.load(source="jdbc", url="jdbc:postgresql://[host]/[dbname]", dbtable="[schema.table]")
I get this error:
>>> df = sqlContext.load(source="jdbc", url="jdbc:postgresql://[host]/[dbname]", dbtable="[schema.table]")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\ACERNEW3\Desktop\Spark\spark-1.3.0-bin-hadoop2.4\python\pyspark\sql\context.py", line 482, in load
    df = self._ssql_ctx.load(source, joptions)
  File "C:\Users\ACERNEW3\Desktop\Spark\spark-1.3.0-bin-hadoop2.4\python\lib\py4j-0.8.2.1-src.zip\py4j\java_gateway.py", line 538, in __call__
  File "C:\Users\ACERNEW3\Desktop\Spark\spark-1.3.0-bin-hadoop2.4\python\lib\py4j-0.8.2.1-src.zip\py4j\protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o20.load.
: java.sql.SQLException: No suitable driver found for jdbc:postgresql://[host]/[dbname]
    at java.sql.DriverManager.getConnection(DriverManager.java:602)
    at java.sql.DriverManager.getConnection(DriverManager.java:207)
    at org.apache.spark.sql.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:94)
    at org.apache.spark.sql.jdbc.JDBCRelation.<init>(JDBCRelation.scala:125)
    at org.apache.spark.sql.jdbc.DefaultSource.createRelation(JDBCRelation.scala:114)
    at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:290)
    at org.apache.spark.sql.SQLContext.load(SQLContext.scala:679)
    at org.apache.spark.sql.SQLContext.load(SQLContext.scala:667)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
    at py4j.Gateway.invoke(Gateway.java:259)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:207)
    at java.lang.Thread.run(Thread.java:619)
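For reference, here is a minimal sketch of how the load options could be assembled with the driver class named explicitly. It is not a confirmed fix for the error above. It assumes that the Spark JDBC data source accepts a "driver" option and that the PostgreSQL driver class is org.postgresql.Driver; the helper name build_jdbc_options is hypothetical and exists only for illustration. The bracketed host, database, and table values are the same placeholders used in the question.

```python
def build_jdbc_options(host, dbname, table, driver="org.postgresql.Driver"):
    """Build keyword options for sqlContext.load(...) (hypothetical helper).

    Naming the driver class explicitly is an assumption about what may help
    with "No suitable driver found"; it is not verified against this setup.
    """
    return {
        "source": "jdbc",
        "url": "jdbc:postgresql://{0}/{1}".format(host, dbname),
        "dbtable": table,
        "driver": driver,
    }


# Placeholders copied from the question; replace with real values.
opts = build_jdbc_options("[host]", "[dbname]", "[schema.table]")

# In the pyspark shell this would be used as:
#   df = sqlContext.load(**opts)
print(opts["url"])
```

It may also be worth checking whether the JAR is visible to the driver JVM itself, for example by passing the same path to --driver-class-path in addition to --jars when starting pyspark; that too is an assumption rather than something established in this post.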
postgresql jdbc apache-spark apache-spark-sql
Soni shashank