Download com.databricks.spark.csv via RStudio

I installed Spark-1.4.0 along with the SparkR package, and I can use SparkR both from the SparkR shell and from RStudio. However, there is one difference that I cannot resolve.

When launching the SparkR shell

./bin/sparkR --master local[7] --packages com.databricks:spark-csv_2.10:1.0.3

I can read the CSV file as follows

flights <- read.df(sqlContext, "data/nycflights13.csv", "com.databricks.spark.csv", header="true")

Unfortunately, when I run SparkR through RStudio (setting my SPARK_HOME correctly), I get the following error message:

15/06/16 16:18:58 ERROR RBackendHandler: load on 1 failed
Caused by: java.lang.RuntimeException: Failed to load class for data source: com.databricks.spark.csv

I know I have to load com.databricks:spark-csv_2.10:1.0.3 somehow, but I have no idea how to do this. Can someone help me?

+4
4 answers

You can pass the package to SparkR from RStudio by setting SPARKR_SUBMIT_ARGS before loading the library (note that the argument list must end with sparkr-shell):

Sys.setenv('SPARKR_SUBMIT_ARGS'='"--packages" "com.databricks:spark-csv_2.10:1.0.3" "sparkr-shell"')

library(SparkR)
library(magrittr)

# Initialize SparkContext and SQLContext
sc <- sparkR.init(appName="SparkR-Flights-example")
sqlContext <- sparkRSQL.init(sc)


# The SparkSQL context should already be created for you as sqlContext
sqlContext
# Java ref type org.apache.spark.sql.SQLContext id 1

# Load the flights CSV file using `read.df`. Note that we use the CSV reader Spark package here.
flights <- read.df(sqlContext, "nycflights13.csv", "com.databricks.spark.csv", header="true")
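Once the package loads correctly, a couple of quick checks confirm the DataFrame was read as expected. This is a hypothetical follow-up, assuming the read.df call above succeeded:

```r
# Inspect the loaded SparkR DataFrame (SparkR 1.4 API):
printSchema(flights)   # column names and inferred types
head(flights)          # first rows, collected as a local R data.frame
count(flights)         # total number of rows (triggers a Spark job)
```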
+3

Alternatively, you can attach the required jar directly when initializing the SparkContext:

sc <- sparkR.init(appName="SparkR-Example",sparkEnvir=list(spark.executor.memory="1g"),sparkJars="spark-csv-assembly-1.1.0.jar")

Note that the plain spark-csv_2.11-1.0.3.jar was not sufficient on its own; sparkJars does not pull in transitive dependencies, so an assembly jar that bundles them is used here. You can then read the CSV file the same way:

flights <- read.df(sqlContext, "data/nycflights13.csv","com.databricks.spark.csv",header="true")
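If you do not have an assembly jar, sparkJars can instead be given the library jar together with its dependencies. This is a sketch only; the jar file names and versions below are assumptions and must match what you actually downloaded:

```r
library(SparkR)

# Hypothetical alternative to the assembly jar: list spark-csv and its
# commons-csv dependency explicitly (comma-separated local paths).
sc <- sparkR.init(
  appName   = "SparkR-Example",
  sparkJars = "spark-csv_2.10-1.0.3.jar,commons-csv-1.1.jar"
)
sqlContext <- sparkRSQL.init(sc)
```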
+2

If you downloaded Spark-1.4.0, go to the Spark-1.4.0/R directory, where the SparkR package sources live in the pkg folder, and build the package with:

R CMD build --resave-data pkg

This gives you a .tar.gz file that you can install in RStudio (with devtools you can also install straight from the pkg directory). In RStudio, set your path to Spark as follows:

Sys.setenv(SPARK_HOME="path_to_spark/spark-1.4.0")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)

Then you should be ready to go. I can only speak from experience on a Mac, but I hope this helps.

0

If you have tried Pragith's solution above and still have the problem, it is very possible that the CSV file you want to load is not in your current RStudio working directory. Use getwd() to check the working directory and verify that the CSV file exists there.
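A quick check from the R console makes this easy to rule out. The relative path below matches the example file used in the question:

```r
# Where is RStudio resolving relative paths from?
getwd()

# Is the file actually reachable from there?
file.exists("data/nycflights13.csv")

# If this returns FALSE, change to the directory that contains the file:
# setwd("path/to/your/data")
```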

0
