Failed to load class for data source: com.databricks.spark.csv

My build.sbt file has the following:

 scalaVersion := "2.10.3" libraryDependencies += "com.databricks" % "spark-csv_2.10" % "1.1.0" 

I run Spark in standalone cluster mode, and my SparkConf is new SparkConf().setMaster("spark://ec2-[ip].compute-1.amazonaws.com:7077").setAppName("Simple Application") (I am not using the setJars method; I'm not sure if I need it).

I package the jar with the sbt package command. The command I use to launch the application is ./bin/spark-submit --master spark://ec2-[ip].compute-1.amazonaws.com:7077 --class "[classname]" target/scala-2.10/[jarname]_2.10-1.0.jar.
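The application reads the CSV roughly like this (a sketch; the input path is a placeholder and the read style assumes Spark 1.4+):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    object SimpleApp {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setMaster("spark://ec2-[ip].compute-1.amazonaws.com:7077")
          .setAppName("Simple Application")
        val sc = new SparkContext(conf)
        val sqlContext = new SQLContext(sc)

        // The format() string is the data source class named in the error.
        val df = sqlContext.read
          .format("com.databricks.spark.csv")
          .option("header", "true")
          .load("/path/to/input.csv") // placeholder path

        df.show()
      }
    }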

When doing this, I get this error:

java.lang.RuntimeException: Failed to load class for data source: com.databricks.spark.csv

What is the problem?

+6

6 answers

Declare the dependencies with compatible versions. For instance:

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.6.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.10</artifactId>
        <version>1.6.1</version>
    </dependency>
    <dependency>
        <groupId>com.databricks</groupId>
        <artifactId>spark-csv_2.10</artifactId>
        <version>1.4.0</version>
    </dependency>
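The sbt equivalent of those coordinates, since the question uses build.sbt:

    libraryDependencies ++= Seq(
      "org.apache.spark" % "spark-core_2.10" % "1.6.1",
      "org.apache.spark" % "spark-sql_2.10" % "1.6.1",
      "com.databricks" % "spark-csv_2.10" % "1.4.0"
    )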
+3

Include this option: --packages com.databricks:spark-csv_2.10:1.2.0, but put it after --class and before the path to your jar (target/...).
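Applied to the command from the question, that looks like this (the master URL, class name, and jar name are the question's placeholders):

    ./bin/spark-submit --master spark://ec2-[ip].compute-1.amazonaws.com:7077 \
      --class "[classname]" \
      --packages com.databricks:spark-csv_2.10:1.2.0 \
      target/scala-2.10/[jarname]_2.10-1.0.jar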

+1

Add the --jars option and download the jars from a repository such as search.maven.org:

    --jars commons-csv-1.1.jar,spark-csv_2.10-1.1.0.jar,univocity-parsers-1.5.1.jar \
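If the cluster has no direct internet access, the jars can be fetched once on a machine that does and then copied over; the URLs below follow the standard Maven Central layout for the versions listed above:

    wget https://repo1.maven.org/maven2/org/apache/commons/commons-csv/1.1/commons-csv-1.1.jar
    wget https://repo1.maven.org/maven2/com/databricks/spark-csv_2.10/1.1.0/spark-csv_2.10-1.1.0.jar
    wget https://repo1.maven.org/maven2/com/univocity/univocity-parsers/1.5.1/univocity-parsers-1.5.1.jar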

Using --packages, as claudiaann1 suggests, also works if you have internet access without a proxy. If you have to go through a proxy server, it will not work.
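If an internal Maven mirror is reachable from the cluster, --packages can be combined with --repositories to resolve through it instead (a sketch; the mirror URL below is a placeholder):

    ./bin/spark-submit --master spark://ec2-[ip].compute-1.amazonaws.com:7077 \
      --class "[classname]" \
      --repositories http://mirror.example.com/maven2 \
      --packages com.databricks:spark-csv_2.10:1.1.0 \
      target/scala-2.10/[jarname]_2.10-1.0.jar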

0

Here is an example that worked:

    spark-submit --jars file:/root/Downloads/jars/spark-csv_2.10-1.0.3.jar,file:/root/Downloads/jars/commons-csv-1.2.jar,file:/root/Downloads/jars/spark-sql_2.11-1.4.1.jar --class "SampleApp" --master local[2] target/scala-2.11/my-proj_2.11-1.0.jar

0

Use the command below; it works:

    ubuntu> spark-submit --class ur_class_name --master local[*] --packages com.databricks:spark-csv_2.10:1.4.0 project_path/target/scala-2.10/jar_name.jar

0

Have you tried using the --packages argument to spark-submit? I ran into this problem when Spark did not pick up the dependencies listed in libraryDependencies.

Try the following:

    ./bin/spark-submit --master spark://ec2-[ip].compute-1.amazonaws.com:7077 --class "[classname]" --packages com.databricks:spark-csv_2.10:1.1.0 target/scala-2.10/[jarname]_2.10-1.0.jar


From the Spark docs:

Users may also include any other dependencies by supplying a comma-delimited list of Maven coordinates with --packages. All transitive dependencies will be handled when using this command.

https://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management

-2
