Local Spark talking to remote HDFS?

I have a file in HDFS inside my VirtualBox HortonWorks HDP 2.3_1 virtual machine.

If I go into the guest's spark-shell and refer to the file, it works fine:

    val words = sc.textFile("hdfs:///tmp/people.txt")
    words.count

However, if I try to access it from a local Spark application on my Windows host, it does not work:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf().setMaster("local").setAppName("My App")
    val sc = new SparkContext(conf)
    val words = sc.textFile("hdfs://localhost:8020/tmp/people.txt")
    words.count

It fails with:

    Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-452094660-10.0.2.15-1437494483194:blk_1073742905_2098 file=/tmp/people.txt
        at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:838)
        at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:626)
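My reading of that trace: the client did reach the NameNode (that is how it resolved the block ID), but then failed to pull the block itself from a DataNode. To check that without Spark in the way, I figure something like the following plain Hadoop client sketch should reproduce it (my own test code, not part of the app; it needs hadoop-client on the classpath):

    import java.net.URI
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    // Sketch: hit HDFS directly to separate the metadata step (NameNode,
    // port 8020) from the block-read step (DataNode data transfer port).
    object HdfsCheck {
      def main(args: Array[String]): Unit = {
        val fs = FileSystem.get(new URI("hdfs://localhost:8020"), new Configuration())
        val path = new Path("/tmp/people.txt")

        // Metadata-only call: talks to the NameNode, should succeed.
        println(s"exists: ${fs.exists(path)}")

        // Actual read: the NameNode hands back DataNode addresses and the
        // client connects to them directly; this is the step where a
        // BlockMissingException would surface if the DataNode is unreachable.
        val in = fs.open(path)
        try println(s"first byte: ${in.read()}")
        finally in.close()
      }
    }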

Port 8020 is open, and if I give a wrong file name, it tells me:

 Input path does not exist: hdfs://localhost:8020/tmp/people.txt!! 

localhost:8020 must be correct, since the guest HDP VM has NAT port forwarding set up to the Windows host.

And as I said, if I give it a wrong name, I get the corresponding exception.
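One theory I have but cannot confirm: the NameNode answers fine on the forwarded port 8020, but then tells the client to fetch blocks from 10.0.2.15 (the VM's NAT-internal address, visible in the block pool ID above), which my host cannot reach. If that is it, something like this in the local app might help (sketch only; dfs.client.use.datanode.hostname is a standard HDFS client setting, and the DataNode transfer port, 50010 by default, would also have to be forwarded, with the VM hostname resolvable on the host):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf().setMaster("local").setAppName("My App")
    val sc = new SparkContext(conf)

    // Ask the HDFS client to connect to DataNodes by hostname instead of
    // the IP the NameNode reports, so NAT forwarding on the host can apply.
    sc.hadoopConfiguration.set("dfs.client.use.datanode.hostname", "true")

    val words = sc.textFile("hdfs://localhost:8020/tmp/people.txt")
    println(words.count)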

My pom has:

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>1.4.1</version>
        <scope>provided</scope>
    </dependency>
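For what it's worth, spark-core 1.4.1 by itself pulls in an older hadoop-client (2.2.0, as far as I can tell), so for the direct-HDFS check above I would also add a client dependency matching the cluster; 2.7.1 is my guess for HDP 2.3:

    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <!-- Assumed: HDP 2.3 ships Hadoop 2.7.1; verify on the VM -->
        <version>2.7.1</version>
    </dependency>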

Am I doing something wrong? And what is a BlockMissingException trying to tell me?
