In the code for socketTextStream, Spark creates an instance of SocketInputDStream, which uses java.net.Socket: https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/SocketInputDStream.scala#L73
java.net.Socket is a client socket, which means it expects a server to already be listening at the specified address and port. If you do not have a service listening on port 7777 of your local machine, the error you see is expected.
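You can see the client-socket behaviour without Spark at all. The following is only a minimal sketch: the object name ClientSocketDemo, the helper canConnect, and the 2-second timeout are my own choices for illustration and are not part of Spark.

```scala
import java.io.IOException
import java.net.{InetSocketAddress, Socket}

object ClientSocketDemo {
  // Hypothetical helper: returns true if a TCP connection to host:port
  // succeeds, false if it fails (for example refused or timed out).
  def canConnect(host: String, port: Int, timeoutMs: Int = 2000): Boolean = {
    val socket = new Socket()
    try {
      socket.connect(new InetSocketAddress(host, port), timeoutMs)
      true
    } catch {
      case _: IOException => false
    } finally {
      socket.close()
    }
  }

  def main(args: Array[String]): Unit = {
    // Port 7777 is the port from the question; nothing listens there by default.
    println(canConnect("localhost", 7777))
  }
}
```

If nothing is listening on the port, the connect call fails in the same way the receiver inside socketTextStream does.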
To see what I mean, try the following (you may not need to set master or appName in your environment).
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object MyStream {
      def main(args: Array[String]): Unit = {
        // "local[2]": the socket receiver occupies one thread,
        // so at least one more is needed to process the batches
        val conf = new SparkConf().setMaster("local[2]").setAppName("socketstream")
        val sc = new StreamingContext(conf, Seconds(10))
        // Connect as a client to a server that is already listening
        val mystreamRDD = sc.socketTextStream("bbc.co.uk", 80)
        mystreamRDD.print()
        sc.start()
        sc.awaitTermination()
      }
    }
This prints no content, because the application does not speak HTTP to the BBC web server, but the connection itself is not refused.
To start a local server on Linux, I would use netcat with a simple command like:

    cat data.txt | ncat -l -p 7777
I am not sure what your best approach is on Windows. You can write another application that listens as a server on this port and sends some data.
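As a rough idea of what such an application could look like, here is a minimal sketch using java.net.ServerSocket, which runs on any platform with a JVM. The object name LineServer, the helper serveOnce, and the sample lines are hypothetical; only port 7777 comes from the question.

```scala
import java.io.{OutputStreamWriter, PrintWriter}
import java.net.ServerSocket
import java.nio.charset.StandardCharsets

object LineServer {
  // Hypothetical helper: accepts one client on the given server socket and
  // sends it each line, newline-terminated, then closes the connection.
  def serveOnce(server: ServerSocket, lines: Seq[String]): Unit = {
    val client = server.accept() // blocks until a client connects
    val out = new PrintWriter(
      new OutputStreamWriter(client.getOutputStream, StandardCharsets.UTF_8), true)
    try lines.foreach(line => out.println(line))
    finally {
      out.close()
      client.close()
    }
  }

  def main(args: Array[String]): Unit = {
    val server = new ServerSocket(7777) // port from the question
    try serveOnce(server, Seq("hello", "from", "the", "socket", "server"))
    finally server.close()
  }
}
```

Start this first, then start the streaming job with socketTextStream("localhost", 7777). It serves a single client and exits, so it is only meant for a quick test, not as a long-running data source.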