Why does the Spark Streaming app work fine with sbt run but not on Tomcat (as a web application)?

I have a Spark application in Scala that grabs records from Kafka every 10 seconds and saves them as files. It is an SBT project, and I launch the application with the sbt run command. Everything works fine until I deploy it to Tomcat. I managed to create a WAR file with this plugin, but the application appears to do nothing when deployed to Tomcat.
This is my code:

 object SparkConsumer {

   def main(args: Array[String]) {
     val conf = new SparkConf().setMaster("local[*]").setAppName("KafkaReceiver")
     val ssc = new StreamingContext(conf, Seconds(10))

     val kafkaParams = Map[String, Object](
       "bootstrap.servers" -> "localhost:9092",
       "key.deserializer" -> classOf[StringDeserializer],
       "value.deserializer" -> classOf[StringDeserializer],
       "group.id" -> "group_id",
       "auto.offset.reset" -> "latest",
       "enable.auto.commit" -> (false: java.lang.Boolean)
     )

     val topics = Array("mytopic")
     val stream = KafkaUtils.createDirectStream[String, String](
       ssc,
       PreferConsistent,
       Subscribe[String, String](topics, kafkaParams)
     )

     stream.map(record => (record.key, record.value)).print

     val arr = new ArrayBuffer[String]()
     val lines = stream.map(record => (record.key, record.value))

     stream.foreachRDD { rdd =>
       if (rdd.count() > 0) {
         val date = System.currentTimeMillis()
         rdd.saveAsTextFile("/tmp/sparkout/mytopic/" + date.toString)
         rdd.foreach { record =>
           println("t=" + record.topic + " m=" + record.toString())
         }
       }
       println("Stream had " + rdd.count() + " messages")

       val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
       rdd.foreachPartition { iter =>
         val o: OffsetRange = offsetRanges(TaskContext.get.partitionId)
         println(s"${o.topic} ${o.partition} ${o.fromOffset} ${o.untilOffset}")
         println(o)
       }
     }

     stream.saveAsTextFiles("/tmp/output")
     ssc.start()
     ssc.awaitTermination()
   }
 }

The strange thing is that the application works completely fine when launched with the sbt run command. It correctly reads records from Kafka and saves them as files in the right directory. I have no idea what is going on. I tried to enable logging with log4j, but on Tomcat it doesn't write anything at all. I searched for an answer but could not find a solution.

To summarize

My Spark application in Scala (an SBT project) should read records from Kafka and save them as files every 10 seconds. It works when launched with the sbt run command, but it does not work when deployed to Tomcat.

Additional Information:

  • Scala 2.12
  • Tomcat 7
  • SBT 0.13.15

Q: What is the problem?

1 answer

tl;dr The SparkConsumer application is behaving correctly on Tomcat, and so is Tomcat itself: nothing in the deployed WAR ever triggers the Spark Streaming code.

I am quite surprised by the question, because your code is not something I would ever expect to work on Tomcat. Unfortunately.

Tomcat is a servlet container and as such requires servlets in a web application.

Even though you managed to create a WAR and deploy it to Tomcat, nothing in that web application ever "triggers" the Spark Streaming application (the code inside the main method).

The Spark Streaming application works fine under sbt run because that is exactly what sbt run is for: running a standalone application in an sbt-managed project.

Given that there is only one standalone application in your sbt project, sbt run found SparkConsumer and executed its main entry point. Nothing surprising there.
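As an aside, if the project contained more than one class with a main method, sbt run would prompt for which one to execute; the entry point can be pinned in build.sbt. A minimal sketch, using the sbt 0.13 setting syntax (the question mentions SBT 0.13.15) and the SparkConsumer name from the question:

```scala
// build.sbt (sbt 0.13 syntax): pin the class that `sbt run` executes
mainClass in (Compile, run) := Some("SparkConsumer")
```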

However, that will not happen on Tomcat. You would need to expose the application as a POST or GET endpoint and use an HTTP client (a browser, or a command-line tool like curl, wget, or httpie) to trigger it.
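Alternatively, the usual servlet-container mechanism for running code at deployment time is a javax.servlet.ServletContextListener registered in WEB-INF/web.xml. The sketch below is a hypothetical stand-in: it mirrors the listener's lifecycle methods without depending on the servlet API, so the shape of the fix is visible; in a real WAR the class would implement javax.servlet.ServletContextListener and start the streaming job from contextInitialized.

```scala
// Hypothetical stand-in for javax.servlet.ServletContextListener
// (a real WAR would implement the servlet-api interface instead).
trait LifecycleListener {
  def contextInitialized(): Unit
  def contextDestroyed(): Unit
}

// Starts the Spark Streaming job when Tomcat deploys the WAR.
// SparkConsumer.main is the entry point from the question; the call
// is commented out so this sketch stays self-contained.
class SparkStreamingBootstrap extends LifecycleListener {

  @volatile private var running = false

  def contextInitialized(): Unit = {
    running = true
    // Run on a background thread: ssc.awaitTermination() blocks,
    // and Tomcat's deployment thread must be allowed to return.
    // new Thread(() => SparkConsumer.main(Array.empty)).start()
  }

  def contextDestroyed(): Unit = {
    running = false
    // In a real listener you would also stop the streaming context,
    // e.g. ssc.stop(stopSparkContext = true, stopGracefully = true)
  }

  def isRunning: Boolean = running
}
```

With the real servlet API, the class is registered in WEB-INF/web.xml with a `<listener>` element, so Tomcat calls contextInitialized automatically when the WAR is deployed.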

By the way, Spark does not support Scala 2.12, so... how did you manage to use that Scala version with Spark?! That should be impossible.

