Spark: creating an RDD from a REST service

Is there any functionality in Spark to bind an RDD to a REST service? That is, calling the web service and getting back an RDD.

Or is the easiest approach to call the REST service yourself and convert the resulting collection to an RDD?
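
For reference, the second approach would look roughly like this (a minimal sketch; the endpoint URL is hypothetical and the response is assumed to contain one JSON record per line):

    import scala.io.Source
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("rest-to-rdd").getOrCreate()

    // Blocking HTTP GET on the driver against a hypothetical endpoint
    val body = Source.fromURL("http://example.com/api/items").mkString

    // Turn the driver-side collection of lines into an RDD
    val rdd = spark.sparkContext.parallelize(body.split("\n").toSeq)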

Thanks.

2 answers

I used the Jersey client to read the response as a string (one full JSON document per line), and then did the following with it:

    // `request` is a Jersey client WebTarget; `session` is the SparkSession
    import session.implicits._  // provides the Encoder[String] needed by createDataset

    val stringResponse = request.request().get(classOf[String])
    val jsonDataset = session.createDataset[String](Seq(stringResponse)) // or map onto a case class
    val parsedResponse = session.read.json(jsonDataset)

...after which you can select whatever fields you need from the resulting DataFrame.
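
For example, assuming the JSON documents contain fields named id and name (hypothetical field names):

    parsedResponse.printSchema()
    parsedResponse.select("id", "name").show()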


You can take a look at Spark-Jobserver.

Some of the Spark-Jobserver features that I think you are looking for are:

  • Spark as a Service: a simple REST interface for all aspects of job and context management.
  • Start and stop job contexts for RDD sharing and low-latency jobs; change resources on restart.
  • Asynchronous and synchronous job APIs. The synchronous API is great for low-latency jobs!
  • Named RDDs to cache and retrieve RDDs by name, improving RDD sharing and reuse among jobs (see the sketch after this list).
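
As an illustration of the named-RDD feature, here is a minimal sketch written against the classic spark-jobserver job API (SparkJob with NamedRddSupport); the exact traits and package names depend on the jobserver version you deploy, so check its docs before copying:

    import com.typesafe.config.Config
    import org.apache.spark.SparkContext
    import spark.jobserver.{NamedRddSupport, SparkJob, SparkJobValid, SparkJobValidation}

    object SharedRddJob extends SparkJob with NamedRddSupport {
      override def validate(sc: SparkContext, config: Config): SparkJobValidation = SparkJobValid

      override def runJob(sc: SparkContext, config: Config): Any = {
        // Cache an RDD under a name so later jobs in the same context can reuse it
        val data = sc.parallelize(Seq("a", "b", "c"))
        this.namedRdds.update("shared-data", data)
        this.namedRdds.get[String]("shared-data").map(_.count()).getOrElse(0L)
      }
    }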

Hope this helps.

