I am testing Apache Spark for my final college project. I have a dataset that I use to build a decision tree and make predictions for new data.
In the future, I plan to use this project in production: I would build a decision tree (batch processing), receive new data through a web interface or mobile application, predict the class of each new record, and immediately report the result to the user. I would also save these new records and, after a while, build a new decision tree from them (batch processing), repeating this process continuously.
Although Apache Spark is aimed at batch processing, it also has a streaming API that lets you receive data in real time. In my application, this incoming data would only be consumed by a model built in a batch process (the decision tree), and since prediction is quite fast, the user can get an answer quickly.
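To make the workflow above concrete, here is a minimal sketch in plain Scala, without Spark. Everything in it is a hypothetical stand-in: `Record`, `Tree`, and `PredictionService` are illustrative names, and the `train` function passed to `retrain` is where a Spark batch job would presumably go. It only shows the shape of the loop: predict from the current in-memory model, queue the incoming records, and swap in a new model after each batch run.

```scala
import java.util.concurrent.ConcurrentLinkedQueue
import java.util.concurrent.atomic.AtomicReference

// Hypothetical record type: numeric features plus an optional known class label.
final case class Record(features: Vector[Double], label: Option[String])

// Hypothetical stand-in for a trained decision tree.
sealed trait Tree
final case class Leaf(label: String) extends Tree
final case class Split(feature: Int, threshold: Double, left: Tree, right: Tree) extends Tree

object Tree {
  // Walk from the root to a leaf, going left when the feature value is <= threshold.
  @annotation.tailrec
  def predict(tree: Tree, features: Vector[Double]): String = tree match {
    case Leaf(label) => label
    case Split(f, t, left, right) =>
      if (features(f) <= t) predict(left, features) else predict(right, features)
  }
}

// Serves predictions from the current model and collects records for the next batch.
final class PredictionService(initial: Tree) {
  private val current = new AtomicReference[Tree](initial)
  private val pending = new ConcurrentLinkedQueue[Record]()

  // Called per web/mobile request: queue the record, answer from the in-memory model.
  def predict(features: Vector[Double]): String = {
    pending.add(Record(features, None))
    Tree.predict(current.get(), features)
  }

  // Called periodically: drain the queued records, train a new tree, swap it in.
  // `train` is an assumption; in the real system it would wrap the batch job.
  // Returns the number of records handed to `train`.
  def retrain(train: Seq[Record] => Tree): Int = {
    val batch = Iterator.continually(pending.poll()).takeWhile(_ != null).toVector
    current.set(train(batch))
    batch.size
  }
}
```

In this sketch a web controller would only ever call `predict`, while a scheduled task would call `retrain`; the atomic reference means requests never wait on training.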
My question is: what is the best way to integrate Apache Spark with a web application? I am planning to use the Scala version of the Play Framework.