How does Spark StreamingLinearRegressionWithSGD work?

I am working on StreamingLinearRegressionWithSGD , which has two methods trainOn and predictOn . This class has a model object, which is updated as training data arrives at the stream specified in the trainOn argument.

At the same time, He gives a forecast using the same model.

I want to know how to update and synchronize model scales between workers / performers.

Any link or link will be helpful. Thanks.

+3
source share
1 answer

There is no magic here. StreamingLinearAlgorithm stores the changed link in the current GeneralizedLinearModel .

trainOn uses DStream.foreachRDD to train a new model for each batch, and then updates the model . Similarly, predictOn uses DStream.map to predict with the current version of model .

Since Spark will serialize closures for each step, there is no need for additional synchronization. Spark will use the current model value each time it calculates the closure.

In fact, this is equivalent to starting a loop in the driver with alternating run and predict .

+2
source

All Articles