Spark Streaming Historical State

I am building a real-time pipeline to detect fraudulent ATM card transactions. To detect fraud effectively, the logic needs the date of the last transaction on the card and the total transaction amount per day (or over the last 24 hours).

One use case: if a card is used outside its home country more than 30 days after its last transaction in that country, send a warning about possible fraud.
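One possible reading of this rule, as a minimal plain-Java sketch with no Spark dependency. The class name `ForeignGapRule`, the method `shouldWarn`, and the choice to skip cards with no history in that country are all assumptions made for illustration:

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper for illustration only; the names are not from any library.
final class ForeignGapRule {
    // Threshold taken from the use case above: 30 days.
    static final Duration MAX_GAP = Duration.ofDays(30);

    private ForeignGapRule() {}

    /**
     * Returns true when a transaction happens outside the home country and the
     * previous transaction in that same country is more than 30 days old.
     * lastInCountry may be null when the card has no history in that country;
     * this sketch assumes the rule does not fire in that case.
     */
    static boolean shouldWarn(String homeCountry, String txCountry,
                              Instant lastInCountry, Instant txTime) {
        if (txCountry.equals(homeCountry)) {
            return false; // domestic transaction, rule does not apply
        }
        if (lastInCountry == null) {
            return false; // assumption: no history means no comparison is possible
        }
        return Duration.between(lastInCountry, txTime).compareTo(MAX_GAP) > 0;
    }
}
```

The important point is that the check needs `lastInCountry`, which is exactly the per-card state that has to survive from one batch to the next.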

So I looked at Spark Streaming as a solution. For this (maybe I am missing something about functional programming), my pseudocode is below:

    stream = ssc.receiverStream();       // input receiver
    s1 = stream.mapToPair();             // creates a pair with the card as key and the transaction date as value
    s2 = s1.reduceByKey();               // applies a reduce operation to get the last transaction date
    s2.checkpoint(new Duration(1000));
    s2.persist();

I ran into a few problems here:

1) How do I use this latest transaction date for future comparisons against the same card?
2) How do I persist the data so that the old s2 values are restored even if the driver program restarts?
3) Can updateStateByKey be used to maintain the historical state?

I think I am missing a key point about Spark Streaming / functional programming when it comes to implementing this logic.

1 answer

If you use Spark Streaming, you should not save your state in a file, especially if you plan to run the application around the clock. If that is not your intention, you will probably be fine with a plain Spark application, since you would then be dealing with a large batch computation rather than with computation over batches that arrive in real time.

Yes, updateStateByKey can be used to maintain state across batches, but it has a specific signature, which you can see in the documentation: http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.streaming.dstream.PairDStreamFunctions
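To make that signature concrete, here is a minimal sketch of the per-key update logic in plain Java with no Spark dependency. The update function receives the new values for one key in the current batch plus the previous state for that key, and returns the new state. The class `LastTransactionState`, the use of epoch-millisecond timestamps, and `java.util.Optional` are assumptions for illustration. Spark's Java API takes a `Function2` over its own `Optional` type, which differs between versions, so check the documentation for the version you run. Note also that updateStateByKey needs a checkpoint directory configured on the streaming context.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of Spark: only the logic that would go
// inside the function passed to updateStateByKey.
final class LastTransactionState {
    private LastTransactionState() {}

    /**
     * newTimestamps: transaction times (epoch millis, an assumption) that
     *                arrived for one card in the current batch; may be empty.
     * previous:      the last transaction time stored for that card, or
     *                empty if the card has not been seen before.
     * Returns the latest transaction time known after this batch.
     */
    static Optional<Long> update(List<Long> newTimestamps, Optional<Long> previous) {
        Optional<Long> latest = previous;
        for (Long ts : newTimestamps) {
            if (!latest.isPresent() || ts > latest.get()) {
                latest = Optional.of(ts);
            }
        }
        // A batch with no new values leaves the stored state unchanged.
        return latest;
    }
}
```

Because the previous state is handed back on every batch, the last transaction date for each card is available for comparison with later transactions on the same card.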

Also, persist() is just a form of caching; it does not write your data to durable storage such as a file, so it will not survive a driver restart.

I hope this clears up some of your doubts.
