I am building real-time processing to detect fraudulent ATM card transactions. To detect fraud effectively, the logic needs the date of the last transaction on the card and the total transaction amount per day (or over the last 24 hours).
One example rule: if a card is used outside its home country and more than 30 days have passed since the last transaction in that country, send a warning as possible fraud.
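Concretely, the per-transaction check I have in mind looks roughly like this (Transaction and its getters are hypothetical placeholders for my actual record type):

    // Sketch of the rule; Transaction is a placeholder for my record type.
    private static final long THIRTY_DAYS_MS = 30L * 24 * 60 * 60 * 1000;

    boolean isPossibleFraud(Transaction txn, long lastTxnDateInCountry) {
        // Card used outside its home country, and more than 30 days since
        // the last transaction seen in that country.
        return !txn.getCountry().equals(txn.getHomeCountry())
                && txn.getTimestamp() - lastTxnDateInCountry > THIRTY_DAYS_MS;
    }

The hard part is not this check itself but keeping lastTxnDateInCountry available across batches.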
So I tried to look at Spark Streaming as a solution. Below is my pseudocode (maybe I am missing the idea of functional programming):
    stream = ssc.receiverStream();      // input receiver
    s1 = stream.mapToPair();            // key = card, value = transaction date
    s2 = s1.reduceByKey();              // reduce to the last transaction date per card
    s2.checkpoint(new Duration(1000));
    s2.persist();
I ran into three problems here:

1) How do I use this latest transaction date for future comparison against the same card?
2) How do I save this data so that, even if the driver program restarts, the old s2 values are restored?
3) Can updateStateByKey be used to maintain the historical state? (See the sketch after this list.)
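For question 3, this is roughly the updateStateByKey usage I have in mind (Spark 1.x Java API with Guava's Optional; the state is just the latest transaction timestamp per card, and as far as I understand, a checkpoint directory must be set for stateful operations):

    import com.google.common.base.Optional;
    import org.apache.spark.api.java.function.Function2;
    import org.apache.spark.streaming.api.java.JavaPairDStream;
    import java.util.List;

    ssc.checkpoint(checkpointDir);  // required for stateful ops; checkpointDir is a placeholder

    // State = latest transaction timestamp seen so far for each card.
    JavaPairDStream<String, Long> lastTxnDate = s1.updateStateByKey(
        new Function2<List<Long>, Optional<Long>, Optional<Long>>() {
            @Override
            public Optional<Long> call(List<Long> newValues, Optional<Long> state) {
                long latest = state.or(0L);     // previous latest date, if any
                for (Long ts : newValues) {     // this batch's timestamps for the card
                    latest = Math.max(latest, ts);
                }
                return Optional.of(latest);
            }
        });

My understanding is that, with checkpointing to reliable storage and creating the context via JavaStreamingContext.getOrCreate, this state should also survive a driver restart (question 2), but I am not sure.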
I think I am missing a key point of Spark Streaming / functional programming: how to implement this kind of stateful logic.
java scala apache-spark spark-streaming shark-sql
Jigar parekh