I am writing an application that reads relatively large text files, validates and converts the data (each line in a text file is its own element, and there are about 100 M elements per file), and produces some kind of output. A multi-threaded Java application already exists (using a BlockingQueue between the reading, processing, and persisting tasks), but I want to implement a Scala application that does the same.
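To make the starting point concrete, here is a minimal sketch of that kind of queue-based pipeline, written in Scala using only `java.util.concurrent`. It is not the actual application code: the names `QueuePipeline`, `convert`, and `run`, the queue capacity of 1024, and the use of `None` as an end-of-input marker are all assumptions made for illustration.

```scala
import java.util.concurrent.{ArrayBlockingQueue, BlockingQueue}

object QueuePipeline {
  // Hypothetical stand-in for the real check-and-convert step.
  def convert(line: String): String = line.trim.toUpperCase

  // Reads from `lines`, converts on `workers` threads, collects the results.
  // Result order is not guaranteed when workers > 1.
  def run(lines: Iterator[String], workers: Int): Vector[String] = {
    // Bounded queues give back-pressure; None marks end of input (assumption).
    val in: BlockingQueue[Option[String]]  = new ArrayBlockingQueue[Option[String]](1024)
    val out: BlockingQueue[Option[String]] = new ArrayBlockingQueue[Option[String]](1024)

    // Reading stage: push every line, then one end marker per worker.
    val reader = new Thread(() => {
      lines.foreach(l => in.put(Some(l)))
      (1 to workers).foreach(_ => in.put(None))
    })

    // Processing stage: each worker converts lines until it sees an end marker.
    val processors = (1 to workers).map { _ =>
      new Thread(() => {
        Iterator.continually(in.take())
          .takeWhile(_.isDefined)
          .foreach(l => out.put(l.map(convert)))
        out.put(None) // tell the persisting stage this worker is done
      })
    }

    reader.start()
    processors.foreach(_.start())

    // Persisting stage: here the calling thread just collects the results.
    val results = Vector.newBuilder[String]
    var finished = 0
    while (finished < workers) {
      out.take() match {
        case Some(item) => results += item
        case None       => finished += 1
      }
    }
    results.result()
  }
}
```

In a real run the iterator would come from the input file and the last stage would write to the output instead of building a `Vector`.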
Akka seems to be a very popular choice for building parallel applications. Unfortunately, because of the asynchronous nature of actors, I still do not understand what a single actor can or cannot do, for example, whether I can use actors as traditional workers that perform some computation.
Several pieces of documentation say that actors should never block, and I understand why. But the examples given for blocking code only ever mention things like blocking file or network IO: things that make the actor wait for a short period of time, which, of course, is bad.
But what if an actor “blocks” because it is actually doing something useful rather than waiting? In my case, processing and converting a single line / text element takes 80 ms, which is quite a long time (pure computation, no IO involved). Can this work be done directly by the actor, or should I use a Future instead (but then, if I have to use Futures anyway, why use Akka in the first place)?
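For comparison, this is roughly what I mean by the Future-only alternative: a minimal sketch that runs the CPU-bound conversion on a dedicated fixed-size thread pool, using only the Scala standard library. The names `FutureWorkers`, `convert`, and `processAll`, the pool size parameter, and the one-minute timeout are hypothetical, and `convert` is just a placeholder for the real 80 ms step.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object FutureWorkers {
  // Hypothetical stand-in for the real CPU-bound conversion.
  def convert(line: String): String = line.trim.toUpperCase

  // Converts every line on a fixed pool of `threads` threads and
  // returns the results in the same order as the input.
  def processAll(lines: Seq[String], threads: Int): Seq[String] = {
    val pool = Executors.newFixedThreadPool(threads)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
    try {
      // One Future per line; each one is scheduled on the dedicated pool.
      val futures = lines.map(l => Future(convert(l)))
      // Blocking here only to keep the sketch small and self-contained.
      Await.result(Future.sequence(futures), 1.minute)
    } finally pool.shutdown()
  }
}
```

This creates one Future per element up front, so with 100 M lines the input would presumably have to be fed in batches rather than as a single `Seq`.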
Akka's documentation and examples show that work can be done directly by actors. But the authors only ever seem to do very simplistic work (for example, calling filter on a String or incrementing a counter, and that's it). I don't know whether they do this to keep the docs simple and concise, or because you really should not do more than that inside an actor.
How would you design an Akka-based application for my use case (reading a text file, processing each line, which takes quite a lot of time, and eventually persisting the result)? Or is this the kind of problem that does not suit Akka?
scala concurrency actor akka typesafe
alapeno