Should Akka Actors perform real processing tasks?

I am writing an application that reads relatively large text files, validates and transforms the data (each line in a text file is its own element; there are about 100 M elements per file) and produces some kind of output. A multi-threaded Java application already exists (using a BlockingQueue between the reading / processing / persisting tasks), but I would like to implement a Scala application that does the same.

Akka seems to be a very popular choice for building concurrent applications. Unfortunately, due to the asynchronous nature of actors, I still do not fully understand what a single actor can or cannot do, for example whether I can use actors as traditional workers that do some computation.

Several documents say that actors should never block, and I understand why. But the examples given for blocking code always mention only things like blocking file / network IO, i.e. things that make the actor wait for a while, which of course is bad.

But what if an actor "blocks" because it is actually doing something useful rather than waiting? In my case, processing and converting a single line / text element takes around 80 ms, which is quite long (pure processing, no IO involved). Can this work be done directly by an actor, or should I use a Future instead (but then, if I need to use Futures anyway, why use Akka in the first place)?

Akka's documentation and examples show work being done directly inside actors. But it seems the authors only do very simplified work (for example, calling filter on a String or incrementing a counter, and that's it). I don't know whether they do this just to keep the documentation simple and concise, or because you really should not do more than that inside an actor.

How would you design an Akka-based application for my use case (reading a text file, processing each line, which takes quite a long time, and eventually persisting the result)? Or is this the kind of problem that simply does not fit Akka?

+7
scala concurrency actor akka typesafe
2 answers

It all depends on the type of actor.

I use this rule of thumb: if nobody needs to talk to this actor while it works, and the actor has no other responsibilities, then it is fine for it to block while doing the actual work. You can think of it as a Future, and this is what I would call a "worker".

If you block an actor that is not a leaf node (worker), i.e. a work distributor, then the whole system will slow down.

There are several patterns that involve work pulling / pushing or an actor-per-request model. Any of these may be appropriate for your application. You can have a manager that creates an actor for each piece of work; when the work is finished, the actor sends the result back to the manager and dies. You can also keep the actor alive and ask it for more work. You can also combine actors and futures. A sketch of the actor-per-job variant is shown below.
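For illustration, here is a minimal sketch of the actor-per-job variant using classic Akka actors (akka-actor). All names (ProcessLine, LineResult, LineWorker, Manager, expensiveConversion) are made up for the example, and the placeholder computation stands in for the real ~80 ms conversion from the question.

```scala
import akka.actor.{Actor, ActorSystem, Props}

// Illustrative messages, not from the question.
case class ProcessLine(line: String)
case class LineResult(output: String)

// Leaf worker: it is acceptable for it to block on real computation,
// because nothing needs to talk to it while it works.
class LineWorker extends Actor {
  def receive = {
    case ProcessLine(line) =>
      val result = expensiveConversion(line) // ~80 ms of pure CPU work
      sender() ! LineResult(result)
      context.stop(self)                     // one-shot worker dies afterwards
  }

  private def expensiveConversion(line: String): String =
    line.toUpperCase                         // placeholder for the real logic
}

// Manager: never blocks; only distributes work and collects results.
class Manager extends Actor {
  def receive = {
    case msg: ProcessLine =>
      context.actorOf(Props[LineWorker]) ! msg
    case LineResult(output) =>
      // persist or forward the result here
      println(output)
  }
}
```

In a real setup you would typically run such workers on a dedicated dispatcher, so that the long computations do not starve the rest of the system.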

Sometimes you do want to be able to talk to a worker while it is busy, if your processing is more complex and involves several steps. In that case, the worker can delegate the work to another actor or to a Future.
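When delegating to a Future, a common approach is pipeTo, which forwards the eventual result to the original sender without blocking the actor. Again a hedged sketch with the same illustrative message names as above:

```scala
import akka.actor.Actor
import akka.pattern.pipe
import scala.concurrent.Future

// Same illustrative messages as in the previous sketch.
case class ProcessLine(line: String)
case class LineResult(output: String)

class DelegatingWorker extends Actor {
  import context.dispatcher // ExecutionContext for the Future

  def receive = {
    case ProcessLine(line) =>
      // Off-load the heavy step to a Future; pipeTo delivers the eventual
      // result to the original sender while this actor stays responsive.
      Future(expensiveConversion(line)).map(out => LineResult(out)) pipeTo sender()
  }

  private def expensiveConversion(line: String): String =
    line.toUpperCase // placeholder for the real ~80 ms transformation
}
```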

To sum up: do not block distributing / supervising actors. It is fine for a worker to block, as long as that does not slow down your system.

Disclaimer: by blocking I mean doing actual work, not just waiting, which is never OK.

+6

Performing computations that take 100 ms is fine for an actor. However, you must make sure that backpressure is handled correctly. One way is to use a work-pulling pattern, where your CPU-bound actors request a new piece of work when they are ready, instead of having new work items pushed to them.
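A bare-bones sketch of such a work-pulling setup could look like the following (classic Akka actors; the message names GimmeWork, Work, Result and the master/worker split are illustrative, not a fixed API):

```scala
import akka.actor.{Actor, ActorRef}

// Illustrative messages for a simple work-pulling setup.
case object GimmeWork
case class Work(line: String)
case class Result(output: String)

// The master hands out one line at a time, only when a worker asks for it,
// so slow workers naturally provide backpressure.
class Master(lines: Iterator[String]) extends Actor {
  def receive = {
    case GimmeWork =>
      if (lines.hasNext) sender() ! Work(lines.next())
    case Result(output) =>
      // persist the output, then give the now-idle worker its next line
      if (lines.hasNext) sender() ! Work(lines.next())
  }
}

class PullingWorker(master: ActorRef) extends Actor {
  override def preStart(): Unit = master ! GimmeWork

  def receive = {
    case Work(line) =>
      master ! Result(expensiveConversion(line)) // blocking CPU work is fine here
  }

  private def expensiveConversion(line: String): String = line.toUpperCase
}
```

Because the master only hands out a line when a worker asks for one, the reader never runs more than a few items ahead of the slowest worker.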

However, your description of the problem sounds like a processing pipeline that might benefit from a higher-level abstraction such as Akka Streams. Basically, create a stream of the file's lines to be processed, and then use transformations such as map to obtain the desired result. I have something very similar to your problem description in production, and it works very well, provided the data handled by the individual processing stages is not too large.
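As a rough idea of what that could look like with Akka Streams (assuming Akka 2.6+, where an implicit ActorSystem provides the materializer; the file name, frame length, and parallelism are arbitrary placeholders):

```scala
import java.nio.file.Paths

import akka.actor.ActorSystem
import akka.stream.scaladsl.{FileIO, Framing, Sink}
import akka.util.ByteString
import scala.concurrent.Future

object StreamPipeline extends App {
  implicit val system: ActorSystem = ActorSystem("pipeline")
  import system.dispatcher

  // Read the file, split it into lines, transform each line with bounded
  // parallelism, and persist the results in the sink.
  FileIO.fromPath(Paths.get("input.txt"))
    .via(Framing.delimiter(ByteString("\n"), maximumFrameLength = 4096, allowTruncation = true))
    .map(_.utf8String)
    // For heavy CPU work, a dedicated ExecutionContext would be preferable
    // to the default dispatcher used here.
    .mapAsync(parallelism = 4)(line => Future(expensiveConversion(line)))
    .runWith(Sink.foreach(result => println(result))) // replace with real persistence
    .onComplete(_ => system.terminate())

  def expensiveConversion(line: String): String = line.toUpperCase
}
```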

Of course, the stream will also be executed by several actors under the hood. But the high-level interface is more type-safe and easier to reason about.

+3
