Decoupling programs using queues

In a talk, at around the 54:53 mark, Rich Hickey talks about using queues as a means to decouple dependent parts of a program. Can you give me an example of how to decouple the following piece of Java pseudocode in order to improve its design and/or flexibility:

    // Warning: Java pseudocode ahead
    class Job {
        public void doRun(A a) {
            saveObjectToDatabase(a);
            B b = computeB(a);
            saveObjectToDatabase(b);
            C c = computeC(b);
            logToFile(c);
        }
    }

saveObjectToDatabase and logToFile can be considered methods with side effects, whereas the outputs of computeB and computeC depend solely on a.

I know this question is rather vague / broad. I would like to get a feeling for how to use queueing mechanisms without massively complicating my program, while still making sure that it does the right things in the right order. Any pointers in the right direction are appreciated.

+8
java design jobs message-queue decoupling
4 answers

Well, this is not a very good example, but (in the most direct design) you will basically have two queues, and (depending on the amount of data) you can omit the database.

The first process would receive your a objects from the "outside world" and enqueue them in queue 1. The second process would dequeue objects from queue 1, perform computeB, and enqueue the results in queue 2. The third process would dequeue objects from queue 2, perform computeC, and log the result or whatever.

Depending on the amount of data involved (and possibly several other factors), the “objects” passed in the queues can be either your actual objects a or b or just tokens / keys to search for data in the database.

Queues themselves can be implemented in several ways. It is possible to implement a queue with a database, for example, although the details become messy. "Processes" can be Java tasks within a single Java process, or they can be separate OS processes, possibly even on separate machines.

When you use pipes on Unix, you are effectively using queues in this fashion.
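As a rough illustration of the two-queue layout described above, here is a minimal single-JVM sketch using threads and `java.util.concurrent.BlockingQueue`. The class name `QueuePipeline`, the use of an empty `Optional` as an end-of-stream marker, and the idea of collecting the results in a list instead of logging them are all assumptions made for this example, not something taken from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

public class QueuePipeline {

    /** Runs the inputs through computeB and computeC, one thread per stage. */
    public static <A, B, C> List<C> run(List<A> inputs,
                                        Function<A, B> computeB,
                                        Function<B, C> computeC) {
        // An empty Optional marks end-of-stream (an assumption of this sketch).
        BlockingQueue<Optional<A>> queue1 = new LinkedBlockingQueue<>();
        BlockingQueue<Optional<B>> queue2 = new LinkedBlockingQueue<>();
        List<C> results = new ArrayList<>();

        // Second process: dequeue from queue 1, compute b, enqueue in queue 2.
        Thread stageB = new Thread(() -> {
            try {
                Optional<A> next;
                while ((next = queue1.take()).isPresent()) {
                    queue2.put(Optional.of(computeB.apply(next.get())));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                queue2.offer(Optional.empty()); // always forward end-of-stream
            }
        });

        // Third process: dequeue from queue 2, compute c, record the result.
        Thread stageC = new Thread(() -> {
            try {
                Optional<B> next;
                while ((next = queue2.take()).isPresent()) {
                    results.add(computeC.apply(next.get())); // stand-in for logging
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        stageB.start();
        stageC.start();
        try {
            // First process: receive objects from the "outside world".
            for (A a : inputs) {
                queue1.put(Optional.of(a));
            }
            queue1.put(Optional.empty());
            stageB.join();
            stageC.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("pipeline interrupted", e);
        }
        return results;
    }
}
```

Because each stage is a single thread reading from a FIFO queue, the results come out in the same order the inputs went in. With several worker threads per stage that ordering guarantee would no longer hold without extra work.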

+3

This is exactly the principle used by a Java library I am using. The idea is to have components dedicated to individual tasks in a program (a logger is a perfect example). Each component should run independently of the others, either as a thread or as an event handler.

In the event-driven case, each component declares which types of events / messages it wants to listen to. A dispatcher collects the incoming messages and inserts them into the queue of each interested receiver. The receiver processes them and possibly generates new messages, and so on.

In your case, something like this:

    class SaveObjectHandler {
        void handle(Event e, Object o) {
            if (e instanceof SaveEvent)
                saveObjectToDatabase(o);
        }
    }

    class TransformObject {
        void handle(Event e, Object o) {
            if (e instanceof TransformEvent) {
                B result = compute(o);
                send(new SaveEvent(), result);
            }
        }
    }

    class Logger {
        void handle(Event e, Object o) {
            if (o instanceof B)
                // perform computeC
                logEvent((B) o);
        }
    }

This is what a SEDA (staged event-driven architecture) library does.
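To make the dispatcher idea a bit more concrete, here is a minimal sketch of one way it could look. It is not the API of any particular SEDA library: the class `Dispatcher` and its `subscribe` and `dispatch` methods are hypothetical names, and matching on the exact event class is a simplification.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class Dispatcher {
    // For each event type, the inbox queues of the components subscribed to it.
    private final Map<Class<?>, List<BlockingQueue<Object>>> subscribers = new HashMap<>();

    /** Registers interest in one event type and returns the component's inbox. */
    public synchronized BlockingQueue<Object> subscribe(Class<?> eventType) {
        BlockingQueue<Object> inbox = new LinkedBlockingQueue<>();
        subscribers.computeIfAbsent(eventType, k -> new ArrayList<>()).add(inbox);
        return inbox;
    }

    /**
     * Puts the event into the inbox of every component subscribed to its
     * exact class and returns how many inboxes received it.
     */
    public synchronized int dispatch(Object event) {
        List<BlockingQueue<Object>> inboxes =
                subscribers.getOrDefault(event.getClass(), new ArrayList<>());
        for (BlockingQueue<Object> inbox : inboxes) {
            inbox.add(event);
        }
        return inboxes.size();
    }
}
```

Each component would then run its own loop (or thread) that takes events from its inbox, handles them, and calls `dispatch` again for any new events it produces.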

+1

I am afraid that, since the saveObject methods have side effects, you cannot decouple this well, or at least not easily.

But let's say you need to write some objects to the database quickly. My opinion is that the fastest way with a relational database is to have multiple clients put the objects into a queue rather than write them concurrently, and to have one or two fairly fast writers take them off the queue and push the data into the database as quickly as possible.
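A minimal sketch of that idea, assuming an in-process queue and a single writer that drains it in batches. The class `BatchingWriter`, its methods, and the `saveBatch` callback (a stand-in for a bulk insert into the database) are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

public class BatchingWriter<T> {
    private final BlockingQueue<T> queue = new LinkedBlockingQueue<>();
    private final int maxBatch;
    private final Consumer<List<T>> saveBatch; // hypothetical stand-in for a bulk INSERT

    public BatchingWriter(int maxBatch, Consumer<List<T>> saveBatch) {
        this.maxBatch = maxBatch;
        this.saveBatch = saveBatch;
    }

    /** Called by any number of client threads; never touches the database. */
    public void submit(T item) {
        queue.add(item);
    }

    /**
     * Called by the writer: moves up to maxBatch queued items into one
     * batch, saves the batch if it is non-empty, and returns its size.
     */
    public int flushOnce() {
        List<T> batch = new ArrayList<>();
        queue.drainTo(batch, maxBatch);
        if (!batch.isEmpty()) {
            saveBatch.accept(batch);
        }
        return batch.size();
    }
}
```

A real writer would call `flushOnce` in a loop on its own thread (probably blocking while the queue is empty), so that clients only pay the cost of an enqueue and the database sees a few large writes instead of many small concurrent ones.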

0

For completeness, I would like to add some more information to Hot Licks' answer:

I did some more research on this topic and finally came to the conclusion that decoupling the method is the way to go. I will use Kafka's terminology of producers / consumers / topics. For more information, see The Log: What every software engineer should know about real-time data's unifying abstraction, and in particular this graph:

[Image not reproduced: graph from the linked article]

As for my specific question about the posted example, there are two ways to solve it:

Solution 1

  • Consumer 1:
    • consume from topic a
    • save to database.
  • Consumer 2:
    • consume from topic a
    • calculate b
    • save to database.
  • Consumer 3:
    • consume from topic a
    • calculate b
    • calculate c
    • log c to file

This has the disadvantage of computing b twice. In pseudocode:

    class ConsumerA {
        public void consume(A a) {
            saveObjectToDatabase(a);
        }
    }

    class ConsumerB {
        public void consume(A a) {
            B b = computeB(a);
            saveObjectToDatabase(b);
        }
    }

    class ConsumerLog {
        public void consume(A a) {
            B b = computeB(a);
            C c = computeC(b);
            logToFile(c);
        }
    }

Solution 2

  • Consumer 1:
    • consume from topic a
    • save to database.
  • Consumer 2:
    • consume from topic a
    • calculate b
    • save to database
    • publish b to a separate topic b
  • Consumer 3:
    • consume from topic b
    • calculate c
    • log c to file

In pseudo code:

    class ConsumerA {
        public void consume(A a) {
            saveObjectToDatabase(a);
        }
    }

    class ConsumerB {
        public void consume(A a) {
            B b = computeB(a);
            saveObjectToDatabase(b);
            publish(b); // republish computed information to another topic b
        }
    }

    class ConsumerLog {
        public void consume(B b) {
            C c = computeC(b);
            logToFile(c);
        }
    }
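To see how the three consumers of the second solution fit together, here is a small self-contained sketch that wires them up with an in-memory stand-in for topics. This is not the Kafka API: `TopicDemo`, the nested `Topic` class, the synchronous delivery, and the placeholder arithmetic used in place of computeB and computeC are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class TopicDemo {

    /** Hypothetical in-memory topic; delivers synchronously to every subscriber. */
    static class Topic<T> {
        private final List<Consumer<T>> consumers = new ArrayList<>();

        void subscribe(Consumer<T> consumer) {
            consumers.add(consumer);
        }

        void publish(T message) {
            consumers.forEach(c -> c.accept(message));
        }
    }

    /** Wires up the three consumers and returns a trace of the side effects in order. */
    public static List<String> run(List<Integer> inputs) {
        List<String> trace = new ArrayList<>();
        Topic<Integer> topicA = new Topic<>();
        Topic<Integer> topicB = new Topic<>();

        // Consumer 1: consume from topic a, save a.
        topicA.subscribe(a -> trace.add("save a=" + a));

        // Consumer 2: consume from topic a, compute b once, save it, republish it.
        topicA.subscribe(a -> {
            int b = a + 1; // placeholder for computeB
            trace.add("save b=" + b);
            topicB.publish(b);
        });

        // Consumer 3: consume from topic b, compute c, log it.
        topicB.subscribe(b -> trace.add("log c=" + (b * 2))); // placeholder for computeC

        inputs.forEach(topicA::publish);
        return trace;
    }
}
```

With a real broker the consumers would run asynchronously and independently of each other, so only the ordering within a single topic (a before its b, b before its c) could be relied on, not the interleaving between Consumer 1 and Consumer 2.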
0
