Pushing vs. pulling data changes in an application

Suppose you have an application that consists of two layers:

  • A: A data layer that stores all data loaded from a database or from a file.
  • B: A layer that displays the data in a nice user interface, e.g. a graphical report.

Now the data in layer A changes. There are two approaches to make sure the reports in layer B are updated correctly.

The first approach is the PUSH approach. Layer A notifies layer B through observers, so layer B can update its reports.

There are several drawbacks to the PUSH approach:

  • If the data changes many times in a row (for example, at startup or in algorithms that change a lot of data), the observers are executed many times. This can be solved by introducing some kind of buffering (suppressing observer calls while changes are still in progress), but that can be very tricky, and the necessary buffering calls are often forgotten.
  • If a lot of data changes, the observer calls can cause an overhead that is unacceptable for the application.
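The buffering mentioned in the first drawback could look like the following minimal Python sketch. `DataLayer`, `subscribe`, `set` and `batch` are hypothetical names, not an existing API; wrapping the buffering in a context manager is one assumed way to make the closing call impossible to forget.

```python
from contextlib import contextmanager
from typing import Any, Callable, Dict, Iterator, List, Set

Observer = Callable[[Set[str]], None]


class DataLayer:
    """Layer A (hypothetical): pushes change notifications to observers.

    Inside batch(), notifications are held back and sent once, with the
    set of changed keys, when the outermost batch closes.
    """

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}
        self._observers: List[Observer] = []
        self._batch_depth = 0            # > 0 while a batch is open
        self._pending: Set[str] = set()  # keys changed but not yet announced

    def subscribe(self, observer: Observer) -> None:
        self._observers.append(observer)

    def set(self, key: str, value: Any) -> None:
        self._data[key] = value
        self._pending.add(key)
        if self._batch_depth == 0:
            self._flush()                # not batching: notify immediately

    @contextmanager
    def batch(self) -> Iterator["DataLayer"]:
        # The with-statement closes the batch even if the body raises,
        # so the "end buffering" call cannot be forgotten.
        self._batch_depth += 1
        try:
            yield self
        finally:
            self._batch_depth -= 1
            if self._batch_depth == 0:
                self._flush()

    def _flush(self) -> None:
        if not self._pending:
            return
        changed, self._pending = self._pending, set()
        for observer in self._observers:
            observer(changed)


# Usage: 1000 changes inside one batch produce a single notification.
notifications: List[List[str]] = []
layer_a = DataLayer()
layer_a.subscribe(lambda changed: notifications.append(sorted(changed)))
with layer_a.batch():
    for i in range(1000):
        layer_a.set(f"row{i % 3}", i)
print(notifications)  # [['row0', 'row1', 'row2']]
```

Nested batches only notify when the outermost one closes, so helper functions can open their own batch without knowing whether a caller already did.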

Another approach is the PULL approach. Layer A simply remembers which data has changed and sends no notifications (layer A is marked dirty). After the action performed by the user (running an algorithm, loading a file, or something else), we go over all user-interface components and ask them to update themselves. In this case, layer B is asked to update itself. It first checks whether any of its underlying layers (layer A) is dirty. If so, it fetches the changes and updates itself. If layer A was not dirty, the report knows it has nothing to do.
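A minimal Python sketch of the PULL approach with two layers, under stated assumptions: the classes and `after_user_action` are hypothetical, and the "dirty" mark is implemented as a change counter rather than a boolean, so that several reports can each tell whether A changed since they last looked without having to reset a shared flag.

```python
from typing import Dict, List


class DataLayer:
    """Layer A (hypothetical): records changes silently, no notifications."""

    def __init__(self) -> None:
        self.rows: Dict[str, float] = {}
        self.version = 0                 # assumed stand-in for a dirty flag

    def set(self, key: str, value: float) -> None:
        self.rows[key] = value
        self.version += 1                # mark dirty; no observer calls


class Report:
    """Layer B (hypothetical): rebuilds only when asked, and only if A changed."""

    def __init__(self, source: DataLayer) -> None:
        self.source = source
        self.seen_version = -1           # forces a build on the first update
        self.rebuilds = 0
        self.total = 0.0

    def update(self) -> None:
        if self.source.version == self.seen_version:
            return                       # A not dirty: nothing to do
        self.total = sum(self.source.rows.values())
        self.seen_version = self.source.version
        self.rebuilds += 1


def after_user_action(components: List[Report]) -> None:
    # Called once after the user's action (file load, algorithm, ...) ends.
    for component in components:
        component.update()


a = DataLayer()
report = Report(a)
for i in range(1000):
    a.set(f"row{i}", 1.0)                # 1000 changes, zero notifications
after_user_action([report])
after_user_action([report])              # nothing changed: returns at once
print(report.total, report.rebuilds)     # 1000.0 1
```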

The best solution depends on the situation. In my situation, the PULL approach looks much better.

The situation becomes much more complicated with more than two layers. Suppose we have the following four layers:

  • A: A data layer that stores all data loaded from a database or from a file.
  • B: A layer that uses the data layer (layer A), e.g. to filter data from A using a sophisticated filter function.
  • C: A layer that uses layer B, e.g. to combine data from layer B into smaller pieces of information.
  • D: A report that interprets the results of layer C and presents them to the user in a nice graphical way.

In this case, PUSHING the changes will almost certainly cause significantly more overhead.

On the other hand, PULLING the changes requires that:

  • layer D calls layer C to ask whether it is dirty
  • layer C calls layer B to ask whether it is dirty
  • layer B calls layer A to ask whether it is dirty

If nothing has changed, the number of calls needed before you find out that nothing has changed, and that there is nothing to do, is quite large. It seems that the performance overhead we are trying to avoid by not using PUSH comes back in the PULL approach, in the form of all the calls asking whether anything is dirty.
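To make the call chain concrete, here is a hypothetical Python sketch of the four layers. Each layer asks only the layer directly below it through a `pull()` method (an assumed name) that brings the layer up to date and returns its version number; the sketch counts the dirty checks made when nothing has changed.

```python
from typing import Any, Callable

PULL_CALLS = 0  # counts dirty checks, to show what the chain costs


class Source:
    """Layer A: raw data plus a change counter (the "dirty" mark)."""

    def __init__(self, value: Any) -> None:
        self.value = value
        self.version = 0

    def set(self, value: Any) -> None:
        self.value = value
        self.version += 1

    def pull(self) -> int:
        global PULL_CALLS
        PULL_CALLS += 1
        return self.version


class Derived:
    """Layers B and C: recompute only if the layer below has changed."""

    def __init__(self, upstream: "Source | Derived", fn: Callable[[Any], Any]) -> None:
        self.upstream = upstream
        self.fn = fn
        self.value: Any = None
        self.version = 0
        self._seen = -1                  # upstream version this value was built from

    def pull(self) -> int:
        global PULL_CALLS
        PULL_CALLS += 1
        up = self.upstream.pull()        # ask the layer below if it is dirty
        if up != self._seen:
            self.value = self.fn(self.upstream.value)
            self._seen = up
            self.version += 1
        return self.version


class Report:
    """Layer D: redraws only if layer C reports a new version."""

    def __init__(self, upstream: Derived) -> None:
        self.upstream = upstream
        self._seen = -1
        self.redraws = 0

    def update(self) -> None:
        v = self.upstream.pull()
        if v != self._seen:
            self._seen = v
            self.redraws += 1            # a real report would redraw here


a = Source(list(range(10)))
b = Derived(a, lambda rows: [r for r in rows if r % 2 == 0])  # filter
c = Derived(b, sum)                                            # aggregate
d = Report(c)

d.update()        # first pass builds B and C and draws D
PULL_CALLS = 0
d.update()        # nothing changed
print(PULL_CALLS, d.redraws)  # 3 1
```

In this sketch, an idle check from one report costs one call per layer below it.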

Are there any patterns that solve this problem in a good, high-performance (low-overhead) way?

2 answers

No. There is no free lunch and no silver bullet; it all comes down to careful design. You have essentially described the common approaches, and applying them well takes care and a willingness to question assumptions.

I would question two of your statements:

You imply that managing PUSH notifications is unnecessarily difficult. I would suggest that in many cases you have a main computation engine that takes in data and performs calculations. At some point the engine necessarily stops, and at that moment it can send a "New Data" event, which can carry finer-grained information about what has changed.
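A hypothetical Python sketch of that idea: the engine mutates its data freely while running, and only when a run stops does it publish one `NewDataEvent` listing the keys that actually changed. `Engine`, `on_new_data` and `NewDataEvent` are illustrative names, not from the original answer.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class NewDataEvent:
    """Hypothetical event payload: which keys the last run changed."""
    changed: FrozenSet[str]


Listener = Callable[[NewDataEvent], None]


class Engine:
    """Hypothetical computation engine that notifies once per run."""

    def __init__(self) -> None:
        self.data: Dict[str, float] = {}
        self._listeners: List[Listener] = []

    def on_new_data(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def run(self, updates: Dict[str, float]) -> None:
        changed: Set[str] = set()
        for key, value in updates.items():
            if self.data.get(key) != value:   # record only real changes
                self.data[key] = value
                changed.add(key)
        # The run has stopped: this is the natural point to notify, once.
        if changed:
            event = NewDataEvent(frozenset(changed))
            for listener in self._listeners:
                listener(event)


events: List[NewDataEvent] = []
engine = Engine()
engine.on_new_data(events.append)
engine.run({"a": 1.0, "b": 2.0})
engine.run({"a": 1.0, "b": 3.0})              # only "b" actually changes
print([sorted(e.changed) for e in events])    # [['a', 'b'], ['b']]
```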

You say that making four inter-layer calls is too expensive. On what basis? Compared to what? If you are worried about the multiplier effect, where 10 D instances call 5 C instances, which call 2 B instances, which call the single A instance, so that A receives 100 calls, then of course you optimize. At each level you can say: "If I am already calling, or I recently heard an answer, there is no need to call again."
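The "I recently heard an answer" shortcut can be sketched as a per-pass memo of the dirty check. This Python sketch assumes a hypothetical `Layer` class and a `pass_id` that the application increments once per refresh; it builds the 10 D / 5 C / 2 B / 1 A fan-in described above and counts how often A is asked.

```python
from typing import List, Optional, Tuple


class Layer:
    """Hypothetical layer whose dirty check asks all of its upstream layers.

    With memoize=True, a layer remembers its answer for the current refresh
    pass and does not ask upstream again during that pass.
    """

    def __init__(self, upstream: List["Layer"], memoize: bool) -> None:
        self.upstream = upstream
        self.memoize = memoize
        self.own_changes = False         # only meaningful for the data layer A
        self.calls = 0                   # how often is_dirty() was called
        self._pass: Optional[int] = None
        self._answer = False

    def is_dirty(self, pass_id: int) -> bool:
        self.calls += 1
        if self.memoize and self._pass == pass_id:
            return self._answer          # answered recently: don't ask again
        answer = self.own_changes or any(u.is_dirty(pass_id) for u in self.upstream)
        self._pass, self._answer = pass_id, answer
        return answer


def build(memoize: bool) -> Tuple[Layer, List[Layer]]:
    a = Layer([], memoize)
    bs = [Layer([a], memoize) for _ in range(2)]
    cs = [Layer(bs, memoize) for _ in range(5)]
    ds = [Layer(cs, memoize) for _ in range(10)]
    return a, ds


def refresh(ds: List[Layer], pass_id: int) -> None:
    # One refresh pass after a user action: every report asks once.
    for d in ds:
        d.is_dirty(pass_id)


for memoize in (False, True):
    a, ds = build(memoize)
    refresh(ds, pass_id=1)
    print(memoize, a.calls)
# False 100
# True 2
```

Without the memo A is asked 100 times; with it, only once per B instance, because every later question in the same pass is answered from the memo.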

When you weigh the benefits of layering, a few cheap queries should not be excessive.

Route changes through a data manager and coalesce changes that occur within n nanoseconds of each other. The data manager implements publish/subscribe.

This way, data producers depend only on the data manager, and data consumers only receive data (for consumers, the dependency is inverted).

This makes the whole data flow explicit in your glue code. Subscriptions can be set up in advance, so developers don't need to know how this works.

The data manager can use its own thread to call the subscribers' notifications, which neatly decouples producers from consumers. Coalescing changes is easy: because the data manager uses only one thread for notifications, that thread can be woken by a timer, and when it wakes up it only sees the latest state.
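A minimal Python sketch of such a data manager, assuming a simple topic-based publish/subscribe API (`publish`, `subscribe`, `start`, `stop` and `flush` are hypothetical names). Producers only overwrite the latest value per topic; one notifier thread, woken by a timer (here `threading.Event.wait` with a timeout), delivers only the most recent state to subscribers.

```python
import threading
from collections import defaultdict
from typing import Any, Callable, Dict, List

Subscriber = Callable[[str, Any], None]


class DataManager:
    """Hypothetical publish/subscribe hub that coalesces changes."""

    def __init__(self, interval: float = 0.05) -> None:
        self._interval = interval                        # timer period, seconds
        self._lock = threading.Lock()
        self._pending: Dict[str, Any] = {}               # topic -> latest value
        self._subs: Dict[str, List[Subscriber]] = defaultdict(list)
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def subscribe(self, topic: str, callback: Subscriber) -> None:
        # Wiring is done up front in glue code; producers never see consumers.
        with self._lock:
            self._subs[topic].append(callback)

    def publish(self, topic: str, value: Any) -> None:
        # Cheap for producers: overwrite the value, no callbacks run here.
        with self._lock:
            self._pending[topic] = value

    def flush(self) -> None:
        # Snapshot under the lock, then call subscribers outside it.
        with self._lock:
            pending, self._pending = self._pending, {}
            targets = [(cb, topic, value)
                       for topic, value in pending.items()
                       for cb in self._subs.get(topic, [])]
        for cb, topic, value in targets:
            cb(topic, value)

    def _run(self) -> None:
        # Event.wait doubles as the timer: it returns False on timeout.
        while not self._stop.wait(self._interval):
            self.flush()

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        # Assumes start() was called; delivers what arrived after the last tick.
        self._stop.set()
        self._thread.join()
        self.flush()


received: List[Any] = []
dm = DataManager(interval=0.01)
dm.subscribe("prices", lambda topic, value: received.append(value))
dm.start()
for i in range(10_000):
    dm.publish("prices", i)          # producer never waits for consumers
dm.stop()
print(received[-1])                  # 9999
```

Because publishing just overwrites a dictionary entry, rapid updates collapse into at most one notification per timer tick, each carrying the latest value.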
