TL;DR
Neither approach is intended to help when the bottleneck is in the processor. You may see some gains from having several items pass through the processor at the same time, but both of the options you mention deliver their full benefit when the work is I/O bound. AsyncItemProcessor / AsyncItemWriter might be a better option.
Overview of Spring Batch scaling
There are five options for scaling Spring Batch jobs:
- Multithreaded step
- Parallel steps
- Partitioning
- Remote chunking
- AsyncItemProcessor / AsyncItemWriter
Each has its own advantages and disadvantages. Let's go through each:
Multithreaded step
A multithreaded step takes a single step and executes each chunk of that step in a separate thread. This means that the same instances of each batch component (readers, writers, etc.) are shared across threads. This can improve performance by adding some parallelism to the step, but in most cases at the cost of restartability. You sacrifice restartability because, in most cases, restart relies on state maintained in the reader / writer / etc. With multiple threads updating that state, it becomes invalid and useless for a restart. Because of this, you usually need to turn off saving state on the individual components and set the restartable flag to false on the job.
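To make the shared-component point concrete, here is a minimal plain-Java sketch. It does not use the Spring Batch API; the class name MultiThreadedStepSketch, the SharedReader helper and the "multiply by two" processing are all hypothetical stand-ins. Several threads pull chunks from one reader instance, so the reader has to be synchronized and chunks finish in no particular order.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class MultiThreadedStepSketch {

    /** Hypothetical reader: a single instance shared by every worker thread. */
    static final class SharedReader {
        private final Iterator<Integer> source;

        SharedReader(List<Integer> items) {
            this.source = items.iterator();
        }

        /** Synchronized because all threads read from the same instance. */
        synchronized List<Integer> readChunk(int chunkSize) {
            List<Integer> chunk = new ArrayList<>();
            while (chunk.size() < chunkSize && source.hasNext()) {
                chunk.add(source.next());
            }
            return chunk;
        }
    }

    /** Processes all items chunk by chunk on a pool of threads. */
    static List<Integer> run(List<Integer> items, int chunkSize, int threads) {
        SharedReader reader = new SharedReader(items);
        // Thread-safe sink standing in for a shared writer.
        ConcurrentLinkedQueue<Integer> written = new ConcurrentLinkedQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                List<Integer> chunk = reader.readChunk(chunkSize);
                while (!chunk.isEmpty()) {
                    for (int item : chunk) {
                        written.add(item * 2); // stand-in for process + write
                    }
                    chunk = reader.readChunk(chunkSize);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        // The output order depends on thread scheduling, not on read order.
        return new ArrayList<>(written);
    }
}
```

Every item is written exactly once, but the order of the output is not deterministic. That is the same reason a single "last position read" counter cannot describe what has actually been committed when several threads are in flight.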
Parallel steps
Parallel steps are achieved via a split. A split lets you execute several independent steps in parallel, each in its own flow. This does not sacrifice restartability, but it does not help improve the performance of a single step or a single piece of business logic.
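As a rough plain-Java analogy (not the Spring Batch split API; the class name and the step names loadCustomers, loadOrders and afterSplit are made up for illustration), two independent flows run concurrently and the job continues only once both have finished:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;

final class ParallelStepsSketch {

    /** Runs two independent "steps" in parallel, then a follow-up step. */
    static List<String> runSplit() {
        List<String> completed = new CopyOnWriteArrayList<>();

        // Each flow is an independent unit of work with its own state.
        CompletableFuture<Void> flow1 =
                CompletableFuture.runAsync(() -> completed.add("loadCustomers"));
        CompletableFuture<Void> flow2 =
                CompletableFuture.runAsync(() -> completed.add("loadOrders"));

        // The split is done only when every flow has finished.
        CompletableFuture.allOf(flow1, flow2).join();

        completed.add("afterSplit");
        return completed;
    }
}
```

Each flow is still single-threaded internally, which is why this helps the job as a whole but not any individual step.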
Partitioning
Partitioning divides the data in advance into smaller pieces (called partitions) using a master step, and then has slaves work independently on the partitions. In Spring Batch, the master and each slave are independent steps, so you get the benefit of parallelism within a single step without sacrificing restartability. Partitioning also lets you scale beyond a single JVM, because the slaves do not have to be local (you can use various communication mechanisms to reach remote slaves).
An important note about partitioning is that the only thing communicated between the master and a slave is a description of the data, not the data itself. For example, the master might tell slave1 to process records 1-100, slave2 to process records 101-200, and so on. The master does not send the actual data, only the information the slave needs to obtain the data it is supposed to process. Because of this, the data must be local to the slave processes, while the master can be located anywhere.
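The "description, not data" idea can be sketched in plain Java. This is an illustration only, not Spring Batch's Partitioner interface; the class name RangePartitionSketch and the partition0, partition1 keys are assumptions. The master computes contiguous id ranges, and each range is all a slave would receive.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class RangePartitionSketch {

    /**
     * Splits the inclusive id range [minId, maxId] into at most gridSize
     * contiguous ranges. Each value is {fromId, toId}: a description of
     * the data, never the records themselves.
     */
    static Map<String, long[]> partition(long minId, long maxId, int gridSize) {
        if (gridSize < 1 || maxId < minId) {
            throw new IllegalArgumentException("invalid range or grid size");
        }
        long total = maxId - minId + 1;
        long size = (total + gridSize - 1) / gridSize; // ceiling division

        Map<String, long[]> partitions = new LinkedHashMap<>();
        long from = minId;
        int index = 0;
        while (from <= maxId) {
            long to = Math.min(from + size - 1, maxId);
            partitions.put("partition" + index, new long[] {from, to});
            from = to + 1;
            index++;
        }
        return partitions;
    }
}
```

Calling partition(1, 200, 2) yields the ranges 1-100 and 101-200 from the example above. Each slave would then run its own query for its range, which is why the data has to be reachable from the slave.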
Remote chunking
Remote chunking lets you scale the processing, and optionally the write logic, across multiple JVMs. In this use case, the master reads the data and sends it over the wire to the slaves, where it is processed and then either written locally by the slave or returned to the master to be written there.
An important difference between partitioning and remote chunking is that remote chunking sends the actual data over the wire instead of a description of it. So instead of a single small message saying "process records 1-100", remote chunking sends the actual records 1-100. This can have a big impact on the I/O profile of the step, but if the processor is enough of a bottleneck, it can still be worthwhile.
AsyncItemProcessor / AsyncItemWriter
The final option for scaling Spring Batch processes is the AsyncItemProcessor / AsyncItemWriter combination. Here, AsyncItemProcessor wraps your ItemProcessor implementation and calls it on a separate thread. AsyncItemProcessor then returns a Future, which is passed to AsyncItemWriter, where it is unwrapped and the result is passed to the delegate ItemWriter.
Because of how data flows through this option, some listener scenarios are not supported (since the result of the ItemProcessor call is not known until inside the ItemWriter), but overall it is a useful tool for parallelizing just the ItemProcessor logic in a single JVM without sacrificing restartability.
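The Future hand-off can be sketched with the JDK alone. This mimics the pattern rather than using the real AsyncItemProcessor / AsyncItemWriter classes; the class name, the method names processAsync and writeUnwrapped, and the "item-" prefix processor are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class AsyncProcessorWriterSketch {

    /** "Processor" side: submit each item to the delegate, return Futures. */
    static <I, O> List<Future<O>> processAsync(
            List<I> chunk, Function<I, O> delegate, ExecutorService pool) {
        List<Future<O>> futures = new ArrayList<>();
        for (I item : chunk) {
            Callable<O> task = () -> delegate.apply(item);
            futures.add(pool.submit(task));
        }
        return futures;
    }

    /** "Writer" side: unwrap each Future in order, pass results to the sink. */
    static <O> void writeUnwrapped(List<Future<O>> futures, List<O> sink) {
        for (Future<O> future : futures) {
            try {
                O result = future.get(); // blocks until that item is processed
                if (result != null) {    // assumption: null means "filtered out"
                    sink.add(result);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            } catch (ExecutionException e) {
                throw new IllegalStateException("processor failed", e.getCause());
            }
        }
    }

    /** Runs one chunk through the async processor and the unwrapping writer. */
    static List<String> runChunk(List<Integer> chunk) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> futures = processAsync(chunk, i -> "item-" + i, pool);
            List<String> sink = new ArrayList<>();
            writeUnwrapped(futures, sink);
            return sink;
        } finally {
            pool.shutdown();
        }
    }
}
```

Reading and writing stay on the calling thread and only the processing fans out, so the reader's state is still updated by a single thread. The processor's result only becomes visible inside writeUnwrapped, which mirrors why listeners that expect the processed item earlier cannot work with this option.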