I ended up solving this myself and getting the scalability and redundancy I was after. I will explain what I did and how I did it, in case anyone ever needs it.
I added a few background tasks to each instance so that every instance keeps track of the others and knows which records a particular instance is processing. On startup, an instance registers itself in the database (if it is not registered already) in a table named Instances. This table has the following columns:
    Id          Number
    MachineName Varchar2
    LastActive  Timestamp
    IsMaster    Number(1)
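For reference, this is roughly how the table could be created in Oracle. It is only a minimal sketch based on the columns above; the column size and the constraints are my assumptions, not taken from the actual schema:

    CREATE TABLE Instances (
        Id          NUMBER        PRIMARY KEY,
        MachineName VARCHAR2(256) NOT NULL,          -- size is an assumption
        LastActive  TIMESTAMP     NOT NULL,
        IsMaster    NUMBER(1)     DEFAULT 0 NOT NULL
    );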
After registering and creating a row in this table (if a row with its MachineName is not found already), the instance starts pinging this table every second on a separate thread, updating its LastActive column. It then selects all the rows from the table and makes sure that the master instance (more on that later) is still alive, meaning its LastActive timestamp falls within the last 10 seconds. If the master instance stops responding, the instance takes over and marks itself as master. On the next iteration it verifies that there is still only one master (in case another instance decided to take over at the same time); if there is more than one, mastership falls to the instance with the lowest Id.
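In SQL terms, the per-second ping and the master liveness check boil down to statements like these. This is a rough sketch; the bind variable names are mine, not from the actual service:

    -- Heartbeat: refresh this instance's LastActive once per second
    UPDATE Instances
    SET LastActive = SYSTIMESTAMP
    WHERE Id = :currentInstanceId;

    -- Check whether the current master has pinged within the last 10 seconds
    SELECT COUNT(*)
    FROM Instances
    WHERE IsMaster = 1
      AND LastActive > SYSTIMESTAMP - INTERVAL '10' SECOND;

    -- If no live master was found, take over as master
    UPDATE Instances
    SET IsMaster = 1
    WHERE Id = :currentInstanceId;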
What is a master instance?
The task of the service is to scan the logging table and process the data so that people can easily filter and read it. I did not mention this in my question, but it may be relevant here. We have a group of ESB servers that write several records to the logging table for each request, and my service has to track them in near real time. Since they write their logs asynchronously, I can potentially get the "finished processing request A" entry in the log before the "started processing request A" entry. So I have code that orders these records and ensures that my service processes the data in the correct order. Since I needed to scale this service out, only one instance should run this ordering logic, to avoid a lot of unnecessary database queries and potentially nasty bugs.
This is where the Master Instance comes in. Only the master runs this ordering logic, and it temporarily stores the log entry Ids in another table named ReportAssignment. The job of this table is to keep track of which records are being processed and by whom. Once processing is complete, the record is deleted. The table looks like this:
    RecordId   Number
    InstanceId Number (nullable)
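A matching DDL sketch for this table (the constraints are again my assumption):

    CREATE TABLE ReportAssignment (
        RecordId   NUMBER PRIMARY KEY,
        InstanceId NUMBER NULL   -- NULL until an instance claims the record
    );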
The master instance orders the log entries and inserts their Ids here. All of my service instances check this table at 1-second intervals, looking for new records that are not being processed by anyone, or that are being processed by an inactive instance, and for which [record Id] % [number of instances] == [index of the current instance in a sorted array of all the active instances] (the list of active instances comes from the pinging process described above). The query looks something like this:
    SELECT *
    FROM ReportAssignment
    WHERE (InstanceId IS NULL OR InstanceId NOT IN (1, 2, 3)) -- 1, 2, 3 are the active instances
      AND MOD(RecordId, 3) = 0                                -- 0 is the index of the current instance in the list of active instances
Why do I need these conditions?
- The other two instances will query with RecordId % 3 == 1 and RecordId % 3 == 2 respectively. RecordId % [instanceCount] == [indexOfCurrentInstance] ensures that records are distributed evenly among all the active instances.
- InstanceId NOT IN (1, 2, 3) lets an instance pick up records that were being processed by an instance that crashed, and prevents a newly added instance from grabbing records that active instances are already processing.
As soon as an instance retrieves these records, it runs an UPDATE that sets InstanceId to its own Id, then queries the log table for those record Ids. When processing is complete, it deletes the records from ReportAssignment.
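The claim and cleanup steps could look roughly like this (the literal values are just an example for an instance with Id 1 and three active instances, not the actual statements from my service):

    -- Claim the records returned by the SELECT above for this instance
    UPDATE ReportAssignment
    SET InstanceId = 1
    WHERE RecordId IN (100, 103, 106)   -- example RecordIds returned by the SELECT
      AND (InstanceId IS NULL OR InstanceId NOT IN (1, 2, 3));

    -- Once the matching log entries have been processed, remove the assignments
    DELETE FROM ReportAssignment
    WHERE RecordId IN (100, 103, 106)
      AND InstanceId = 1;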
In general, I am very pleased with this. It scales well, ensures that no data is lost if an instance disappears, and required almost no changes to our existing code.