Windows Services Scaling

I am looking for advice on how to scale a Windows service currently running at my company. We use .NET 4.0 (it can and will be updated to 4.5 at some point) and run it on Windows Server 2012.

About the service
The service's task is to query for new rows in the logging table (we work with an Oracle database), process the information, create and/or update a number of rows in 5 other tables (let's call them tracking tables), update the logging table, and repeat.

The logging table holds large volumes of XML (up to 20 MB per row) that must be parsed and stored in the 5 tracking tables. New rows are added constantly, at a peak rate of 500,000 rows per hour.
Traffic on the tracking tables is much higher, ranging from 90,000 new rows per hour in the smallest to potentially a million rows per hour in the largest, not to mention the update operations on those tables.

About the processed data
I believe this part is important for finding a solution, since it determines how these objects are grouped and processed. The data structure is as follows:

 public class Report
 {
     public long Id { get; set; }
     public DateTime CreateTime { get; set; }
     public Guid MessageId { get; set; }
     public string XmlData { get; set; }
 }

 public class Message
 {
     public Guid Id { get; set; }
 }
  • Report is the logging data that I need to query and process.
  • There are on average 5 reports per message; in some cases this can vary from 1 to hundreds.
  • Message contains many other collections and relationships, but they are not relevant to the question.

Today, the Windows service barely copes with the load on a 16-core server (I do not remember the full specifications, but it is safe to say the machine is a beast). I was tasked with finding a way to scale out: add more machines that will process all this data without interfering with the other instances.

Currently, each message gets its own thread and processes its corresponding reports. We process reports in batches grouped by their MessageId in order to keep the number of database queries to a minimum while processing the data.
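The per-message batching described above can be sketched roughly as follows (Java is used purely for illustration here; the Report type is a stripped-down stand-in for the question's C# class, not the actual service code):

```java
import java.util.*;
import java.util.stream.*;

public class BatchByMessage {
    // Minimal stand-in for the question's Report class (illustrative only).
    record Report(long id, UUID messageId, String xmlData) {}

    // Group reports into per-message batches so each message's reports are
    // processed together, keeping database round-trips to a minimum.
    static Map<UUID, List<Report>> batchByMessageId(List<Report> reports) {
        return reports.stream().collect(Collectors.groupingBy(Report::messageId));
    }
}
```

Each batch can then be handed to its own worker thread, mirroring the one-thread-per-message model described above.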

Limitations

  • At this point, I am allowed to rewrite this service from scratch using any architecture I see fit.
  • If an instance fails, the other instances should be able to pick up where it left off. No data can be lost.
  • Processing should happen as close to real time as possible after reports are inserted into the database.

I am looking for any pointers or advice on how to build such a project. I assume the services should be stateless, or is there a way to synchronize caches across all instances somehow? How can I coordinate all the instances and make sure they are not processing the same data? How do I distribute the load evenly between them? And, of course, how do I handle an instance failing without losing its work?

EDIT
Removed irrelevant information.

2 answers

I ended up solving this myself, with all the scalability and redundancy required. I will explain what I did and how, in case anyone ever needs it.

I built several mechanisms into each instance to keep track of the others and to know which records a particular instance is processing. On startup, an instance registers itself in the database (if it is not already there) in a table named Instances. This table has the following columns:

 Id          Number
 MachineName Varchar2
 LastActive  Timestamp
 IsMaster    Number(1)

After registering (creating a row in this table if its MachineName is not already present), the instance starts pinging this table every second on a separate thread, updating its LastActive column. It then selects all the rows from the table and verifies that the Master Instance (more on that later) is still alive, meaning its LastActive timestamp falls within the last 10 seconds. If the master instance stops responding, the instance takes over and marks itself as the master. On the next iteration, it makes sure there is only one master (in case another instance decided to take over at the same time); if there is more than one, mastership goes to the instance with the lowest Id.
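The liveness check and takeover rule above can be sketched like this (Java used for illustration; InstanceRow mirrors a row of the Instances table, and all names are assumptions rather than the author's actual code):

```java
import java.time.*;
import java.util.*;

public class MasterElection {
    // One row of the Instances table described above (illustrative names).
    record InstanceRow(long id, Instant lastActive, boolean isMaster) {}

    static final Duration TIMEOUT = Duration.ofSeconds(10);

    // Returns the id of the instance that should be master: the current
    // master if it has pinged within the last 10 seconds; otherwise the
    // live instance with the lowest Id takes over.
    static long electMaster(List<InstanceRow> rows, Instant now) {
        List<InstanceRow> alive = rows.stream()
                .filter(r -> !r.lastActive().isBefore(now.minus(TIMEOUT)))
                .toList();
        return alive.stream()
                .filter(InstanceRow::isMaster)
                .mapToLong(InstanceRow::id)
                .min()                              // duplicate masters: lowest Id wins
                .orElseGet(() -> alive.stream()
                        .mapToLong(InstanceRow::id)
                        .min()
                        .orElseThrow());
    }
}
```

Running this on every ping iteration gives the convergence the answer describes: a stale master is replaced, and duplicate masters collapse to the one with the lowest Id.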

What is a master instance?
The service's task is to scan the logging table and process the data so that people can easily filter and read it. I did not mention this in my question, but it may be relevant: we have a group of ESB servers that write several records to the logging table for each request, and my service's task is to track them in near real time. Since they write their logs asynchronously, I can potentially get the "finished processing request A" entry before the "started processing request A" entry. So I have code that sorts these records and ensures that my service processes the data in the correct order. Since I needed to scale the service, only one instance may run this logic, to avoid a lot of unnecessary database queries and potentially crazy bugs.
Here comes the Master Instance. Only the master runs this sorting logic, and it temporarily stores the log record ids in another table named ReportAssignment. This table's job is to keep track of which records are being processed, and by whom. Once processing is complete, the record is deleted. The table looks like this:

 RecordId   Number
 InstanceId Number Nullable

The master instance sorts the log records and inserts their ids here. All my service instances check this table at 1-second intervals for new records that are not being processed by anyone, or that are being processed by an inactive instance, and where [record Id] % [number of instances] == [index of current instance in a sorted array of all the active instances] (the list obtained during the pinging process). The query looks something like this:

 SELECT *
   FROM ReportAssignment
  WHERE (InstanceId IS NULL OR InstanceId NOT IN (1, 2, 3)) -- 1, 2, 3 are the active instances
    AND MOD(RecordId, 3) = 0  -- 0 is the index of the current instance in the list of active
                              -- instances (Oracle uses MOD(), not the % operator)

Why do I need it?

  • The other two instances will query with RecordId % 3 == 1 and RecordId % 3 == 2 respectively.
  • RecordId % [instanceCount] == [indexOfCurrentInstance] ensures that records are distributed evenly among all instances.
  • InstanceId NOT IN (1,2,3) lets instances pick up records that were being processed by an instance that crashed, while not stealing records from already-active instances when a new instance is added.
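The distribution rule in these bullets boils down to a one-line predicate (sketched in Java for illustration; the names are assumptions):

```java
public class WorkDistribution {
    // An instance processes a record only when the record id modulo the
    // number of active instances equals this instance's index in the
    // sorted list of active instance ids.
    static boolean shouldProcess(long recordId, int activeInstanceCount, int selfIndex) {
        return recordId % activeInstanceCount == selfIndex;
    }
}
```

With 3 active instances, record ids 0, 3, 6, ... go to index 0, ids 1, 4, 7, ... to index 1, and so on, so the load stays even without any negotiation between instances.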

As soon as an instance has queried these records, it executes an update command setting InstanceId to its own id, then queries the log table for the records with those ids. When processing is complete, it deletes the records from ReportAssignment.

Overall, I am very pleased with this. It scales nicely, ensures that no data is lost if an instance goes down, and required almost no changes to our existing code.


For your work items, Windows Workflow is probably the fastest way to restructure your service.

Windows Workflow Foundation @MSDN

The most useful thing you will get from WF is workflow persistence: a properly configured workflow can resume from the last Persist point if something happens to it after the last point at which it was persisted.

Persistence @MSDN

This includes the ability to resume a workflow in a different process if the process running it crashes. The resuming process does not even have to be on the same machine, provided you use a shared workflow store. Note that all recoverable workflows require the use of a workflow store.

You have a couple of options for distributing the work.

  • A message-producing service combined with host-based load balancing, invoking the workflow through WCF endpoints via the WorkflowService class. Note that you probably want to use the design-mode editor here to build the entry methods, rather than manually configuring the Receive and matching SendReply handlers (they map to WCF methods). You would likely call the service once per message, and perhaps also once per report. Note that the CanCreateInstance property is important here; each call tied to it will create a runnable instance that executes independently.
    ~
    WorkflowService Class (System.ServiceModel.Activities) @MSDN
    Receive Class (System.ServiceModel.Activities) @MSDN
    Receive.CanCreateInstance Property (System.ServiceModel.Activities) @MSDN
    SendReply Class (System.ServiceModel.Activities) @MSDN

  • Use a queue-capable service bus. At a minimum, you want something that accepts input from any number of clients and whose outputs can be uniquely identified and processed exactly once. A few come to mind: NServiceBus, MSMQ, RabbitMQ, and ZeroMQ. Of the items mentioned here, NServiceBus is the most .NET-ready. In a cloud context, your options also include platform offerings such as Azure Service Bus and Amazon SQS.
    ~
    NServiceBus
    MSMQ @MSDN
    RabbitMQ
    ZeroMQ
    Azure Service Bus @MSDN
    Amazon SQS @Amazon AWS
    ~
    Note that the service bus is just the glue between the producer that initiates a message and the consumer, which can live on any number of machines reading from the queue. Similarly, you could use this approach for generating the reports. Your consumer would create workflow instances that can then take advantage of workflow persistence.

  • Windows AppFabric can be used to host workflows, which lets you apply many of the techniques used for IIS load balancing to distribute your work. I personally have no experience with it, so I cannot say much beyond the fact that it has good monitoring support out of the box.
    ~
    How to: Host a Workflow Service with Windows Server AppFabric @MSDN
