Let me answer this clearly.
After a lot of digging, and with the help of Andreas Öhlund on the NSB team ( http://tech.groups.yahoo.com/group/nservicebus/message/17758 ), here is the correct answer to this question:
- As Udi Dahan mentioned, by design ONLY the distributor/master node should run a timeout manager in a scale-out scenario.
- Unfortunately, in earlier versions of NServiceBus 3 this is not implemented as designed.
You have the following 3 problems:
1) Starting with a Distributor profile does NOT start the timeout manager.
Workaround:
Run the timeout manager on the distributor as well, by including this code in your distributor:
    class DistributorProfileHandler : IHandleProfile<Distributor>
    {
        public void ProfileActivated()
        {
            // Make the distributor run the timeout manager itself
            Configure.Instance.RunTimeoutManager();
        }
    }
If you run the Master profile, this is not a problem, since the timeout manager is started automatically on the master node.
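As a reminder of how these profiles are selected: they are passed to the generic host on the command line. A minimal sketch, assuming the endpoints are hosted with NServiceBus.Host.exe:

    REM Distributor only - problem 1 applies:
    NServiceBus.Host.exe NServiceBus.Distributor

    REM Master (distributor + worker on one node) - the timeout manager starts automatically:
    NServiceBus.Host.exe NServiceBus.Master

    REM Worker - see problem 2 below:
    NServiceBus.Host.exe NServiceBus.Worker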
2) Workers running with the Worker profile will each run a local timeout manager.
This is not how it was designed to work, and it interferes both with polling the timeout store and with dispatching timeouts. All workers poll the timeout store with "give me the imminent timeouts for MASTERNODE". Note that they ask for the timeouts of MASTERNODE, not of W1, W2, etc. As a result, several workers can simultaneously fetch the same timeouts from the timeout store, which leads to conflicts against Raven when the timeouts are deleted from it.
Dispatching also always goes through the LOCAL .timeouts/.timeoutsdispatcher queues, while it SHOULD go through the timeout manager queues on the MasterNode/Distributor.
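To make the race concrete, here is a toy model of the polling conflict (plain C#, illustrative only, not NServiceBus code; the store and worker names are invented):

    using System;
    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using System.Linq;
    using System.Threading.Tasks;

    class PollingRaceDemo
    {
        // Stand-in for the timeout store: timeout id -> owning endpoint
        static readonly ConcurrentDictionary<Guid, string> Store =
            new ConcurrentDictionary<Guid, string>();

        static void Main()
        {
            Store[Guid.NewGuid()] = "MASTERNODE"; // one imminent timeout, owned by the master node

            // Both workers ask the same question: "give me the imminent
            // timeouts for MASTERNODE" - neither asks for W1- or W2-specific ones.
            Task.WaitAll(
                Task.Run(() => Poll("W1")),
                Task.Run(() => Poll("W2")));
        }

        static void Poll(string worker)
        {
            // Every worker sees the same set of due timeouts...
            List<Guid> due = Store.Where(kv => kv.Value == "MASTERNODE")
                                  .Select(kv => kv.Key)
                                  .ToList();
            foreach (var id in due)
            {
                // ...so only one delete can win; the loser is the analogue
                // of the delete conflict seen against Raven.
                string owner;
                Console.WriteLine(Store.TryRemove(id, out owner)
                    ? worker + ": dispatched timeout " + id
                    : worker + ": CONFLICT, timeout " + id + " was already taken");
            }
        }
    }

Whenever both workers read the store before either delete lands, one of them hits the CONFLICT branch.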
Workaround (you will need to do both):
a) Disable the timeout manager on the workers. Include this code in your workers:
    class WorkerProfileHandler : IHandleProfile<Worker>
    {
        public void ProfileActivated()
        {
            // Workers must not run their own timeout manager
            Configure.Instance.DisableTimeoutManager();
        }
    }
b) Repoint NServiceBus on the workers to use the .timeouts queue on the MasterNode/Distributor.
If you don't, any call to RequestTimeout or Defer on the worker will die with an exception saying that you have forgotten to configure a timeout manager. Include this in your worker config:
    <UnicastBusConfig TimeoutManagerAddress="{endpointname}.Timeouts@{masternode}" />
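For completeness, here is where that line sits in a worker's app.config; a sketch with invented names ("SalesWorker" for the endpoint, "MasterNodeMachine" for the distributor/master machine), assuming the standard NServiceBus 3 section registration in NServiceBus.Core:

    <configuration>
      <configSections>
        <section name="UnicastBusConfig"
                 type="NServiceBus.Config.UnicastBusConfig, NServiceBus.Core" />
      </configSections>

      <!-- Route timeout requests to the timeout manager on the master node -->
      <UnicastBusConfig TimeoutManagerAddress="SalesWorker.Timeouts@MasterNodeMachine" />
    </configuration>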
3) Erroneous "Ready" messages from the workers to the distributor.
Because the timeout manager dispatches the messages directly to the worker input queues without removing an entry from the available workers in the distributor's storage queue, the workers send erroneous "Ready" messages back to the distributor after handling a timeout. This happens even if you have fixed 1 and 2 above, and it doesn't matter whether the timeout was fetched from a local timeout manager on the worker or from one running on the distributor/MasterNode. The consequence is a build-up of an extra entry in the distributor's storage queue for each timeout handled by a worker.
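To see why this builds up, a toy model of problem 3 (plain C#, not NServiceBus internals; the queue stands in for the distributor's storage queue of available workers):

    using System;
    using System.Collections.Generic;

    class ReadyMessageDemo
    {
        // Stand-in for the distributor's storage queue: one entry per unit
        // of free capacity a worker has announced.
        static readonly Queue<string> StorageQueue = new Queue<string>();

        static void Main()
        {
            StorageQueue.Enqueue("W1"); // worker announces capacity once

            // Normal message: the distributor pops an entry to pick a worker,
            // and the worker's "Ready" afterwards puts one back. Balanced.
            var worker = StorageQueue.Dequeue();
            StorageQueue.Enqueue(worker);
            Console.WriteLine("after a normal message: " + StorageQueue.Count); // 1

            // Timeout: dispatched straight to the worker's input queue, so
            // nothing is popped - but the worker still sends "Ready". Unbalanced.
            StorageQueue.Enqueue("W1");
            Console.WriteLine("after a timeout: " + StorageQueue.Count); // 2
        }
    }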
Workaround: Use NServiceBus 3.3.15 or later.