Now I understand that there is overhead from the whole queuing machinery: enqueuing on the ThreadPool, setting/resetting AutoResetEvents, etc. internally in the MailboxProcessor.
And printf, Map, Seq and contending for your global mutable Inc. And you are leaking heap-allocated stack frames. In fact, only a small proportion of the time taken to run your benchmark has anything to do with MailboxProcessor.
But is this really the expected performance?
I'm not surprised at the performance of your program, but it says little about the performance of MailboxProcessor.
From reading both MSDN and various blogs on the MailboxProcessor, I got the idea that it is meant to be akin to Erlang actors, but from the abysmal performance I am seeing, this does not seem to hold true in reality?
The MailboxProcessor is conceptually somewhat similar to part of Erlang. The abysmal performance you are seeing is due to a variety of things, some of which are quite subtle and will affect any such program.
So my question here is: am I doing something wrong?
I think you are doing a few things wrong. Firstly, the problem you are trying to solve is not clear, so this sounds like an XY problem question. Secondly, you are trying to benchmark the wrong things (for example, you complain about the microsecond times needed to create a MailboxProcessor, but you may intend to do this only when a TCP connection is established, which takes several orders of magnitude longer). Thirdly, you have written a benchmark program that measures the performance of some things, but you have attributed your observations to completely different things.
Let's look at your benchmark program in more detail. Before we do anything else, let's fix some bugs. You should always use sw.Elapsed.TotalSeconds to measure time, because it is more precise. You should always recur in an async workflow using return! and not do!, or you will leak stack frames.
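A minimal sketch of both fixes, assuming a hypothetical Msg type and message count that are not taken from your program. The loop recurs with return! in tail position, and the timing is read from sw.Elapsed.TotalSeconds:

```fsharp
open System.Diagnostics

// Hypothetical message type, used only for this illustration.
type Msg =
    | Add of int
    | Get of AsyncReplyChannel<int>

let agent =
    MailboxProcessor<Msg>.Start(fun inbox ->
        let rec loop total =
            async {
                let! msg = inbox.Receive()
                match msg with
                | Add n ->
                    // return! makes this a tail call into the next iteration.
                    return! loop (total + n)
                | Get reply ->
                    reply.Reply total
                    return! loop total
            }
        loop 0)

let sw = Stopwatch.StartNew()
for _ in 1 .. 100000 do
    agent.Post(Add 1)
// Messages are processed in order, so this reply arrives after all the Adds.
let total = agent.PostAndReply(fun ch -> Get ch)
printfn "total = %d after %fs" total sw.Elapsed.TotalSeconds
```

TotalSeconds is a floating-point value, so it keeps the sub-millisecond part of the measurement.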
My initial timings:
Creation stage: 0.858s
Post stage: 1.18s
Next, let's run a profile to make sure that our program really is spending most of its time exercising the F# MailboxProcessor:
77%    Microsoft.FSharp.Core.PrintfImpl.gprintf(...)
4.4%   Microsoft.FSharp.Control.MailboxProcessor`1.Post(!0)
Clearly not what we had hoped. Thinking more abstractly, we are generating a lot of data using things like sprintf and then applying it, but we are doing the generation and the application together. Let's separate out our initialization code:
    // Build all the records up front so sprintf is not timed with agent creation.
    let ids = Array.init 100000 (fun id -> {Id = id; Name = sprintf "User%i" id})
    ...
        ids
        |> Array.map (fun id ->
            MailboxProcessor<Message>.Start(fun inbox ->
    ...
                loop id
    ...
        printf "Create Time %fs\n" sw.Elapsed.TotalSeconds
        // Precompute each (post function, message) pair before timing the posts.
        let fxs =
          [|for i in 0 .. 99999 ->
              mb.[i % mb.Length].Post, UpdateName(i, sprintf "User%i-UpdateName" i)|]
        incr.Start()
        for f, x in fxs do
          f x
    ...
Now we get:
Creation stage: 0.538s
Post stage: 0.265s
So creation is 60% faster and posting is 4.5 times faster.
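For reference, those speedups follow directly from the two sets of timings. A small sketch of the arithmetic (the variable names are only for illustration):

```fsharp
// Timings in seconds, copied from the two runs above.
let createBefore, createAfter = 0.858, 0.538
let postBefore, postAfter = 1.18, 0.265

// Speedup = old time / new time.
let createSpeedup = createBefore / createAfter
let postSpeedup = postBefore / postAfter

printfn "creation: %.2fx (%.0f%% faster)" createSpeedup ((createSpeedup - 1.0) * 100.0)
printfn "posting:  %.2fx" postSpeedup
```

That works out to roughly 1.6x for creation and roughly 4.5x for posting.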
Let's try rewriting your benchmark completely:
    do
      for nAgents in [1; 10; 100; 1000; 10000; 100000] do
        let timer = System.Diagnostics.Stopwatch.StartNew()
        // Two participants: the main thread and the last agent to finish.
        use barrier = new System.Threading.Barrier(2)
        let nMsgs = 1000000 / nAgents
        let nAgentsFinished = ref 0
        let makeAgent _ =
          new MailboxProcessor<_>(fun inbox ->
            let rec loop n =
              async { let! () = inbox.Receive()
                      let n = n+1
                      if n=nMsgs then
                        // Touch the shared counter only once per agent.
                        let n = System.Threading.Interlocked.Increment nAgentsFinished
                        if n = nAgents then
                          barrier.SignalAndWait()
                      else
                        return! loop n }
            loop 0)
        let agents = Array.init nAgents makeAgent
        for agent in agents do
          agent.Start()
        printfn "%fs to create %d agents" timer.Elapsed.TotalSeconds nAgents
        timer.Restart()
        for _ in 1..nMsgs do
          for agent in agents do
            agent.Post()
        // Wait here until every agent has received all of its messages.
        barrier.SignalAndWait()
        printfn "%fs to post %d msgs" timer.Elapsed.TotalSeconds (nMsgs * nAgents)
        timer.Restart()
        for agent in agents do
          // Rebinding with use disposes the agent at the end of this scope.
          use agent = agent
          ()
        printfn "%fs to dispose of %d agents\n" timer.Elapsed.TotalSeconds nAgents
This version waits until each agent has received nMsgs messages before that agent increments the shared counter, which greatly reduces the performance impact of that shared counter. This program also examines performance with different numbers of agents. On this machine, I get:
Agents  M msgs/s
     1    2.24
    10    6.67
   100    7.58
  1000    5.15
 10000    1.15
100000    0.36
So it seems that part of the reason for the low msgs/s rate you are seeing is the unusually large number (100,000) of agents. With 10-1000 agents, the F# implementation is more than 10 times faster than it is with 100,000 agents.
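To make that comparison concrete, here is a small sketch that divides each measured rate by the 100,000-agent rate. The figures are copied from the table above; the names are only for illustration:

```fsharp
// (number of agents, throughput in million msgs/s), copied from the table above.
let results =
    [ 1, 2.24; 10, 6.67; 100, 7.58; 1000, 5.15; 10000, 1.15; 100000, 0.36 ]

// Throughput with 100,000 agents, used as the baseline.
let baseline = results |> List.find (fun (n, _) -> n = 100000) |> snd

// Ratio of each configuration's throughput to the baseline.
let ratios = results |> List.map (fun (n, rate) -> n, rate / baseline)

for n, ratio in ratios do
    printfn "%6d agents: %.1fx the 100,000-agent rate" n ratio
```

For 10, 100 and 1000 agents the ratio comes out between roughly 14x and 21x.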
So, if you can make do with this kind of performance, you should be able to write your entire application in F#, but if you need much more performance, I would recommend using a different approach. You may not even have to give up F# (and you can certainly use it for prototyping) if you adopt a design like the Disruptor. In practice, I have found that the time spent doing serialization on .NET tends to be much larger than the time spent in F# async and MailboxProcessor.