MailboxProcessor Performance Issues

I'm trying to build a system that can hold a large number of concurrent users in memory. When planning it, I immediately thought of some kind of actor-based solution along the lines of Erlang.

The system has to run on .NET, so I started working on a prototype in F# using MailboxProcessor, but ran into serious performance issues. My initial idea was to use one actor (MailboxProcessor) per user to serialize all communication for that user.

I've isolated a small piece of code that reproduces the problem I'm seeing:

    open System.Threading
    open System.Diagnostics

    type Inc() =
        let mutable n = 0
        let sw = new Stopwatch()
        member x.Start() = sw.Start()
        member x.Increment() =
            if Interlocked.Increment(&n) >= 100000 then
                printf "UpdateName Time %A" sw.ElapsedMilliseconds

    type Message = UpdateName of int * string

    type User = {
        Id : int
        Name : string
    }

    [<EntryPoint>]
    let main argv =
        let sw = Stopwatch.StartNew()
        let incr = new Inc()
        let mb =
            Seq.initInfinite(fun id ->
                MailboxProcessor<Message>.Start(fun inbox ->
                    let rec loop user =
                        async {
                            let! m = inbox.Receive()
                            match m with
                            | UpdateName(id, newName) ->
                                let user = {user with Name = newName}
                                incr.Increment()
                                do! loop user
                        }
                    loop {Id = id; Name = sprintf "User%i" id}
                )
            )
            |> Seq.take 100000
            |> Array.ofSeq

        printf "Create Time %i\n" sw.ElapsedMilliseconds
        incr.Start()
        for i in 0 .. 99999 do
            mb.[i % mb.Length].Post(UpdateName(i, sprintf "User%i-UpdateName" i))

        System.Console.ReadLine() |> ignore
        0

Just creating 100k actors takes about 800 ms on my quad-core i7. Then, sending an UpdateName message to each of them and waiting for them all to finish takes about 1.8 seconds.

Now I understand that there is overhead from all the queuing: queuing work on the ThreadPool, setting/resetting AutoResetEvents, etc. inside MailboxProcessor. But is this really the expected performance? From reading MSDN and various blogs on MailboxProcessor, I got the impression that it was meant to be a counterpart to Erlang actors, but given this abysmal performance, that doesn't seem to be true?

I also tried a modified version of the code that uses 8 MailboxProcessors, each holding a Map<int, User> that is used to look up a user by ID. This gave some improvement, reducing the total time for UpdateName to 1.2 seconds. But it still feels very slow. The modified code:

    open System.Threading
    open System.Diagnostics

    type Inc() =
        let mutable n = 0
        let sw = new Stopwatch()
        member x.Start() = sw.Start()
        member x.Increment() =
            if Interlocked.Increment(&n) >= 100000 then
                printf "UpdateName Time %A" sw.ElapsedMilliseconds

    type Message =
        | CreateUser of int * string
        | UpdateName of int * string

    type User = {
        Id : int
        Name : string
    }

    [<EntryPoint>]
    let main argv =
        let sw = Stopwatch.StartNew()
        let incr = new Inc()
        let mb =
            Seq.initInfinite(fun id ->
                MailboxProcessor<Message>.Start(fun inbox ->
                    let rec loop users =
                        async {
                            let! m = inbox.Receive()
                            match m with
                            | CreateUser(id, name) ->
                                do! loop (Map.add id {Id=id; Name=name} users)
                            | UpdateName(id, newName) ->
                                match Map.tryFind id users with
                                | None ->
                                    do! loop users
                                | Some(user) ->
                                    incr.Increment()
                                    do! loop (Map.add id {user with Name = newName} users)
                        }
                    loop Map.empty
                )
            )
            |> Seq.take 8
            |> Array.ofSeq

        printf "Create Time %i\n" sw.ElapsedMilliseconds
        for i in 0 .. 99999 do
            mb.[i % mb.Length].Post(CreateUser(i, sprintf "User%i-UpdateName" i))

        incr.Start()
        for i in 0 .. 99999 do
            mb.[i % mb.Length].Post(UpdateName(i, sprintf "User%i-UpdateName" i))

        System.Console.ReadLine() |> ignore
        0

So my question is: am I doing something wrong? Have I misunderstood how MailboxProcessor is meant to be used? Or is this the expected performance?

Update:

So, I grabbed some people on ##fsharp @ irc.freenode.net, who told me that sprintf is very slow, and as it turns out, that is where most of my performance problems come from. But even after removing the sprintf calls above and just using the same name for every user, I still get about 400 ms for these operations, which still seems very slow.
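For reference, one way to take sprintf out of the hot path is plain string concatenation. This is a minimal sketch assuming all that matters is producing the same "User{i}-UpdateName" text; the helper names makeNameSprintf and makeNameConcat are hypothetical, and how much faster concatenation is will depend on your FSharp.Core version:

```fsharp
// Hypothetical helpers, for illustration only.

// The original approach: goes through the printf formatting machinery on every call.
let makeNameSprintf (i: int) = sprintf "User%i-UpdateName" i

// Plain concatenation: converts the int with `string` and joins the pieces,
// with no format-string processing involved.
let makeNameConcat (i: int) = "User" + string i + "-UpdateName"

// Both produce identical text; only the cost differs.
let names = Array.init 100000 makeNameConcat

printfn "%s" names.[7]
```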

2 answers

Now I understand that there is overhead from all the queuing: queuing work on the ThreadPool, setting/resetting AutoResetEvents, etc. inside MailboxProcessor.

And printf, Map, Seq and contention on your global mutable Inc. And you are leaking heap-allocated stack frames. In fact, only a small proportion of the time taken to run your benchmark has anything to do with MailboxProcessor.

But is this really the expected performance?

I'm not surprised by the performance of your program, but it says little about the performance of MailboxProcessor.

From reading both MSDN and various blogs on MailboxProcessor, I got the impression that it was meant to be a counterpart to Erlang actors, but given this abysmal performance, that doesn't seem to be true?

MailboxProcessor is conceptually somewhat similar to that part of Erlang. The abysmal performance you are seeing is due to a variety of things, some of which are quite subtle and will affect any such program.

So my question is: am I doing something wrong?

I think you are doing a few things wrong. Firstly, the problem you are trying to solve is not clear, so this sounds like an XY problem question. Secondly, you are trying to compare the wrong things (for example, you are complaining about the microsecond times required to create a MailboxProcessor, but you may intend to do that only when a TCP connection is established, which takes several orders of magnitude longer). Thirdly, you have written a benchmark program that measures the performance of some things but attributed your observations to completely different things.

Let's look at your benchmark program in more detail. Before we do anything else, let's fix some bugs. You should always use sw.Elapsed.TotalSeconds to measure time because it is more precise. You should always recurse in an async workflow using return! and not do!, or you will leak stack frames.
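To make those two fixes concrete, here is a small sketch (not the original benchmark) of an agent loop that recurses with return! and is timed with sw.Elapsed.TotalSeconds. The CounterMsg type, its Tick/Get cases and the message count are illustrative assumptions:

```fsharp
open System.Diagnostics

// Illustrative message type: Tick increments, Get asks for the current count.
type CounterMsg =
    | Tick
    | Get of AsyncReplyChannel<int>

let counter =
    MailboxProcessor<CounterMsg>.Start(fun inbox ->
        let rec loop n =
            async {
                let! msg = inbox.Receive()
                match msg with
                | Tick ->
                    // return! makes the recursive call the final step of the
                    // workflow, so no continuation piles up per message
                    // (with `do! loop (n + 1)` one would).
                    return! loop (n + 1)
                | Get reply ->
                    reply.Reply n
                    return! loop n
            }
        loop 0)

let sw = Stopwatch.StartNew()
for _ in 1 .. 100000 do
    counter.Post Tick

// Messages are received in arrival order, so Get is answered only
// after every Tick posted above has been processed.
let count = counter.PostAndReply(fun reply -> Get reply)
printfn "%d messages in %fs" count sw.Elapsed.TotalSeconds
```

Here PostAndReply doubles as a simple "wait until done" step, so the timer covers both posting and processing.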

My initial timings:

    Creation stage: 0.858s
    Post stage:     1.18s

Next, run a profiler to check that our program really does spend most of its time in F#'s MailboxProcessor:

    77%   Microsoft.FSharp.Core.PrintfImpl.gprintf(...)
    4.4%  Microsoft.FSharp.Control.MailboxProcessor`1.Post(!0)

Clearly not what we had hoped for. Thinking more abstractly, we are generating lots of data using things like sprintf and then applying it, but we are doing the generation and the application together. Let's separate out our initialization code:

    let ids = Array.init 100000 (fun id -> {Id = id; Name = sprintf "User%i" id})
    ...
    ids
    |> Array.map (fun id ->
        MailboxProcessor<Message>.Start(fun inbox ->
    ...
        loop id
    ...
    printf "Create Time %fs\n" sw.Elapsed.TotalSeconds
    let fxs =
        [|for i in 0 .. 99999 ->
            mb.[i % mb.Length].Post, UpdateName(i, sprintf "User%i-UpdateName" i)|]
    incr.Start()
    for f, x in fxs do
        f x
    ...
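Since the snippet above is elided, here is a self-contained sketch of the same generate-then-apply split. It assumes 8 agents and uses a CountdownEvent to detect when every message has been handled; both are illustrative choices, not part of the original benchmark:

```fsharp
open System.Diagnostics
open System.Threading

// Mirrors the question's UpdateName message.
type Message = UpdateName of int * string

let nAgents = 8
let nMsgs = 100000

// Signalled once per processed message; Wait() returns when it reaches zero.
let remaining = new CountdownEvent(nMsgs)

let agents =
    Array.init nAgents (fun _ ->
        MailboxProcessor<Message>.Start(fun inbox ->
            let rec loop (lastName: string) =
                async {
                    let! msg = inbox.Receive()
                    match msg with
                    | UpdateName(_, name) ->
                        remaining.Signal() |> ignore
                        return! loop name
                }
            loop ""))

// Stage 1: generation. All sprintf work happens here, outside the timer.
let payloads =
    [| for i in 0 .. nMsgs - 1 ->
        (agents.[i % nAgents], UpdateName(i, sprintf "User%i-UpdateName" i)) |]

// Stage 2: application. Only posting and processing are timed.
let sw = Stopwatch.StartNew()
for (agent, msg) in payloads do
    agent.Post msg
remaining.Wait()
printfn "Post stage: %fs" sw.Elapsed.TotalSeconds
```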

Now we get:

    Creation stage: 0.538s
    Post stage:     0.265s

So creation is 60% faster, and posting is 4.5 times faster.

Let's try rewriting your benchmark completely:

    do
        for nAgents in [1; 10; 100; 1000; 10000; 100000] do
            let timer = System.Diagnostics.Stopwatch.StartNew()
            use barrier = new System.Threading.Barrier(2)
            let nMsgs = 1000000 / nAgents
            let nAgentsFinished = ref 0
            let makeAgent _ =
                new MailboxProcessor<_>(fun inbox ->
                    let rec loop n =
                        async {
                            let! () = inbox.Receive()
                            let n = n+1
                            if n=nMsgs then
                                let n = System.Threading.Interlocked.Increment nAgentsFinished
                                if n = nAgents then
                                    barrier.SignalAndWait()
                            else
                                return! loop n
                        }
                    loop 0)
            let agents = Array.init nAgents makeAgent
            for agent in agents do
                agent.Start()
            printfn "%fs to create %d agents" timer.Elapsed.TotalSeconds nAgents
            timer.Restart()
            for _ in 1..nMsgs do
                for agent in agents do
                    agent.Post()
            barrier.SignalAndWait()
            printfn "%fs to post %d msgs" timer.Elapsed.TotalSeconds (nMsgs * nAgents)
            timer.Restart()
            for agent in agents do
                use agent = agent
                ()
            printfn "%fs to dispose of %d agents\n" timer.Elapsed.TotalSeconds nAgents

This version waits for nMsgs messages per agent before that agent increments the shared counter, which greatly reduces the performance impact of that shared counter. This program also examines performance with varying numbers of agents. On this machine, I get:

    Agents   Million msgs/s
         1             2.24
        10             6.67
       100             7.58
      1000             5.15
     10000             1.15
    100000             0.36

So it seems that part of the reason for the low msgs/s rate you are seeing is the unusually large number (100,000) of agents. With 10-1,000 agents, the F# implementation is more than 10 times faster than with 100,000 agents.
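Given those numbers, one way to keep per-user serialization without one agent per user is to shard users over a small pool of agents, as the question's second version does. This sketch assumes non-negative user ids and a hypothetical SetName/GetName protocol; the pool size of 8 is arbitrary:

```fsharp
open System.Collections.Generic

// Hypothetical protocol for a sharded user store.
type UserMsg =
    | SetName of userId: int * name: string
    | GetName of userId: int * reply: AsyncReplyChannel<string option>

// One agent per shard. Each shard owns a private Dictionary, so no locking
// is needed: only that shard's agent ever touches it.
let makeShard () =
    MailboxProcessor<UserMsg>.Start(fun inbox ->
        let users = Dictionary<int, string>()
        let rec loop () =
            async {
                let! msg = inbox.Receive()
                match msg with
                | SetName(userId, name) ->
                    users.[userId] <- name
                | GetName(userId, reply) ->
                    match users.TryGetValue userId with
                    | true, name -> reply.Reply(Some name)
                    | false, _ -> reply.Reply None
                return! loop ()
            }
        loop ())

let nShards = 8
let shards = Array.init nShards (fun _ -> makeShard ())

// All messages for a given (non-negative) user id go to the same shard,
// so updates for one user are still processed one at a time, in order.
let shardFor (userId: int) = shards.[userId % nShards]

for userId in 0 .. 99999 do
    (shardFor userId).Post(SetName(userId, "User" + string userId))

let name42 = (shardFor 42).PostAndReply(fun r -> GetName(42, r))
let missing = (shardFor 123456).PostAndReply(fun r -> GetName(123456, r))
printfn "%A %A" name42 missing
```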

So, if you can make do with this kind of performance, you should be able to write your whole application in F#, but if you need significantly more performance, I would recommend a different approach. You might not even have to sacrifice F# (and you can certainly use it for prototyping) if you adopt a design like the Disruptor. In practice, I have found that the time spent on serialization in .NET tends to be much greater than the time spent in F# async and MailboxProcessor.


After eliminating sprintf I got about 12 seconds (Mono on the Mac is not that fast). Taking Phil Trelford's suggestion to use a Dictionary instead of a Map, it dropped to 600 ms. I haven't tried it on Windows/.NET.

The code change is quite simple, and local mutability is perfectly acceptable to me:

    let mb =
        Seq.initInfinite(fun id ->
            MailboxProcessor<Message>.Start(fun inbox ->
                let di = System.Collections.Generic.Dictionary<int,User>()
                let rec loop () =
                    async {
                        let! m = inbox.Receive()
                        match m with
                        | CreateUser(id, name) ->
                            di.Add(id, {Id=id; Name=name})
                            return! loop ()
                        | UpdateName(id, newName) ->
                            match di.TryGetValue id with
                            | false, _ ->
                                return! loop ()
                            | true, user ->
                                incr.Increment()
                                di.[id] <- {user with Name = newName}
                                return! loop ()
                    }
                loop ()
            )
        )
        |> Seq.take 8
        |> Array.ofSeq
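Part of why the swap likely helps is that Map is an immutable balanced tree, so each Map.add builds a new map (O(log n), allocating new nodes along the path to the key), whereas Dictionary overwrites the entry in place. A tiny sketch of that difference in behaviour, using an illustrative User record of the same shape as the question's:

```fsharp
open System.Collections.Generic

// Illustrative record, same shape as the question's User type.
type User = { Id : int; Name : string }

// Immutable Map: Map.add returns a new map; the old one is left untouched.
let m0 = Map.ofList [ 1, { Id = 1; Name = "User1" } ]
let m1 = Map.add 1 { Id = 1; Name = "Renamed" } m0

// Mutable Dictionary: the indexer overwrites the existing entry in place.
let d = Dictionary<int, User>()
d.[1] <- { Id = 1; Name = "User1" }
d.[1] <- { d.[1] with Name = "Renamed" }

// prints: User1 Renamed Renamed
printfn "%s %s %s" m0.[1].Name m1.[1].Name d.[1].Name
```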
