Single-threaded program profiles 15% runtime in semaphore_wait_trap

Question


On Mac OS with Mono, if I compile and profile the program below, I get the following results:

    % fsharpc --nologo -g foo.fs -o foo.exe
    % mono --profile=default:stat foo.exe
    ...
    Statistical samples summary
        Sample type: cycles
        Unmanaged hits:  336 (49.1%)
        Managed hits:    349 (50.9%)
        Unresolved hits:   1 ( 0.1%)

        Hits      %   Method name
         154  22.48   Microsoft.FSharp.Collections.SetTreeModule:height ...
         105  15.33   semaphore_wait_trap
          74  10.80   Microsoft.FSharp.Collections.SetTreeModule:add ...
         ...

Notice the second entry, semaphore_wait_trap. Here is the program:

    [<EntryPoint>]
    let main args =
        let s = seq { 1..1000000 } |> Set.ofSeq
        s |> Seq.iter (fun _ -> ())
        0

I looked at the source for the Set module, but I did not find any (obvious) locking.

Does my single-threaded program really spend 15% of its execution time waiting on semaphores? If so, can I avoid that and improve performance?


profiling f# mono

Søren Debois Apr 11 '14 at 20:10


1 answer

Tony Lee · Accepted Answer · 2014-06-17T06:04:35+0000

According to the profiling tools, sgen (Mono's garbage collector) is what calls semaphore_wait_trap.

Sgen is documented as stopping all other threads during collection:

Before doing a collection (minor or major), the collector must stop all running threads so that it can have a stable view of the current state of the heap, without the other threads changing it.

In other words, when the code tries to allocate memory and a GC is required, the time the collection takes shows up under semaphore_wait_trap, because that is where your application thread is waiting. I suspect that the mono profiler does not profile the GC thread itself, so you do not see the time in the collection code.

The relevant output, then, is really the GC summary:

    GC summary
        GC resizes: 0
        Max heap size: 0
        Object moves: 1002691
        Gen0 collections: 123, max time: 14187us, total time: 354803us, average: 2884us
        Gen1 collections: 3, max time: 41336us, total time: 60281us, average: 20093us
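As a rough check, adding the reported total times gives the overall time the program spent paused in collections. This only restates the profiler's own figures; how it maps onto the 15% sample share depends on the total run time, which the output does not show.

```latex
\begin{align*}
t_{\text{GC}} &= 354803\,\mu\text{s} + 60281\,\mu\text{s} = 415084\,\mu\text{s} \approx 0.42\,\text{s} \\
n_{\text{GC}} &= 123 + 3 = 126 \text{ collections}
\end{align*}
```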

If you want your code to run faster, make it collect less often.
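Besides allocating less, one way to collect less often may be to give sgen a larger nursery, so that minor (Gen0) collections are triggered less frequently. A minimal sketch, assuming the MONO_GC_PARAMS environment variable and its nursery-size option as described in the mono man page; the 64m value is an arbitrary illustration, not a measured recommendation:

```shell
# Assumption: 64m is an arbitrary nursery size chosen for illustration
# (the man page requires the size to be a power of two).
MONO_GC_PARAMS=nursery-size=64m mono --profile=default:stat foo.exe
```

Comparing the Gen0 collection count in the resulting GC summary against the 123 reported above would show whether this helps for a given workload.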

To understand the actual cost of collection, you can use dtrace, since sgen has dtrace probes.
