Trivial Loop Profiling

I am trying to figure out how to use performance profiling. Here's a solution to the Sight Line problem from USACO 2013.

import Data.Array.Unboxed
import Data.List
import Data.Int

angle !a | a > 2 * pi = a - 2 * pi
angle !a | a < 0      = a + 2 * pi
angle !a              = a

tans :: Int64 -> [[Int64]] -> UArray (Int,Int) Double
tans r cs = listArray ((0,0), (length cs - 1, 1)) $ concatMap f cs where
  f :: [Int64] -> [Double]
  f [x,y] = [angle a2, angle a1] where
    phi | y == 0    = if x < 0 then pi else 0.0
        | otherwise = (fromIntegral $ signum y) * (acos $ (fromIntegral x) / d)
    d = sqrt $ fromIntegral $ x*x + y*y
    z = sqrt $ fromIntegral $ x*x + y*y - r*r
    a1 = phi + (acos $ (fromIntegral r)/d)
    a2 = phi - (acos $ (fromIntegral r)/d)

overlap !a1 !a2 !a1' !a2'
   | a1 < a2 && a1' < a2' = a1 <= a2' && a1' <= a2
   | a1 > a2 && a1' > a2' = overlap (a1 - 2*pi) a2 (a1' - 2*pi) a2'
   | a1 > a2 && a1' <= pi = overlap (a1 - 2*pi) a2 a1'          a2'
   | a1 > a2              = overlap a1 (a2 + 2*pi) a1'          a2'
   | a1 <= pi             = overlap a1          a2 (a1' - 2*pi) a2'
   | otherwise            = overlap a1          a2 a1'          (a2' + 2 * pi)

solve cows = length $ [ 1
                      | i <- [0..n]
                      , j <- [i+1..n]
                      , let a1 = cows ! (i,0)
                      , let a2 = cows ! (i,1)
                      , let a1' = cows ! (j,0)
                      , let a2' = cows ! (j,1)
                      , overlap a1 a2 a1' a2' ] where
  ((0,0),(n,1)) = bounds cows

main = do
         ls <- getContents
         let ([n, r]: cows ) = map (map read . words) $ lines ls
         print $ solve $ tans r cows

I use the dataset 5.in example from http://www.usaco.org/current/data/sight.zip and get the following profile:

$ ghc -O2 -XBangPatterns -ddump-simpl sight3.hs
$ ./sight3 < 5.in
...
    Sun Dec 01 23:35 2013 Time and Allocation Profiling Report  (Final)

       sight3.EXE +RTS -p -hd -RTS

    total time  =       10.46 secs   (10459 ticks @ 1000 us, 1 processor)
    total alloc = 1,847,301,536 bytes  (excludes profiling overheads)

COST CENTRE MODULE  %time %alloc

solve       Main     65.2   30.7
overlap     Main     14.4    0.0
solve.a2'   Main      8.9   32.5
solve.a1'   Main      8.6   32.5
main.(...)  Main      2.8    4.0


                                                            individual     inherited
COST CENTRE    MODULE                     no.     entries  %time %alloc   %time %alloc

MAIN           MAIN                        49           0    0.0    0.0   100.0  100.0
 main          Main                        99           0    0.0    0.1    99.9  100.0
  main.r       Main                       110           1    0.0    0.0     0.0    0.0
  tans         Main                       105           1    0.0    0.0     0.0    0.1
   tans.f      Main                       106       10000    0.0    0.1     0.0    0.1
    tans.f.a1  Main                       112       10000    0.0    0.0     0.0    0.0
    angle      Main                       111       20000    0.0    0.0     0.0    0.0
    tans.f.d   Main                       109       10000    0.0    0.0     0.0    0.0
    tans.f.phi Main                       108       10000    0.0    0.0     0.0    0.0
    tans.f.a2  Main                       107       10000    0.0    0.0     0.0    0.0
  solve        Main                       104           1   65.2   30.7    97.1   95.7
   overlap     Main                       117    64368980   14.4    0.0    14.4    0.0
   solve.a2'   Main                       116    49995000    8.9   32.5     8.9   32.5
   solve.a1'   Main                       115    49995000    8.6   32.5     8.6   32.5
   solve.a2    Main                       114        9999    0.0    0.0     0.0    0.0
   solve.a1    Main                       113        9999    0.0    0.0     0.0    0.0
  solve.(...)  Main                       103           1    0.0    0.0     0.0    0.0
  solve.n      Main                       102           1    0.0    0.0     0.0    0.0
  main.cows    Main                       101           1    0.0    0.0     0.0    0.0
  main.(...)   Main                       100           1    2.8    4.0     2.8    4.0
 CAF           GHC.IO.Encoding.CodePage    83           0    0.0    0.0     0.0    0.0
 CAF           GHC.IO.Handle.Internals     82           0    0.0    0.0     0.0    0.0
 CAF           Text.Read.Lex               79           0    0.1    0.0     0.1    0.0
 CAF           GHC.IO.Encoding             75           0    0.0    0.0     0.0    0.0
 CAF           GHC.Int                     71           0    0.0    0.0     0.0    0.0
 CAF           GHC.IO.Handle.FD            67           0    0.0    0.0     0.0    0.0
 CAF:main1     Main                        63           0    0.0    0.0     0.0    0.0
  main         Main                        98           1    0.0    0.0     0.0    0.0
 CAF:lvl3_r3iU Main                        59           0    0.0    0.0     0.0    0.0
  • What stands out in solve.a1 'and a2'? I thought that, being strict, it emits nothing (and the calculation is no different from solve.a1)

  • How to find out where the processor is spent? I would expect that most of the costs would be spent on overlapping, and the closed loop would be very cheap in comparison.

(for the stray reader, I add that this is a purely profiling exercise - I have a solution that is hundreds of times faster, but it is boring in terms of profiling even with regular lists)

+4
1

ghc - .. .

solve , :

solve cows = n `seq` go 0 0 1 n
  where
    (_,(n,_)) = bounds cows
    go count i j n | i > n = count
                   | j > n = go count (i+1) (i+2) n
                   | overlap (cows ! (i,0)) (cows ! (i,1)) (cows ! (j,0)) (cows ! (j,1))
                       = go (count + 1) i (j + 1) n
                   | otherwise = go count i (j + 1) n

, a1 ' a2', .

Cpu go, , , . overlap 15% .

: ( ) , :

solve !cows = n `seq` go 0 0
 where
  (_,(n,_)) = bounds cows
  go !count !i | i >= n = count
               | otherwise = go2 count i (i+1) (cows ! (i,0)) (cows ! (i,1))
  go2 !count !i !j !a1 !a2 | j > n = go count (i+1)
                           | overlap a1 a2 (cows ! (j,0)) (cows ! (j,1))
                                  = go2 (count+1) i (j+1) a1 a2
                           | otherwise = go2 count i (j+1) a1 a2
+3

All Articles