Perlin Noise Optimization in Haskell

(The dependencies for this program are: vector --any and JuicyPixels >= 2 The code is available as a Gist .)

 {-# LANGUAGE Haskell2010 #-} {-# LANGUAGE BangPatterns #-} import Control.Arrow import Data.Bits import Data.Vector.Unboxed ((!)) import Data.Word import System.Environment (getArgs) import qualified Codec.Picture as P import qualified Data.ByteString as B import qualified Data.Vector.Unboxed as V 

I tried connecting Ken Perlin improved noise to Haskell, but I'm not quite sure my method is correct. The main part is what should be well divided into higher and lower sizes, but this is something for the later:

 perlin3 :: (Ord a, Num a, RealFrac a, V.Unbox a) => Permutation -> (a, a, a) -> a perlin3 p (!x', !y', !z') = let (!xX, !x) = actuallyProperFraction x' (!yY, !y) = actuallyProperFraction y' (!zZ, !z) = actuallyProperFraction z' !u = fade x !v = fade y !w = fade z !h = xX !a = next ph + yY !b = next p (h+1) + yY !aa = next pa + zZ !ab = next p (a+1) + zZ !ba = next pb + zZ !bb = next p (b+1) + zZ !aaa = next p aa !aab = next p (aa+1) !aba = next p ab !abb = next p (ab+1) !baa = next p ba !bab = next p (ba+1) !bba = next p bb !bbb = next p (bb+1) in lerp w (lerp v (lerp u (grad aaa (x, y, z)) (grad baa (x-1, y, z))) (lerp u (grad aba (x, y-1, z)) (grad bba (x-1, y-1, z)))) (lerp v (lerp u (grad aab (x, y, z-1)) (grad bab (x-1, y, z-1))) (lerp u (grad abb (x, y-1, z-1)) (grad bbb (x-1, y-1, z-1)))) 

This, of course, is accompanied by several functions mentioned in perlin3 functions, of which I hope they will be as efficient as possible:

 fade :: (Ord a, Num a) => a -> a fade !t | 0 <= t, t <= 1 = t * t * t * (t * (t * 6 - 15) + 10) lerp :: (Ord a, Num a) => a -> a -> a -> a lerp !t !a !b | 0 <= t, t <= 1 = a + t * (b - a) grad :: (Bits hash, Integral hash, Num a, V.Unbox a) => hash -> (a, a, a) -> a grad !hash (!x, !y, !z) = dot3 (vks `V.unsafeIndex` fromIntegral (hash .&. 15)) (x, y, z) where vks = V.fromList [ (1,1,0), (-1,1,0), (1,-1,0), (-1,-1,0) , (1,0,1), (-1,0,1), (1,0,-1), (-1,0,-1) , (0,1,1), (0,-1,1), (0,1,-1), (0,-1,-1) , (1,1,0), (-1,1,0), (0,-1,1), (0,-1,-1) ] dot3 :: Num a => (a, a, a) -> (a, a, a) -> a dot3 (!x0, !y0, !z0) (!x1, !y1, !z1) = x0 * x1 + y0 * y1 + z0 * z1 -- Unlike `properFraction`, `actuallyProperFraction` rounds as intended. actuallyProperFraction :: (RealFrac a, Integral b) => a -> (b, a) actuallyProperFraction x = let (ipart, fpart) = properFraction x r = if x >= 0 then (ipart, fpart) else (ipart-1, 1+fpart) in r 

For the permutation group, I just copied the one that Perlin used on its website:

 newtype Permutation = Permutation (V.Vector Word8) mkPermutation :: [Word8] -> Permutation mkPermutation xs | length xs >= 256 = Permutation . V.fromList $ xs permutation :: Permutation permutation = mkPermutation [151,160,137,91,90,15, 131,13,201,95,96,53,194,233,7,225,140,36,103,30,69,142,8,99,37,240,21,10,23, 190, 6,148,247,120,234,75,0,26,197,62,94,252,219,203,117,35,11,32,57,177,33, 88,237,149,56,87,174,20,125,136,171,168, 68,175,74,165,71,134,139,48,27,166, 77,146,158,231,83,111,229,122,60,211,133,230,220,105,92,41,55,46,245,40,244, 102,143,54, 65,25,63,161, 1,216,80,73,209,76,132,187,208, 89,18,169,200,196, 135,130,116,188,159,86,164,100,109,198,173,186, 3,64,52,217,226,250,124,123, 5,202,38,147,118,126,255,82,85,212,207,206,59,227,47,16,58,17,182,189,28,42, 223,183,170,213,119,248,152, 2,44,154,163, 70,221,153,101,155,167, 43,172,9, 129,22,39,253, 19,98,108,110,79,113,224,232,178,185, 112,104,218,246,97,228, 251,34,242,193,238,210,144,12,191,179,162,241, 81,51,145,235,249,14,239,107, 49,192,214, 31,181,199,106,157,184, 84,204,176,115,121,50,45,127, 4,150,254, 138,236,205,93,222,114,67,29,24,72,243,141,128,195,78,66,215,61,156,180 ] next :: Permutation -> Word8 -> Word8 next (Permutation !v) !idx' = v `V.unsafeIndex` (fromIntegral $ idx' .&. 0xFF) 

And all this has to do with JuicyPixels:

 main = do [target] <- getArgs let image = P.generateImage pixelRenderer 512 512 P.writePng target image where pixelRenderer, pixelRenderer' :: Int -> Int -> Word8 pixelRenderer !x !y = floor $ ((perlin3 permutation ((fromIntegral x - 256) / 32, (fromIntegral y - 256) / 32, 0 :: Double))+1)/2 * 128 -- This code is much more readable, but also much slower. pixelRenderer' xy = (\w -> floor $ ((w+1)/2 * 128)) -- w should be in [-1,+1] . perlin3 permutation . (\(x,y,z) -> ((x-256)/32, (y-256)/32, (z-256)/32)) $ (fromIntegral x, fromIntegral y, 0 :: Double) 

My problem is that perlin3 seems to me very slow. If I project it, pixelRenderer also a lot of time, but this time I will ignore it. I do not know how to optimize perlin3 . I tried to hint the GHC to the percussion patterns that cut the runtime in half, so nice. Explicitly specialized and inlaid hardly helps with ghc -O . Is perlin3 so slow?


UPDATE : An earlier version of this question mentioned an error in my code. This issue has been solved; Turns out my old version of actuallyProperFraction was a bug. He implicitly rounded the integral part of the floating point number to Word8 , and then subtracted it from the floating point number to get the fractional part. Since Word8 can only take values ​​between 0 and 255 inclusive, this will not work properly for numbers outside this range, including negative numbers.

+6
source share
2 answers

This code is mainly related to computing. It can be improved a little, but not much if there is no way to use fewer array requests and less arithmetic.

There are two useful tools for measuring performance: profiling and code reset. I added the SCC annotation to perlin3 so that it appears in the profile. Then I compiled gcc -O2 -fforce-recomp -ddump-simpl -prof -auto . The -ddump-simpl prints simplified code.

Profiling. On my computer, it takes 0.60 seconds to start the program, and about 20% of the execution time (0.12 seconds) is spent in perlin3 according to the profile. Please note that the accuracy of my profile information is about +/- 3%.

Simplification of output: Simplification creates fairly clean code. perlin3 inserted into the pixelRenderer , so part of the output you want to see. Most of the code consists of unpacked arrays and unboxed arithmetic. To improve performance, we want to eliminate some of these arithmetic operations.

The easy change is to exclude runtime checks on SomeFraction (which does not appear in your question, but is part of the code you downloaded). This reduces program execution time to 0.56 seconds.

 -- someFraction t | 0 <= t, t < 1 = SomeFraction t someFraction t = SomeFraction t 

Further, there are several array searches that appear in simplification, for example:

  case GHC.Prim.indexWord8Array# ipv3_s23a (GHC.Prim.+# ipv1_s21N (GHC.Prim.word2Int# (GHC.Prim.and# (GHC.Prim.narrow8Word# (GHC.Prim.plusWord# ipv5_s256 (__word 1))) (__word 255)))) 

The initial operation of narrow8Word# is to force Int to a Word8 . We can get rid of this compulsion by using Int instead of Word8 in the definition of next .

 next :: Permutation -> Int -> Int next (Permutation !v) !idx' = fromIntegral $ v `V.unsafeIndex` (fromIntegral idx' .&. 0xFF) 

This reduces program execution time to 0.54 seconds. Given only the time spent in perlin3 , runtime dropped (approximately) from 0.12 to 0.06 seconds. Although it is difficult to measure where the rest of the time is, it is most likely distributed among the remaining arithmetic and massive accesses.

+4
source

On my machine, the link code with Heatsink optimization takes 0.19 seconds.

Firstly, I moved from JuicyPixels to yarr and yarr-image-io with my favorite flags, -Odph -rtsopts -threaded -fno-liberate-case -funbox-strict-fields -fexpose-all-unfoldings -funfolding-keeness-factor1000 -fsimpl-tick-factor=500 -fllvm -optlo-O3 (assigned to them here ):

 import Data.Yarr as Y import Data.Yarr.IO.Image as Y ... main = do [target] <- getArgs image <- dComputeS $ fromFunction (512, 512) (return . pixelRenderer) Y.writeImage target (Grey image) where pixelRenderer, pixelRenderer' :: Dim2 -> Word8 pixelRenderer (y, x) = floor $ ((perlin3 permutation ((fromIntegral x - 256) / 32, (fromIntegral y - 256) / 32, 0 :: Double))+1)/2 * 128 -- This code is much more readable, but also much slower. pixelRenderer' (y, x) = (\w -> floor $ ((w+1)/2 * 128)) -- w should be in [-1,+1] . perlin3 permutation . (\(x,y,z) -> ((x-256)/32, (y-256)/32, (z-256)/32)) $ (fromIntegral x, fromIntegral y, 0 :: Double) 

This makes the program 30% faster, 0.13 seconds.

Secondly, I replaced using the standard floor with

 doubleToByte :: Double -> Word8 doubleToByte f = fromIntegral (truncate f :: Int) 

Known issue (google "haskell floor performance"). The execution time is reduced to 52 ms (0.052 sec.) By almost 3 times.

Finally, just for fun, I tried to compute the noise in parallel ( dComputeP instead of dComputeS and +RTS -N4 on the command line). The program took 36 ms, including a constant input / output of about 10 ms.

+2
source

All Articles