Haskell Stack Overflow

I am writing a genetic algorithm to create the string "helloworld". But the evolve function creates a stack overflow when n is 10,000 or more.

module Genetics where import Data.List (sortBy) import Random (randomRIO) import Control.Monad (foldM) class Gene g where -- How ideal is the gene from 0.0 to 1.0? fitness :: g -> Float -- How does a gene mutate? mutate :: g -> IO g -- How many species will be explored? species :: [g] -> Int orderFitness :: (Gene g) => [g] -> [g] orderFitness = reverse . sortBy (\ab -> compare (fitness a) (fitness b)) compete :: (Gene g) => [g] -> IO [g] compete pool = do let s = species pool variants <- (mapM (mapM mutate) . map (replicate s)) pool let pool' = (map head . map orderFitness) variants return pool' evolve :: (Gene g) => Int -> [g] -> IO [g] evolve 0 pool = return pool evolve n pool = do pool' <- compete pool evolve (n - 1) pool' 

With species pool = 8 pool of 8 genes is replicated in 8 groups. Each group mutates, and the most suitable of each group is selected for further evolution (back to 8 genes).

Github

+7
source share
3 answers

Thanks to Don deepseq I was able to narrow down the issue to mapM mutate , which made too many thunks. The new version has mutate' , which uses seq to prevent thunking.

 module Genetics where import Data.List (maximumBy) import Random (randomRIO) class Gene g where -- How ideal is the gene from 0.0 to 1.0? fitness :: g -> Float -- How does a gene mutate? mutate :: g -> IO g -- How many species will be explored in each round? species :: [g] -> Int best :: (Gene g) => [g] -> g best = maximumBy (\ab -> compare (fitness a) (fitness b)) -- Prevents stack overflow mutate' :: (Gene g) => g -> IO g mutate' gene = do gene' <- mutate gene gene' `seq` return gene' drift :: (Gene g) => [[g]] -> IO [[g]] drift = mapM (mapM mutate') compete :: (Gene g) => [g] -> IO [g] compete pool = do let islands = map (replicate (species pool)) pool islands' <- drift islands let representatives = map best islands' return representatives evolve :: (Gene g) => Int -> [g] -> IO [g] evolve 0 pool = return pool evolve n pool = compete pool >>= evolve (n - 1) 

Github

+2
source

If you are interested in performance, I would use a fast random number generator, for example:

Secondly, compete looks very suspicious, because he is completely lazy, despite the creation of some potentially large structures. Try rewriting it to be a little more rigorous using deepseq hammer:

 import Control.DeepSeq compete :: (Gene g, NFData g) => [g] -> IO [g] compete pool = do let s = species pool variants <- (mapM (mapM mutate) . map (replicate s)) pool let pool' = (map head . map orderFitness) variants pool' `deepseq` return pool' 

None of this is needed, however, in IO (separate release). Something like Rand Monad might be more appropriate .

+3
source

Instead of (map head . map orderFitness) , where orderFitness is sortBy , you can use maximumBy and one map . This will not save too much (since you are switching from O (n log n) to O (n) and you can get a different coefficient of the two from eliminating the double card), but at least it is somewhat simpler and more efficient, you will also get rid of the call to reverse.

I doubt this fixes the problem without deepseq , but it should be improved nonetheless.

Edit: if the standard library and GHC were perfect, then head . sortBy head . sortBy would generate the identical code maximumBy and map head . map sortBy map head . map sortBy identical map (head . sortBy) code map (head . sortBy) , unfortunately, none of these things is likely to be true in practice, sortBy will tend to do a bunch of additional memory allocation, because it is a partition and conquer algorithm. Card merging is an optimization that you sometimes get, but don't count.

More importantly, using maximumBy more declarative. It’s easier to see what the code does and how long it takes. It should also be easier to take advantage of the optimization, because we know what the goal is, and not just how we get it.

+1
source

All Articles