Shorten the list on the fly in Haskell

Suppose I have a function f that takes some input and produces a number. Inside the function f , a list is created by the input, which is then reduced (for example, using foldl' g ) to obtain the final output number. Since in the end the intermediate list must be reduced, is it possible to apply the reduction function g without expressing the intermediate list. The goal here is to limit the memory used for storage (or expression if you "save" a less accurate word).

To illustrate this, this foldPairProduct function takes up O(N1 * N2) space for an intermediate list (space consumed can be more complex due to expression and lazy rating, but I assume it is proportional or worse). Here N1, N2 is the size of two input lists.

 foldPairProduct :: (Num a, Ord a) => (a -> a -> a) -> [a] -> [a] -> a foldPairProduct f xs ys = foldl1 f [ x*y | x <- xs, y <- ys] 

An alternative implementation of the logic is the foldPairProduct command, which occupies the O(2 * 2) space.

 foldPairProduct' :: Num a => (Maybe a -> Maybe a -> Maybe a) -> [a] -> [a] -> Maybe a foldPairProduct' _ _ [] = Nothing foldPairProduct' _ [] _ = Nothing foldPairProduct' f (x:xs) (y:ys) = foldl1 f [Just $ x*y, foldPairProduct' f [x] ys, foldPairProduct' f xs [y], foldPairProduct' f xs ys] 

The situation is exacerbated for foldCrossProduct , whose implementation is similar to foldPairProduct , except that it accepts multiple lists as input. Its spatial complexity (still in the sense of imperative languages) for the intermediate list is O(N1 * N2 * ...* Nk) , where k is the length [[a]] .

 foldCrossProduct :: Num a => (a -> a -> a) -> [[a]] -> a foldCrossProduct f xss = foldl1 f (crossProduct xss) crossProduct :: Num a => [[a]] -> [a] crossProduct [] = [] crossProduct (xs:[]) = xs crossProduct (xs:xss) = [x * y | x <- xs, y <- crossProduct xss] 

If we followed the idea of ​​implementing foldPairProduct' , then the complexity of the space would be k^2 , which is much more efficient in terms of space. My questions:

  • I have implemented foldPairProduct' for a couple of lists. However, it seems that implementing it for an arbitrary number of lists is not easy.

  • I do not want to compare Haskell with an imperative language, but is there an implementation that will use constant space (or in another word without expressing an intermediate list of the specified length)? Maybe Monad would help, but I'm not very familiar with this.

  • Does the compiler really do its magic? That is, he notices that the list is intermediate and needs to be reduced, and really figure out how to efficiently calculate it. In the end, this is what I was thinking about lazy evaluating and optimizing the compiler.

  • Any comments are welcome. Thank you

Update 1

The performance test confirmed the analysis of the "spatial complexity" of foldPairProduct and foldCrossProduct based on a change in input size N1, N2, N3 and observation of bytes copied by GC.

The performance test did not confirm the foldPairProduct' analysis, which unexpectedly showed N1 * N2 or even worse use of space. This is likely due to the fact that the recursive call is inefficiently evaluated. The results are attached below (with ghc settings the same as Yuras).

Update 2

Another experiment is updated when I learn from comments and answers. For foldPairProduct total memory used corresponds to the spatial complexity explained by Daniel Fisher.

For foldCrossProduct , although Daniel complexity analysis makes sense to me, the results do not show linear memory usage. Following Daniel's advice, x <- xs and y <- crossproduct ys , and this really led to the complexity of the linear space.

For foldCrossProduct (max) [[1..100],[1..n], [1..1000]] , with n = 100, 1000, 10000, 100000, the used memory is 2, 2, 3, 14 MB .

foldPairProduct [1..n] [1..10000]

 n = 100 120,883,320 bytes allocated in the heap 56,867,728 bytes copied during GC 428,384 bytes maximum residency (50 sample(s)) 98,664 bytes maximum slop 3 MB total memory in use (0 MB lost due to fragmentation) n = 1000 1,200,999,280 bytes allocated in the heap 569,837,360 bytes copied during GC 428,384 bytes maximum residency (500 sample(s)) 99,744 bytes maximum slop 3 MB total memory in use (0 MB lost due to fragmentation) n = 10000 12,002,152,040 bytes allocated in the heap 5,699,468,024 bytes copied during GC 428,384 bytes maximum residency (5000 sample(s)) 99,928 bytes maximum slop 3 MB total memory in use (0 MB lost due to fragmentation) n = 100000 120,013,672,800 bytes allocated in the heap 56,997,625,608 bytes copied during GC 428,384 bytes maximum residency (50000 sample(s)) 99,984 bytes maximum slop 3 MB total memory in use (0 MB lost due to fragmentation) 

foldPairProduct [1..10000] [1..n]

 n = 100 121,438,536 bytes allocated in the heap 55,920 bytes copied during GC 32,408 bytes maximum residency (1 sample(s)) 19,856 bytes maximum slop 1 MB total memory in use (0 MB lost due to fragmentation) n = 1000 1,201,511,296 bytes allocated in the heap 491,864 bytes copied during GC 68,392 bytes maximum residency (1 sample(s)) 20,696 bytes maximum slop 1 MB total memory in use (0 MB lost due to fragmentation) n = 10000 12,002,232,056 bytes allocated in the heap 5,712,004,584 bytes copied during GC 428,408 bytes maximum residency (5000 sample(s)) 98,688 bytes maximum slop 3 MB total memory in use (0 MB lost due to fragmentation) n = 100000 120,009,432,816 bytes allocated in the heap 81,694,557,064 bytes copied during GC 4,028,408 bytes maximum residency (10002 sample(s)) 769,720 bytes maximum slop 14 MB total memory in use (0 MB lost due to fragmentation) 

foldPairProduct [1..n] [1..n]

 n = 100 1,284,024 bytes allocated in the heap 15,440 bytes copied during GC 32,336 bytes maximum residency (1 sample(s)) 19,920 bytes maximum slop 1 MB total memory in use (0 MB lost due to fragmentation) n = 1000 120,207,224 bytes allocated in the heap 114,848 bytes copied during GC 68,336 bytes maximum residency (1 sample(s)) 24,832 bytes maximum slop 1 MB total memory in use (0 MB lost due to fragmentation) n = 10000 12,001,432,024 bytes allocated in the heap 5,708,472,592 bytes copied during GC 428,336 bytes maximum residency (5000 sample(s)) 99,960 bytes maximum slop 3 MB total memory in use (0 MB lost due to fragmentation) n = 100000 1,200,013,672,824 bytes allocated in the heap 816,574,713,664 bytes copied during GC 4,028,336 bytes maximum residency (100002 sample(s)) 770,264 bytes maximum slop 14 MB total memory in use (0 MB lost due to fragmentation) 

foldCrossProduct (max) [[1..n], [1..100], [1..1000]]

 n = 100 105,131,320 bytes allocated in the heap 38,697,432 bytes copied during GC 427,832 bytes maximum residency (34 sample(s)) 209,312 bytes maximum slop 3 MB total memory in use (0 MB lost due to fragmentation) n = 1000 1,041,254,480 bytes allocated in the heap 374,148,224 bytes copied during GC 427,832 bytes maximum residency (334 sample(s)) 211,936 bytes maximum slop 3 MB total memory in use (0 MB lost due to fragmentation) n = 10000 10,402,479,240 bytes allocated in the heap 3,728,429,728 bytes copied during GC 427,832 bytes maximum residency (3334 sample(s)) 215,936 bytes maximum slop 3 MB total memory in use (0 MB lost due to fragmentation) 

foldCrossProduct (max) [[1..100], [1..n], [1..1000]]

 n = 100 105,131,344 bytes allocated in the heap 38,686,648 bytes copied during GC 431,408 bytes maximum residency (34 sample(s)) 205,456 bytes maximum slop 3 MB total memory in use (0 MB lost due to fragmentation) n = 1000 1,050,614,504 bytes allocated in the heap 412,084,688 bytes copied during GC 4,031,456 bytes maximum residency (53 sample(s)) 1,403,976 bytes maximum slop 15 MB total memory in use (0 MB lost due to fragmentation) n = 10000 quit after over 1362 MB total memory in use (0 MB lost due to fragmentation) 

foldPairProduct '[1..n] [1..n]

 n = 100 4,351,176 bytes allocated in the heap 59,432 bytes copied during GC 74,296 bytes maximum residency (1 sample(s)) 21,320 bytes maximum slop 1 MB total memory in use (0 MB lost due to fragmentation) n = 1000 527,009,960 bytes allocated in the heap 45,827,176 bytes copied during GC 211,680 bytes maximum residency (1 sample(s)) 25,760 bytes maximum slop 2 MB total memory in use (0 MB lost due to fragmentation) 
+8
list haskell reduction on-the-fly
source share
3 answers
 foldPairProduct :: (Num a, Ord a) => (a -> a -> a) -> [a] -> [a] -> a foldPairProduct f xs ys = foldl1 f [ x*y | x <- xs, y <- ys] 

may be a good citizen. The second argument, ys is reused, so it must be completely in memory during the calculation, but the intermediate list is lazily created as it is consumed, so it only contributes a constant amount of memory, giving the total O(length ys) space complexity. Of course, there must be length xs * length ys list cells that produce and consume, so the general distribution is O(length xs * length ys) [it is assumed that each value of a uses limited space]. The number of bytes copied during the GC (and therefore the time required for the GC) can be drastically reduced by providing a larger allocation area with +RTS -A1M , the number drops from

 3,717,333,376 bytes copied during GC 

for default setting

 20,445,728 bytes copied during GC 

and time from GC time 4.88s to GC time 0.07s for xs == ys = [1 .. 10000] :: [Int] and f = (+) .

But it depends on the strictness analyzer doing its job - what it does well if the type that it used, for example, Int , is known at compile time, and the union function is known to be strict. If the code is not specialized or if the union function is not known to be strict, then flushing will be O(length xs * length ys) . This problem can be fixed with a more stringent foldl1' .

 foldPairProduct' :: Num a => (Maybe a -> Maybe a -> Maybe a) -> [a] -> [a] -> Maybe a foldPairProduct' _ _ [] = Nothing foldPairProduct' _ [] _ = Nothing foldPairProduct' f (x:xs) (y:ys) = foldl1 f [Just $ x*y, foldPairProduct' f [x] ys, foldPairProduct' f xs [y], foldPairProduct' f xs ys] 

runs into a problem of insufficient rigor, the value wrapped by the Just constructor cannot be made by a strict compiler here, since it may not be needed for a general result, so folding often produces a O(length xs * length ys) size thunk under Just - of course, for some f , like const , it will behave as it is. In order to be a good citizen of memory, if all values ​​are used, you should use a rather strict union function f , forcing the result into Just as a result (if it is a Just ); using foldl1' also helps. At the same time, it can have O(length ys + length xs) space complexity (the lists xs and ys are used more than once, therefore they are reused).

 foldCrossProduct :: Num a => (a -> a -> a) -> [[a]] -> a foldCrossProduct f xss = foldl1 f (crossProduct xss) crossProduct :: Num a => [[a]] -> [a] crossProduct [] = [] crossProduct (xs:[]) = xs crossProduct (xs:xss) = [x * y | x <- xs, y <- crossProduct xss] 

Although the GHC does a little CSE (general subexpression exception), the crossProduct xss list will (possibly) be distributed here between different x s, so it creates O(N2*...*Nk) space complexity. If the order of the items in the list does not matter, reorder to

 crossProduct (xs:xss) = [x * y | y <- crossProduct xss, x <- xs] 

helps. Then crossProduct xss does not have to be in memory immediately, so it can be increased and consumed, but only xs should be remembered because it has been used several times. For recursive calls, the first of the remaining lists must be shared, so this will create the overall complexity of the O(N1+...+Nk-1) space.

+2
source share

(Well, I was mistaken, it will not work in constant space, because one of the lists is used several times, so it will most likely have linear spatial complexity)

Have you tried to compile a test program with optimizations enabled? Your foldPairProduct looks good to me and I expect it to work in a constant space.

ADD: Yes, it works in constant space (3 MB shared memory):

 shum@shum-laptop:/tmp/shum$ cat test.hs foldPairProduct f xs ys = foldl1 f [ x*y | x <- xs, y <- ys] n :: Int n = 10000 main = print $ foldPairProduct (+) [1..n] [1..n] shum@shum-laptop:/tmp/shum$ ghc --make -fforce-recomp -O test.hs [1 of 1] Compiling Main ( test.hs, test.o ) Linking test ... shum@shum-laptop:/tmp/shum$ time ./test +RTS -s 2500500025000000 10,401,332,232 bytes allocated in the heap 3,717,333,376 bytes copied during GC 428,280 bytes maximum residency (3335 sample(s)) 219,792 bytes maximum slop 3 MB total memory in use (0 MB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 16699 colls, 0 par 4.27s 4.40s 0.0003s 0.0009s Gen 1 3335 colls, 0 par 1.52s 1.52s 0.0005s 0.0012s INIT time 0.00s ( 0.00s elapsed) MUT time 2.23s ( 2.17s elapsed) GC time 5.79s ( 5.91s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 8.02s ( 8.08s elapsed) %GC time 72.2% (73.2% elapsed) Alloc rate 4,659,775,665 bytes per MUT second Productivity 27.8% of total user, 27.6% of total elapsed real 0m8.085s user 0m8.025s sys 0m0.040s shum@shum-laptop:/tmp/shum$ 
+4
source share

There is some optimization for creating / modifying / consuming lists called loop fusion . Because Haskell is clean and non-strict, there are a number of laws, such as map f . mag g == map (f . g) map f . mag g == map (f . g) .

If for some reason the compiler does not recognize the code and does not generate suboptimal code (after passing the -O flag), I will examine in detail the process of thread merging to find out what prevents it.

+4
source share

All Articles