Data.Map: how do I tell whether I "need value-strict maps"?

When choosing between Data.Map.Lazy and Data.Map.Strict, the docs tell us about the former:

The API of this module is strict in the keys, but lazy in the values. If you need value-strict maps, use Data.Map.Strict instead.

and for the latter, similarly:

The API of this module is strict in both the keys and the values. If you need value-lazy maps, use Data.Map.Lazy instead.

How do more seasoned Haskellers than me tend to judge this "need"? An example use case, in a run-and-done command-line tool (i.e. not a daemon or otherwise long-running): readFile-ing a simple line-based custom config file, where many (not all) lines define key: value pairs that are to be collected into a Map. Once that is done, we rewrite many of its values depending on other values in it that were read later (thanks to immutability, this step creates a new Map and discards the initial incarnation).
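To make the use case concrete, here is a minimal sketch of the two passes. Everything specific in it is an assumption made for illustration: the "key: value" line syntax, the names parseLine, load and resolve, and the rewrite rule (a value of the form "$name" is replaced by the value stored under "name"). The real file format and rewrite logic may look quite different.

```haskell
import qualified Data.Map.Lazy as M
import Data.Maybe (mapMaybe)

-- Hypothetical line format: "key: value". Lines without ':' are skipped.
parseLine :: String -> Maybe (String, String)
parseLine line = case break (== ':') line of
  (key, _ : rest) | not (null key) -> Just (key, dropWhile (== ' ') rest)
  _ -> Nothing

-- First pass: collect all key/value pairs into a Map.
load :: String -> M.Map String String
load = M.fromList . mapMaybe parseLine . lines

-- Second pass (made-up rule): a value "$name" is replaced by whatever
-- is stored under "name"; all other values are kept unchanged.
resolve :: M.Map String String -> M.Map String String
resolve m = M.map expand m
  where
    expand ('$' : name) = M.findWithDefault ('$' : name) name m
    expand v = v

main :: IO ()
main = do
  let cfg = resolve (load "home: $base\n# a comment\nbase: /srv/app\n")
  print (M.lookup "home" cfg)
```

With Data.Map.Lazy, M.map in resolve leaves each rewritten value as an unevaluated thunk until it is looked up. Swapping the import for Data.Map.Strict would evaluate each one to WHNF while the new map is built.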

(Although in practice this file will rarely, if ever, reach even 1000 lines, let's assume for the sake of learning that for some users it eventually will.)

Any given run of the tool will probably look up some 20-100% of the key: value pairs (rewritten on load, although with lazy evaluation I'm never quite sure "when really"), each anywhere between once and several dozen times.

How can I reason about the differences between "value-strict" and "value-lazy" Data.Maps here? What happens "under the hood" in terms of what gets computed, and when?

In principle, such maps are of course about "store once, look up many times", but then, what in computing isn't, "in principle"? And besides, the whole concept of lazily evaluated thunks seems to boil down to this very principle, so why not always stay value-lazy?

+8
haskell
2 answers

How can I reason about the differences between "value-strict" and "value-lazy" Data.Maps here?

Lazy is the norm in Haskell. This means that a map stores not only values but also thunks (i.e., recipes for how to compute a value). For example, let's say you extract the value from a line like this:

 tail (dropWhile (/= ':') line)

Then the value-strict map will actually extract the value upon insertion, while the lazy one will happily just remember how to get it. That unevaluated recipe is then also what you would get back from a lookup.

Here are some pros and cons:

  • Lazy values may require more memory, not only for the thunk itself but also for the data it references (here, line).
  • Strict values may also require more memory. In our case this could happen when the string is parsed into some memory-hungry structure such as lists, JSON, or XML.
  • Lazy values may require less CPU if your code doesn't need every value.
  • Thunks that are nested too deeply can overflow the stack when the value is finally needed.
  • There is also a semantic difference: in lazy mode, you may get away with it when the code that extracts the value would fail (like the one above, if there is no ':' on the line), as long as you only need to check whether the key is present. In strict mode, your program crashes upon insertion.
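The semantic difference in the last point can be sketched as follows. The helper valueOf, the key "verbose" and the input line are made up for illustration. valueOf fails on a line without ':' in the same way the tail expression above does, and the failure is caught here only so that the program can report it.

```haskell
import Control.Exception (SomeException, evaluate, try)
import qualified Data.Map.Lazy as ML
import qualified Data.Map.Strict as MS

-- Hypothetical helper: the text after the first ':' in a line.
-- It calls `error` when the line contains no ':' at all.
valueOf :: String -> String
valueOf line = case dropWhile (/= ':') line of
  (_ : rest) -> rest
  [] -> error "no ':' in line"

-- A made-up input line without a ':'.
badLine :: String
badLine = "verbose"

-- Lazy insert: the failing extraction is stored as an unevaluated thunk.
lazyHasKey :: Bool
lazyHasKey = ML.member "verbose" (ML.insert "verbose" (valueOf badLine) ML.empty)

-- Strict insert: the value is forced as soon as the map itself is evaluated.
strictHasKey :: IO (Either SomeException Bool)
strictHasKey =
  try (evaluate (MS.member "verbose" (MS.insert "verbose" (valueOf badLine) MS.empty)))

main :: IO ()
main = do
  print lazyHasKey
  r <- strictHasKey
  putStrLn (either (const "strict insert failed") show r)
```

The lazy lookup only inspects keys, so the broken value is never touched. The strict version fails as soon as the map is demanded, even though the value itself is never read.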

As always, there are no hard-and-fast rules such as: "If your evaluated value needs less than 20 bytes and takes less than 30 microseconds to compute, use strict, otherwise use lazy."

Typically, you just go with one, and when you notice extreme run time or memory usage, you try the other.

+8

Here's a little experiment that shows the difference between Data.Map.Lazy and Data.Map.Strict. This code exhausts the heap:

 import Data.Foldable
 import qualified Data.Map.Lazy as M

 main :: IO ()
 main = print $ foldl' (\kv i -> M.adjust (+i) 'a' kv) (M.fromList [('a',0)]) (cycle [0])

(It's best to compile with a small maximum heap, for example ghc Main.hs -with-rtsopts="-M20m".)

foldl' keeps the map in WHNF as we iterate over the infinite list of zeros. However, thunks accumulate in the modified value until the heap is exhausted.

The same code with Data.Map.Strict simply loops forever. In the strict version, the values are in WHNF whenever the map is in WHNF.
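A terminating variant of the same experiment, as a sketch: the list is finite here (the bound of 100000 is an arbitrary choice), and the names sumLazy and sumStrict are made up for illustration. Both functions return the same number. What differs is that the lazy map carries a chain of pending (+ i) thunks under the key until the final lookup forces it, while the strict map holds a plain evaluated Int after every step.

```haskell
import Data.Foldable (foldl')
import qualified Data.Map.Lazy as ML
import qualified Data.Map.Strict as MS

-- Lazy values: each adjust wraps the old value in another (+ i) thunk,
-- so the chain grows with n and is only collapsed by the final lookup.
sumLazy :: Int -> Int
sumLazy n = final ML.! 'a'
  where
    final = foldl' (\m i -> ML.adjust (+ i) 'a' m) (ML.fromList [('a', 0)]) [1 .. n]

-- Strict values: each adjust evaluates the new value before storing it,
-- so the map never holds more than one evaluated Int under the key.
sumStrict :: Int -> Int
sumStrict n = final MS.! 'a'
  where
    final = foldl' (\m i -> MS.adjust (+ i) 'a' m) (MS.fromList [('a', 0)]) [1 .. n]

main :: IO ()
main = do
  print (sumLazy 100000)
  print (sumStrict 100000)
```

To observe the difference in memory use rather than in the result, one option is to compile with -rtsopts and run each variant separately with +RTS -s.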

+3
