Haskell: tuning list / vector / array performance

I am trying to use Haskell to compute statistical partition functions for a model. This involves traversing fairly large lists of configurations and summing various observables, which I would like to do as efficiently as possible.

The current version of my code is here: https://gist.github.com/2420539

Strange things happen when I try to choose between lists and vectors for holding the configurations. In particular, truncating the list with V.toList . V.take (3^n) . V.fromList (where V is Data.Vector) is faster than just using take, which is counterintuitive. In both cases, the list is evaluated lazily.

The list itself is built with iterate; if I instead use Vector as much as possible and build it with V.iterateN, it becomes slower again ...

My question: is there a way (other than splicing V.toList and V.fromList into the code at random places) to predict which one will be fastest? (BTW, I compile everything with ghc -O2 using the current stable version.)

1 answer

Vectors are strict and have O(1) slicing (e.g. take). They also have optimized insert and delete operations. So you will sometimes see better performance by switching data structures on the fly. However, that is usually the wrong approach: it is better to keep all the data in one form or the other. (And you are also using UArrays, which confuses the issue even further.)
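To illustrate the O(1) slicing point, here is a minimal sketch comparing the two truncations from the question. The helper names truncateConfigs and truncateConfigsList are hypothetical and not taken from the linked gist; only V.take and the Prelude take are real library functions.

```haskell
import qualified Data.Vector as V

-- Hypothetical helper: keep only the first 3^n configurations.
-- V.take is O(1): the result is a slice that shares the original buffer,
-- so no elements are copied.
truncateConfigs :: Int -> V.Vector a -> V.Vector a
truncateConfigs n = V.take (3 ^ n)

-- List counterpart: take rebuilds the first 3^n cons cells (lazily),
-- so the cost is proportional to the number of elements consumed.
truncateConfigsList :: Int -> [a] -> [a]
truncateConfigsList n = take (3 ^ n)
```

Both functions return the same elements; the difference is only in how much work the truncation itself does.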

General rules:

  • If the data is large and is transformed only in bulk, using a dense, efficient structure such as a vector makes sense.

  • If the data is small and is traversed linearly, and only rarely, then lists make sense.

Remember that operations on lists and vectors have different complexities. So while iterate . replicate on lists is O(n), but lazy, the same thing on vectors will not necessarily be as efficient (you should prefer the built-in methods in vector for generating arrays).
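As a rough sketch of what "prefer the built-in methods" can look like, the following builds the same sequence once as a lazy list and once directly as a vector. The step function is a made-up stand-in for "compute the next configuration"; V.iterateN and V.generate are real functions from Data.Vector.

```haskell
import qualified Data.Vector as V

-- Hypothetical step function standing in for "next configuration".
step :: Int -> Int
step x = (3 * x + 1) `mod` 1000003

-- List version: lazy; cells are produced only as they are consumed.
configsList :: Int -> [Int]
configsList n = take n (iterate step 0)

-- Vector version: the built-in generator builds the n-element vector
-- directly, with no intermediate list.
configsVec :: Int -> V.Vector Int
configsVec n = V.iterateN n step 0

-- When element i depends only on its index, V.generate is the natural choice.
squaresVec :: Int -> V.Vector Int
squaresVec n = V.generate n (\i -> i * i)
```

Whether the vector version is actually faster depends on how the result is consumed afterwards, so it is worth benchmarking on the real workload.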

As a rule, vectors should always be better for numeric operations. You may have to use different functions than you would with lists.
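For the numeric case in the question, a vector-only version of a partition-function sum might look like the sketch below. It assumes the energies of all configurations are already available as a vector of Doubles, and it uses Data.Vector.Unboxed, which is my choice here and not something the answer specifies; the names partitionZ and meanEnergy are hypothetical.

```haskell
import qualified Data.Vector.Unboxed as U

-- Partition function Z = sum_i exp(-beta * E_i) over a vector of energies.
partitionZ :: Double -> U.Vector Double -> Double
partitionZ beta = U.sum . U.map (\e -> exp (negate beta * e))

-- Mean energy <E> = (sum_i E_i * exp(-beta * E_i)) / Z.
meanEnergy :: Double -> U.Vector Double -> Double
meanEnergy beta es =
  U.sum (U.map (\e -> e * exp (negate beta * e)) es) / partitionZ beta es
```

For example, partitionZ 0 (U.fromList [1, 2, 3]) counts the three states, since every Boltzmann weight is 1 at beta = 0.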

I would stick to vectors only. Avoid UArrays, and avoid lists except as generators.
