Haskell: tuning list / vector / array performance

Question


I am trying to use Haskell to compute statistical partition functions of a model. This involves traversing fairly large lists of configurations and summing various observables, which I would like to do as efficiently as possible.

The current version of my code is here: https://gist.github.com/2420539

Some strange things happen when I try to choose between lists and vectors for enumerating configurations. In particular, truncating the list with V.toList . V.take (3^n) . V.fromList (where V is Data.Vector) is faster than just using take, which is somewhat counterintuitive. In both cases, the list is evaluated lazily.
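To make the comparison concrete, here is a minimal sketch of the two truncation variants. The names truncateList, truncateViaVector and sampleConfigs are hypothetical and do not come from the gist; the sketch only shows what is being compared and says nothing about which one is faster.

```haskell
import qualified Data.Vector as V

-- Hypothetical stand-in for the configuration list: a finite list
-- produced with iterate (V.fromList needs a finite input).
sampleConfigs :: [Int]
sampleConfigs = take 100 (iterate (+ 1) 0)

-- Variant 1: truncate with the ordinary list take.
truncateList :: Int -> [a] -> [a]
truncateList n = take (3 ^ n)

-- Variant 2: round trip through a boxed vector, as described above.
truncateViaVector :: Int -> [a] -> [a]
truncateViaVector n = V.toList . V.take (3 ^ n) . V.fromList
```

Both functions return the same elements on a finite list, so they can be swapped freely when timing the surrounding code.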

The list itself is created using iterate; if I instead use Vector as much as possible and build the collection with V.iterateN, the code becomes slower again ...

My question is: is there a way (other than sprinkling V.toList and V.fromList at random places in the code) to predict which version will be fastest? (BTW, I compile everything with ghc -O2, using the current stable version.)


optimization data-structures vector haskell

Vincent Beffara Apr 19 '12 at 12:19


1 answer

Don Stewart · Accepted Answer · 2012-04-19T12:24:45+0000

Vectors are strict and have O(1) slicing (e.g. take). They also have optimized insertion and deletion. So you will sometimes see performance improvements by switching data structure on the fly. However, this is usually the wrong approach: it is better to keep all the data in one form or the other. (And you are also using UArrays, which confuses the issue even further.)
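A minimal sketch of what staying in one representation looks like for a truncated sum. The function names are hypothetical; V.take, V.sum and the list functions are the standard ones.

```haskell
import qualified Data.Vector as V

-- Sum the first k entries while staying in vector form throughout.
-- V.take returns a slice of the existing vector; no elements are copied.
sumFirst :: Int -> V.Vector Double -> Double
sumFirst k = V.sum . V.take k

-- List counterpart for comparison: take walks the first k list cells.
sumFirstList :: Int -> [Double] -> Double
sumFirstList k = sum . take k
```

The point is that neither version converts between lists and vectors in the middle of the pipeline.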

General rules:

If the data is large and is only transformed in bulk, using a dense, efficient structure such as a vector makes sense.
If the data is small and is traversed linearly, and only rarely, then lists make sense.

Remember that operations on lists and vectors have different complexity. So while iterate . replicate on lists is O(n), but lazy, the same thing on vectors will not necessarily be as efficient (you should prefer the built-in methods in vector for generating arrays).
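As an illustration of the built-in generators, here is a small sketch that builds a vector directly instead of going through an intermediate list. The step function and the names viaList, viaIterateN and squares are hypothetical; V.iterateN and V.generate are functions from Data.Vector.

```haskell
import qualified Data.Vector as V

-- Hypothetical update step, used only for illustration.
step :: Int -> Int
step x = (2 * x + 1) `mod` 1000003

-- Build a list with iterate, then convert it to a vector.
viaList :: Int -> V.Vector Int
viaList n = V.fromList (take n (iterate step 0))

-- Build the same n elements directly with the vector generator.
viaIterateN :: Int -> V.Vector Int
viaIterateN n = V.iterateN n step 0

-- When element i depends only on its index, V.generate is another option.
squares :: Int -> V.Vector Int
squares n = V.generate n (\i -> i * i)
```

viaList and viaIterateN produce equal vectors, so either can be dropped into a benchmark to see which performs better for a given workload.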

In general, vectors should always be better for numerical operations. You may just need to use different functions from the ones you would use on lists.

I would stick with vectors only. Avoid UArrays, and avoid lists except as generators.
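A vectors-only version of a partition-function computation might look like the sketch below. It assumes one energy and one observable value per configuration, stored in vectors of equal length; the names partitionFunction and average and the parameter beta are illustrative and not taken from the gist.

```haskell
import qualified Data.Vector as V

-- Z(beta) = sum over configurations of exp(-beta * E).
partitionFunction :: Double -> V.Vector Double -> Double
partitionFunction beta = V.sum . V.map (\e -> exp (negate beta * e))

-- Boltzmann-weighted average of an observable:
-- <O> = (sum of O * exp(-beta * E)) / Z(beta).
average :: Double -> V.Vector Double -> V.Vector Double -> Double
average beta energies obs =
  V.sum (V.zipWith (\e o -> o * exp (negate beta * e)) energies obs)
    / partitionFunction beta energies
```

For example, in GHCi, partitionFunction 0 (V.fromList [1, 2, 3]) counts the configurations, since every weight is exp 0.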
