Modeling time series in F# - seq vs array vs vector vs list vs generic list

Question

If I want a time series type in F# to hold stock prices, which base type should I use? We need to:

Select a subset based on a time index,
Calculate basic statistics for a subset, such as mean and standard deviation, or across multiple subsets, such as correlations,
Append new data and quickly update statistics or technical indicators,
Run linear regressions between time series, etc.

I read that array has better performance, seq has a smaller footprint, list is better for adding elements, and the F# vector is easier for certain mathematical calculations. To balance all the trade-offs, how would you model a time series of stock prices in F#? Thanks.

f# time-series

ahala Feb 12 '11 at 13:55

1 answer

Tomas Petricek · Accepted Answer · 2011-02-12T15:26:57+0000

As the concrete representation, you can choose an array, a list, or some other .NET collection type. The sequence type seq<'T> is abstract, and both arrays and lists are automatically sequences as well. This means that code written against sequences works with any concrete data type (array, list, or any other .NET collection).

So, when writing data processing code, you can use Seq by default (it gives you a lot of flexibility, because it does not matter which concrete representation you use) and then optimize some operations for a specific representation if something needs to be faster.
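As a rough illustration of this approach, here is a minimal sketch. The Observation type, the function names and the sample prices are all made up for the example; nothing here comes from the question. The two seq-based functions accept any collection, and the array-specific variant is the kind of thing you would add only if profiling shows that it matters.

```fsharp
open System

// Hypothetical representation: one observation is a (timestamp, price) pair.
type Observation = DateTime * float

// Both functions take seq<Observation>, so they accept arrays, lists,
// ResizeArray, or any other .NET collection of observations.
let between (startDate: DateTime) (endDate: DateTime) (series: seq<Observation>) =
    series |> Seq.filter (fun (t, _) -> t >= startDate && t <= endDate)

let meanPrice (series: seq<Observation>) =
    series |> Seq.averageBy snd

// Array-specific variant, worth writing only if profiling shows a need.
let meanPriceArray (series: Observation[]) =
    let mutable total = 0.0
    for i in 0 .. series.Length - 1 do
        total <- total + snd (series.[i])
    total / float series.Length

// Made-up sample data.
let day (n: int) = DateTime(2011, 2, n)
let asArray : Observation[] = [| day 1, 10.0; day 2, 11.0; day 3, 12.0; day 4, 13.0 |]
let asList = List.ofArray asArray
let asResizeArray = ResizeArray<Observation>(asArray)

// The same seq-based code runs unchanged over all three representations.
let fromArray = between (day 2) (day 3) asArray |> meanPrice
let fromList = between (day 2) (day 3) asList |> meanPrice
let fromResizeArray = between (day 2) (day 3) asResizeArray |> meanPrice
```

All three results are the mean of the prices for 2 and 3 February. Note that Seq.filter is lazy, so the subset is only walked when meanPrice consumes it.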

As for the concrete representation, I think the main question is whether you want to add elements without changing the original data structure (an immutable list, or an array used in an immutable way) or whether you want to mutate the data structure (for example, by using some mutable .NET collection).

If you need to add new elements frequently, you can use either an immutable list (which supports adding elements to the front) or a mutable collection (an array will not do, since it cannot be resized).
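To make the difference concrete, here is a small sketch of the three ways of "adding" an element. The values are made up, and ResizeArray (the F# alias for System.Collections.Generic.List<T>) is just one possible choice of mutable collection.

```fsharp
// Immutable list: prepending is O(1) and leaves the original list untouched.
let prices = [ 11.0; 10.0 ]            // newest price first
let updated = 12.0 :: prices           // prices still has two elements

// Mutable .NET collection: Add mutates the collection in place.
let buffer = ResizeArray<float>([ 10.0; 11.0 ])
buffer.Add(12.0)

// Array: the size is fixed, so "adding" allocates and copies a new array.
let arr = [| 10.0; 11.0 |]
let arr2 = Array.append arr [| 12.0 |]
```

With the immutable list the newest element ends up at the head, so the series is effectively stored in reverse chronological order.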

If you are working on a more complex system, I would recommend taking a look at ObservableCollection<T> (see MSDN). This is a collection that automatically notifies you when it changes. In response to the notification, you can update your statistics (it also tells you which items were added, so you do not need to recalculate everything). However, F# has no libraries for working with this type, so you would need to write a lot of things yourself.

If you add data only rarely, or add it in larger batches, you can use an array (and allocate a new array every time you add elements). If you only have a relatively small number of items in the collection, you can use lists (where adding an item is easy).
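The ObservableCollection<T> idea could look roughly like the sketch below. The RunningStats record and the running mean are assumptions chosen for illustration; a real system would also need to handle removals, replacements and resets.

```fsharp
open System.Collections.ObjectModel
open System.Collections.Specialized

// Hypothetical running statistics, updated only from newly added items.
type RunningStats = { mutable Count: int; mutable Sum: float }

let stats = { Count = 0; Sum = 0.0 }
let prices = ObservableCollection<float>()

// CollectionChanged reports which items were added, so the statistics
// can be updated incrementally instead of being recomputed from scratch.
prices.CollectionChanged.Add(fun args ->
    if args.Action = NotifyCollectionChangedAction.Add then
        for item in args.NewItems do
            stats.Count <- stats.Count + 1
            stats.Sum <- stats.Sum + unbox<float> item)

prices.Add(10.0)
prices.Add(11.0)
prices.Add(12.0)

let runningMean = stats.Sum / float stats.Count
```

NewItems is a non-generic IList, which is why each item has to be unboxed back to float.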

For numerical calculations, the F# PowerPack (and its vector-like types) offers only a rather limited set of functions, so you may have to look at some third-party libraries. Extreme Optimization is a commercial library with some F# examples, and Math.NET is an open source alternative.
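For simple cases, the statistics mentioned in the question can also be written by hand before reaching for a library. The following is a dependency-free sketch on plain float arrays. It does not use the API of any of the libraries above, the function names are hypothetical, and there is no input validation (Array.fold2 throws if the lengths differ).

```fsharp
// Pearson correlation of two equally long series.
let correlation (xs: float[]) (ys: float[]) =
    let mx = Array.average xs
    let my = Array.average ys
    let cov = Array.fold2 (fun acc x y -> acc + (x - mx) * (y - my)) 0.0 xs ys
    let vx = xs |> Array.sumBy (fun x -> (x - mx) ** 2.0)
    let vy = ys |> Array.sumBy (fun y -> (y - my) ** 2.0)
    cov / sqrt (vx * vy)

// Ordinary least squares fit of y = intercept + slope * x.
// Returns the pair (intercept, slope).
let linearFit (xs: float[]) (ys: float[]) =
    let mx = Array.average xs
    let my = Array.average ys
    let sxy = Array.fold2 (fun acc x y -> acc + (x - mx) * (y - my)) 0.0 xs ys
    let sxx = xs |> Array.sumBy (fun x -> (x - mx) ** 2.0)
    let slope = sxy / sxx
    my - slope * mx, slope

// Made-up data lying exactly on the line y = 1 + 2x.
let xs = [| 1.0; 2.0; 3.0; 4.0 |]
let ys = [| 3.0; 5.0; 7.0; 9.0 |]
let r = correlation xs ys
let intercept, slope = linearFit xs ys
```

For anything beyond this (multiple regression, numerically robust variance, and so on), a dedicated numerical library is the safer choice.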

Otherwise, it is difficult to give any specific advice. Can you add more detailed information about your system (for example, how large the data set is, how many elements need to be added, how often, etc.)?
