What is a concise, useful and efficient way to store large time series in F#?

I am currently learning F# and exploring its use for analyzing financial time series. Can anyone recommend a good data structure for storing time series data?

F# offers a wide selection of native types, and I'm looking for a simple combination that will provide an elegant, concise and efficient solution.

I am looking to store tick data, which consists of millions of time-stamped records, each with several (~5-20) numeric and text fields with possible missing values.

My first thought was a sequence of tuples or records, but I was wondering if anyone could suggest something that has worked well in the real world.

EDIT:

A few additional points for clarification:

Common operations that I will most likely require:

  • Time search, i.e. find the latest data point at a given time.
  • Time pooling
  • Appends (updates and deletions will be rare).

I should make clear that I am exploring F# primarily as an interactive research tool, with the ability to compile as a (really big) added bonus.

ANOTHER EDIT:

I should also have mentioned my role and my use of F#: this work is purely research, not development. The goal is that once we understand the data (and what we want to do with it), we can specify the tools that our developers will then build, for example data warehouses, at which point we will start using those data structures.

That said, I am concerned that our models are computationally intensive, consume a lot of memory and cannot always be written in a recursive manner, so we will ultimately have to query large chunks anyway.

I should also say that I have always used Matlab or R for these tasks, but I am now interested in F# because it offers high-level interactive flexibility for research, while the same code can be used in production.

I apologize for not giving this context at the beginning (this is my first question); I now see that it helps people formulate their answers.

Thanks again to everyone who took the time to help me.

+4
2 answers

The best choice of data structure depends on what operations you want to do.

The simplest option is an array of structs. This has the advantages of fast random access, good space efficiency for unboxed representations, and good locality. If substructures (for example, strings) are shared between records, intern them to make sure that they really are shared.
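As a minimal sketch of the array-of-structs idea, assuming the array is kept sorted by timestamp: the `Tick` type, its field names and the `latestIndexAtOrBefore` helper are all hypothetical, chosen only for illustration. A binary search then finds the latest data point at or before a given time.

```fsharp
open System

// Hypothetical tick record; field names are illustrative only.
[<Struct>]
type Tick =
    { Time   : DateTime
      Price  : float
      Volume : float     // nan is used here to mark a missing value
      Venue  : string }

/// Index of the latest tick with Time <= t, or -1 if every tick is later.
/// Assumes `ticks` is sorted by Time in ascending order.
let latestIndexAtOrBefore (ticks : Tick[]) (t : DateTime) =
    let mutable lo = 0
    let mutable hi = ticks.Length - 1
    let mutable found = -1
    while lo <= hi do
        let mid = lo + (hi - lo) / 2
        if ticks.[mid].Time <= t then
            found <- mid       // candidate; keep looking to the right
            lo <- mid + 1
        else
            hi <- mid - 1
    found

// Ten sample ticks, ten seconds apart.
let t0 = DateTime(2020, 1, 2, 9, 30, 0)
let ticks =
    [| for i in 0 .. 9 ->
        { Time = t0.AddSeconds(float (i * 10))
          Price = 100.0 + float i
          Volume = (if i % 3 = 0 then nan else 50.0)
          Venue = "X" } |]

let idx = latestIndexAtOrBefore ticks (t0.AddSeconds 25.0)
```

Since appends are the common write in this scenario, a `ResizeArray<Tick>` could take the place of the plain array with the same search logic.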

Alternatives could be a seq, which can be loaded from disk on demand; a singly linked list, which lets you add items quickly; or a balanced binary tree, which lets you efficiently perform operations such as insertion at arbitrary positions.
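The on-demand loading mentioned for seq might look like the following sketch. The file layout (one "timestamp,price" pair per line) and the `readTicks` name are assumptions made for this example, not something from the question.

```fsharp
open System
open System.Globalization
open System.IO

// Assumed (hypothetical) file layout: one "ISO-8601 timestamp,price" pair per line.
let readTicks (path : string) : seq<DateTime * float> =
    File.ReadLines path        // lazy: lines are read only as the sequence is consumed
    |> Seq.map (fun line ->
        let parts = line.Split ','
        let time = DateTime.Parse(parts.[0], CultureInfo.InvariantCulture)
        let price = Double.Parse(parts.[1], CultureInfo.InvariantCulture)
        (time, price))

// Write a tiny sample file so the sketch is self-contained.
let path = Path.GetTempFileName()
File.WriteAllLines(path, [| "2020-01-02T09:30:00,100.5"
                            "2020-01-02T09:30:10,100.7"
                            "2020-01-02T09:30:20,100.6" |])

// Only the lines up to the first tick past the cutoff are parsed.
let cutoff = DateTime(2020, 1, 2, 9, 30, 10)
let pricesUpToCutoff =
    readTicks path
    |> Seq.takeWhile (fun (t, _) -> t <= cutoff)
    |> Seq.map snd
    |> Seq.toList
```

The trade-off is that a seq has no random access, so a time search means scanning from the start each time.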

+2

It looks like your data should be stored and queried in a relational database. (Where is it currently stored? Loading millions of records with multiple fields into memory is likely to be an expensive operation, and may leave you with stale data and difficulties persisting changes.) You can then use the F# LINQ to SQL implementation (which I think you can find in the Power Pack) so that F# expressions are translated into SQL expressions.

Here is a link from Don Syme about LINQ support in the F# Power Pack: http://blogs.msdn.com/b/dsyme/archive/2009/10/23/a-quick-refresh-on-query-support-in-the-f-power-pack.aspx
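To give a feel for the query style, here is a rough sketch using the `query { ... }` expressions built into later F# versions (3.0 onwards), not the Power Pack API the link describes. It runs over an in-memory list so that it is self-contained; the `TickRow` type and its fields are hypothetical stand-ins for a table row. Against a database, the source would be an `IQueryable` from a LINQ provider rather than a list.

```fsharp
open System

// Hypothetical row type standing in for a database table row.
type TickRow = { Time : DateTime; Symbol : string; Price : float }

let t0 = DateTime(2020, 1, 2, 9, 30, 0)
let rows =
    [ { Time = t0;                 Symbol = "ABC"; Price = 10.0 }
      { Time = t0.AddSeconds 5.0;  Symbol = "XYZ"; Price = 20.0 }
      { Time = t0.AddSeconds 10.0; Symbol = "ABC"; Price = 10.5 } ]

// Filter, order and project in a SQL-like style.
let abcPrices =
    query {
        for r in rows do
        where (r.Symbol = "ABC" && r.Time >= t0)
        sortBy r.Time
        select r.Price
    }
    |> Seq.toList
```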

+4