Database modeling for stock prices

Recently, I was given the task of modeling a database suitable for stock prices for more than 140 companies. Data will be collected every 15 minutes for 8.5 hours every day from all of these companies. The problem, right now, is how to set up a database for quick search / retrieval given this data.

One solution would be to store everything in one table with the following columns:

| Company name | Price | Date | Etc... | 

Or I could create a table for each company and simply save the price and the date when the data was collected (plus other parameters that are not known yet).

What do you think of these decisions? I hope the problem has been explained in sufficient detail, otherwise, please let me know.

Any other solutions would be welcome!

+4
5 answers

I think you are concerned about performance, given the large number of records you could generate: 140 companies * 4 data points / hour * 8.5 hours * 250 trading days / year means you are looking at about 1.2 million data points per year.

Modern relational database systems can easily handle that number of records in a single table, subject to some important considerations. I don't see a problem with storing 100 years of data points.
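The estimate above is easy to check. This is just the arithmetic from the question, with the 250 trading days taken as the same assumption the answer makes:

```python
# Row-count estimate using the figures quoted in the question and answer.
companies = 140
samples_per_hour = 4          # one sample every 15 minutes
hours_per_day = 8.5
trading_days_per_year = 250   # assumption used in the answer

rows_per_year = int(companies * samples_per_hour * hours_per_day * trading_days_per_year)
rows_per_century = rows_per_year * 100

print(rows_per_year)      # 1190000, i.e. about 1.2 million
print(rows_per_century)   # 119000000
```

Even a full century is on the order of 10^8 rows, which is why a single table is a reasonable starting point.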

So yes, your initial design is probably the best:

Company Name | Price | Date | Etc ... |

Create indexes on company name and date; that will allow you to answer questions such as:

  • what was the highest stock price of company x
  • what was the stock price of company x on date y
  • on date y, which company had the highest stock price
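As a minimal sketch of the single-table design and the three questions above, here is a version using Python's built-in sqlite3 module. The table name `stock_price`, the column names, the index names and the sample rows are all made up for illustration; a production system would use its own RDBMS and types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stock_price (
        company      TEXT NOT NULL,
        price        REAL NOT NULL,
        collected_at TEXT NOT NULL   -- ISO-8601 timestamp, sorts as text
    );
    -- One index for per-company lookups, one for per-date lookups.
    CREATE INDEX idx_price_company_time ON stock_price (company, collected_at);
    CREATE INDEX idx_price_time ON stock_price (collected_at);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO stock_price VALUES (?, ?, ?)", [
    ("ACME", 10.0, "2024-01-02T09:30:00"),
    ("ACME", 12.5, "2024-01-02T09:45:00"),
    ("ACME", 11.0, "2024-01-03T09:30:00"),
    ("INIT", 40.0, "2024-01-02T09:30:00"),
    ("INIT", 39.0, "2024-01-02T09:45:00"),
])

# 1. Highest stock price of company x.
highest_for_company = conn.execute(
    "SELECT MAX(price) FROM stock_price WHERE company = ?", ("ACME",)
).fetchone()[0]

# 2. Prices of company x on date y (half-open range over the day).
prices_on_date = conn.execute(
    "SELECT collected_at, price FROM stock_price "
    "WHERE company = ? AND collected_at >= ? AND collected_at < ? "
    "ORDER BY collected_at",
    ("ACME", "2024-01-02", "2024-01-03"),
).fetchall()

# 3. On date y, which company had the highest price.
highest_on_date = conn.execute(
    "SELECT company, price FROM stock_price "
    "WHERE collected_at >= ? AND collected_at < ? "
    "ORDER BY price DESC LIMIT 1",
    ("2024-01-02", "2024-01-03"),
).fetchone()

print(highest_for_company, prices_on_date, highest_on_date)
```

The composite index on (company, collected_at) serves the first two queries, and the index on collected_at alone serves the third.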

To head off performance issues, I would create a test database and populate it with sample data (tools like dbMonster make that easy), and then build the queries that you think will be run against the real system. Use your database's tuning tools to optimize those queries and/or indexes.
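A rough sketch of that workflow, using plain Python and SQLite instead of dbMonster: generate synthetic rows, then compare the query plan before and after adding an index. The table, column and index names and the `sample_rows` helper are hypothetical, and the prices are random numbers, not market data.

```python
import random
import sqlite3
from datetime import datetime, timedelta

random.seed(0)  # reproducible fake prices
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock_price (company TEXT, price REAL, collected_at TEXT)")

def sample_rows(n_companies=140, n_days=5, samples_per_day=34):
    """Yield fake rows: 34 samples per day = 8.5 hours at 15-minute intervals."""
    start = datetime(2024, 1, 1, 9, 0)
    for c in range(n_companies):
        for d in range(n_days):
            for s in range(samples_per_day):
                ts = start + timedelta(days=d, minutes=15 * s)
                yield (f"CO{c:03d}", round(random.uniform(1, 500), 2), ts.isoformat())

conn.executemany("INSERT INTO stock_price VALUES (?, ?, ?)", sample_rows())
row_count = conn.execute("SELECT COUNT(*) FROM stock_price").fetchone()[0]

query = "SELECT price FROM stock_price WHERE company = ? AND collected_at = ?"
args = ("CO007", "2024-01-02T09:15:00")

# The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan_before = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query, args)]
conn.execute("CREATE INDEX idx_company_time ON stock_price (company, collected_at)")
plan_after = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query, args)]

print(row_count)
print(plan_before)  # a full table scan
print(plan_after)   # a search that uses idx_company_time
```

Other database systems expose the same information through their own EXPLAIN or execution-plan tooling.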

+2

The first and more important question is what types and patterns of queries will be run against this table. Is this an online transaction processing (OLTP) application, where the vast majority of queries touch a single record or at most a small set of records? Or is it an online analytical processing (OLAP) application, where most queries need to read and process significantly larger data sets to generate aggregations and do analysis? These two very different types of systems must be modeled differently.

If this is the first application type (OLTP), your first option is the best, but usage patterns and query types will continue to be important in determining the types of indexes you want to place in the table.

If this is an OLAP application (and a system that stores billions of stock prices sounds more like an OLAP application), then the data may be better organized to store pre-aggregated values, or you could even use a multidimensional database altogether, for example an OLAP cube based on a star schema.
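To illustrate what "pre-aggregated" could mean here, this is a small sketch that rolls the raw 15-minute samples up into one row per company per day. The `daily_summary` table and its columns are assumptions made for the example, not something the question specifies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stock_price (company TEXT, price REAL, collected_at TEXT);
    -- Hypothetical roll-up table: one row per company per trading day.
    CREATE TABLE daily_summary (
        company    TEXT NOT NULL,
        trade_date TEXT NOT NULL,
        low        REAL,
        high       REAL,
        avg_price  REAL,
        samples    INTEGER,
        PRIMARY KEY (company, trade_date)
    );
""")
conn.executemany("INSERT INTO stock_price VALUES (?, ?, ?)", [
    ("ACME", 10.0, "2024-01-02T09:30:00"),
    ("ACME", 12.0, "2024-01-02T09:45:00"),
    ("ACME", 11.0, "2024-01-03T09:30:00"),
    ("ACME", 13.0, "2024-01-03T09:45:00"),
])

# Aggregate once, then let analytical queries read the much smaller table.
conn.execute("""
    INSERT INTO daily_summary
    SELECT company, substr(collected_at, 1, 10),
           MIN(price), MAX(price), AVG(price), COUNT(*)
    FROM stock_price
    GROUP BY company, substr(collected_at, 1, 10)
""")

summary = conn.execute(
    "SELECT * FROM daily_summary ORDER BY company, trade_date"
).fetchall()
print(summary)
```

With 34 samples per company per day, a table like this is roughly 34 times smaller than the raw data, at the cost of having to refresh it as new samples arrive.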

+3

Put them in one table. Modern database engines can easily process the volumes you specify.

RowId | StockCode | PriceTimeInUTC | PriceCode | AskPrice | BidPrice | Volume

  • RowId: identity or unique identifier.
  • StockCode instead of Company. Companies have several types of stock.
  • PriceTimeInUTC: standardize every timestamp to one specific time zone.
  • Also, use datetime2 (it is more precise).
  • PriceCode identifies what kind of price this is: Options, Futures, CommonStock, PreferredStock, etc.
  • AskPrice is the price at which you can buy.
  • BidPrice is the price at which you can sell.
  • Volume (for buying / selling) may be useful to you.

Keep StockCode and PriceCode in separate lookup tables.
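Here is one way the layout above might look with the two lookup tables, sketched in SQLite through Python. SQLite has no datetime2 type (that is a SQL Server type), so the timestamp is stored as ISO-8601 text here; the snake_case names, codes and sample rows are all invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE stock_code (
        stock_code TEXT PRIMARY KEY,
        company    TEXT NOT NULL
    );
    CREATE TABLE price_code (
        price_code  TEXT PRIMARY KEY,
        description TEXT NOT NULL
    );
    CREATE TABLE price (
        row_id            INTEGER PRIMARY KEY,
        stock_code        TEXT NOT NULL REFERENCES stock_code (stock_code),
        price_time_in_utc TEXT NOT NULL,
        price_code        TEXT NOT NULL REFERENCES price_code (price_code),
        ask_price         REAL,
        bid_price         REAL,
        volume            INTEGER
    );
""")

# One company with two kinds of stock (hypothetical codes).
conn.executemany("INSERT INTO stock_code VALUES (?, ?)", [
    ("ACME.C", "Acme Corp"),
    ("ACME.P", "Acme Corp"),
])
conn.executemany("INSERT INTO price_code VALUES (?, ?)", [
    ("COM", "CommonStock"),
    ("PRF", "PreferredStock"),
])
conn.executemany(
    "INSERT INTO price (stock_code, price_time_in_utc, price_code, "
    "ask_price, bid_price, volume) VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("ACME.C", "2024-01-02T14:30:00Z", "COM", 10.5, 10.0, 1200),
        ("ACME.P", "2024-01-02T14:30:00Z", "PRF", 25.25, 25.0, 300),
    ],
)

# Join back to the lookup tables and compute the bid/ask spread.
spreads = conn.execute("""
    SELECT s.company, p.stock_code, c.description, p.ask_price - p.bid_price
    FROM price p
    JOIN stock_code s ON s.stock_code = p.stock_code
    JOIN price_code c ON c.price_code = p.price_code
    ORDER BY p.stock_code
""").fetchall()
print(spreads)
```

The foreign keys mean a price row cannot reference a stock or price type that is missing from the lookup tables.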

+3

In addition to what has already been said, I would add this: do not use the "company name" or something like the "ticker symbol" as the primary key. As you are likely to find out, stock prices have two important characteristics that are often ignored:

  • some companies may be quoted on several exchanges and therefore have different quotation prices on each exchange.
  • some companies are listed several times on the same stock exchange, but in different currencies.

As a result, the correct general solution is to use a triplet (ISIN, currency, stock exchange) as an identifier for a quote.
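A small sketch of what keying on that triplet could look like, again in SQLite via Python. The `quote` table, the placeholder ISIN and the exchange codes are illustrative only and are not real listings; the timestamp is added to the key so that each listing can hold many quotes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE quote (
        isin      TEXT NOT NULL,
        currency  TEXT NOT NULL,
        exchange  TEXT NOT NULL,
        quoted_at TEXT NOT NULL,
        price     REAL NOT NULL,
        -- (isin, currency, exchange) identifies the listing; quoted_at the sample.
        PRIMARY KEY (isin, currency, exchange, quoted_at)
    );
""")

FAKE_ISIN = "XX0000000001"  # placeholder, not a real security
conn.executemany("INSERT INTO quote VALUES (?, ?, ?, ?, ?)", [
    (FAKE_ISIN, "USD", "XNYS", "2024-01-02T14:30:00Z", 100.0),
    (FAKE_ISIN, "GBP", "XLON", "2024-01-02T14:30:00Z", 79.0),
    (FAKE_ISIN, "USD", "XLON", "2024-01-02T14:30:00Z", 100.2),
])

# The same company (one ISIN) shows up as three distinct listings.
listing_count = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT isin, currency, exchange FROM quote)"
).fetchone()[0]

# A lookup has to name all three parts to be unambiguous.
london_usd = conn.execute(
    "SELECT price FROM quote WHERE isin = ? AND currency = ? AND exchange = ?",
    (FAKE_ISIN, "USD", "XLON"),
).fetchone()[0]

# Inserting the same listing at the same instant twice is rejected.
try:
    conn.execute("INSERT INTO quote VALUES (?, ?, ?, ?, ?)",
                 (FAKE_ISIN, "USD", "XNYS", "2024-01-02T14:30:00Z", 101.0))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(listing_count, london_usd, duplicate_rejected)
```

Keying on the company name or ticker alone would collapse these three listings into one series and mix prices in different currencies.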

+3

That is the brute force approach. The moment you add searchable factors, everything can change. A more flexible and elegant option is a star schema design, which can scale to any amount of data. I am a private party working on this myself.

0
