I have 2 files that I would like to import into MS SQL. The first file is 2.2 GB, and the second is 24 GB. (if you're interested: this is a poker lookup table)
Importing them into MS SQL is not a problem. Thanks to SqlBulkCopy, I was able to import the first file in just 10 minutes. My problem is that I donβt know what the actual table layout should look like to allow me to make very fast queries. My first naive attempt looks like this:
CREATE TABLE [dbo]. [TblFlopHands] (
[hand_id] [int] IDENTITY (1,1) NOT NULL,
[flop_index] [smallint] NULL,
[hand_index] [smallint] NULL,
[hs1] [real] NULL,
[ppot1] [real] NULL,
[hs2] [real] NULL,
[ppot2] [real] NULL,
[hs3] [real] NULL,
[ppot3] [real] NULL,
[hs4] [real] NULL,
[ppot4] [real] NULL,
[hs5] [real] NULL,
[ppot5] [real] NULL,
[hs6] [real] NULL,
[ppot6] [real] NULL,
[hs7] [real] NULL,
[ppot7] [real] NULL,
[hs8] [real] NULL,
[ppot8] [real] NULL,
[hs9] [real] NULL,
[ppot9] [real] NULL,
CONSTRAINT [PK_tblFlopHands] PRIMARY KEY CLUSTERED
(
[hand_id] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] The flop index is a value from 1 to 22100 (the first 3 community cards in Texas Hold'em, 52 choose 3). Each flop index has a hand_index from 1 to 1176 (49 select 2). Thus, in this table only 25 989 600 lines.
Completing the request with my "schema" above took approx. 25 seconds. After some googling, I found that the SQL server is scanning the table, which is obviously bad. I launched the "Database Engine Tuning Advisor" and recommended creating an index in the flop_index column (it makes sense). After creating the index, the required disk space for the database just doubled! (plus the LDF log file grew by 2.6 GB) But after indexing the request took only a couple of ms.
Now my question is: how do I do it right? I have never worked with such massive data, the databases that I created before were a joke.
Some notes: after importing data into MS SQL there will never be an insert or update of data, just select. So I wonder if I need a primary key?
EDIT: I provide additional information to make my question clearer:
1) I will never use hand_id. I just put it there because someone told me long ago that I should always create a primary key for each table.
2) Basically, only one request will be used:
SELECT hand_index, hs1, ppot1, hs2, ppot2, hs3, ppot3, hs4, ppot4, hs5, ppot5, hs6, ppot6, hs7, ppot7, hs8, ppot8, hs9, ppot9 WHERE flop_index = 1 ... 22100
This query will always return 1176 rows of data that I need.
EDIT2: Just to be more specific: Yes, this is static data. I have this data in a binary file. I wrote a program to request this file with the data that I need in just a few milliseconds. The reason I need this data in the database is because I want to be able to request data from different computers on my network without having to copy 25 GB to each computer.
HS stands for hand strength, it tells you about the current hand strength of your hole cards in combination with a flop or turn cards. ppot means positive potential, this is a chance that your hand will be ahead when the next community card is dealt. hs1 - 9 is the strength of the hands against 1-9 opponents. The same goes for ppot. On-the-fly ppot calculation is very intensive and takes several minutes to calculate. I want to create a poker analysis program that gives me a list of all possible combinations of hole combinations on any flop / turn with their hs / ppot.