Our underwriting company just sent us an extract of data on potential customers (leads). There are 9 million rows. Each row consists of LeadID (guid), RawLeadXML (xml, 3-4 KB at most), and LeadStatusID (int).
First I tried to add an auto-increment integer column and make it the primary key of the table. It added the column fine, but it could not create the primary key (there is insufficient memory available in the buffer pool).
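Roughly what I ran (the column and constraint names here are approximate):

    -- Adding the auto-increment column succeeded:
    ALTER TABLE Leads ADD LeadNum INT IDENTITY(1,1) NOT NULL;

    -- This is the step that dies with the buffer pool error. A primary
    -- key defaults to a clustered index, and building one rewrites the
    -- entire 9-million-row table:
    ALTER TABLE Leads ADD CONSTRAINT PK_Leads PRIMARY KEY CLUSTERED (LeadNum);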
What I need to do is take each record, one by one, grab the XML, load it into an XmlDocument object in .NET, pull out the fields I want (first name, last name, etc.), and save the parsed information to another table.
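As an aside, the same shredding could in principle be done entirely in T-SQL instead of row-by-row .NET code. This is only a sketch, with made-up element names and a made-up destination table, and it would still have to run in batches on a machine that can cope with the table:

    -- RawLeadXML is nvarchar(max) (see EDIT 2), so it must be cast first.
    INSERT INTO ParsedLeads (LeadID, FirstName, LastName)
    SELECT LeadID,
           CAST(RawLeadXML AS XML).value('(/Lead/FirstName)[1]', 'nvarchar(100)'),
           CAST(RawLeadXML AS XML).value('(/Lead/LastName)[1]',  'nvarchar(100)')
    FROM Leads;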
I can't even run a statement like: select * from Leads where id between 1 and 1000
Selecting 1000 records at a time (select top 1000 * from Leads) works, but how would I then select the next 1000 records without any reference point?
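For anyone wondering, the textbook pattern here is keyset paging: remember the last key you saw and fetch the batch that comes after it. A sketch, assuming LeadID is indexed (which is exactly what I cannot build yet):

    DECLARE @LastLeadID UNIQUEIDENTIFIER = '00000000-0000-0000-0000-000000000000';

    SELECT TOP (1000) LeadID, RawLeadXML, LeadStatusID
    FROM Leads
    WHERE LeadID > @LastLeadID
    ORDER BY LeadID;

    -- Set @LastLeadID to the highest LeadID returned, then repeat
    -- until the query comes back empty.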
My machine has a quad-core 2.53 GHz processor and 12 GB of RAM. It is not a server, but it is a beefy machine. I don't know what to do next, honestly.
EDIT: I forgot to mention that the source data arrived as an MDF file (and its associated LDF), so I just attached them in SQL Server.
EDIT 2: I misspoke and said the RawLeadXML column was xml; it is not, it is just nvarchar(max). I honestly did not know there was an xml data type.
EDIT 3: I can't even issue a delete statement against this table: "delete from Leads where leadid = '100a7927-5311-4f12-8fe3-95c079d32dd4'" blows up with:
    Msg 802, Level 17, State 20, Line 2
    There is insufficient memory available in the buffer pool.
I have no idea what to do next. How the hell is this even a problem? There are thousands of databases in the world with more records than mine.
EDIT 4: In case anyone cares, none of the solutions listed below worked. I'm sure this is a limitation of my machine, and definitely not a knock on the excellent answers I received. For now I am transferring the zipped database (2.6 GB) to our servers at Rackspace, and then I will try to add the index on that hardware, hopefully without taking down our production servers. Once the index is added, I hope I can bring the database back to my local machine and actually do something with it.
EDIT 5: My machine was simply unable to handle a table of this size. It has 12 GB of RAM, 64-bit Windows 7 Professional, a quad-core 2.53 GHz processor, an SSD, etc. That's a pretty well-equipped development machine, and it still could not cope.
So I moved the database to our server at Rackspace in London. That box has 48 GB of memory, and it was able to add the index I needed. Even after that, my machine here could not do anything useful with the data, so I wrote a .NET program that runs in London, pulls 1000 records at a time, parses them into another table, and then marks the source records as processed.
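Each batch the program runs boils down to this pattern (a sketch; the status values are placeholders for whatever convention the data uses):

    -- Grab the next 1000 unprocessed rows (0 = unprocessed here):
    SELECT TOP (1000) LeadID, RawLeadXML
    FROM Leads
    WHERE LeadStatusID = 0;

    -- After parsing each row in .NET and inserting the extracted fields
    -- into the destination table, mark the batch as done (1 = processed):
    UPDATE Leads
    SET LeadStatusID = 1
    WHERE LeadID IN (/* the LeadIDs just handled */);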
Once that's done, I will have to leave the database in London, because I doubt I could run any meaningful queries against this monster locally. That will make development fun.
Summary: I believe there is no good way to process a data set of this size without server-class hardware with plenty of memory (at least 48 GB, in my case).