Why would you want to read 200 billion rows all at once?
You should page through them, reading a few thousand rows at a time.
Even if you genuinely need to read all 200 billion rows, you should still use paging to break the read into shorter requests. That way, if something fails, you simply carry on reading from where you left off.
See "Efficient way to implement paging" for at least one paging method, which uses ROW_NUMBER.
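As a rough illustration of ROW_NUMBER paging with a resumable position, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for SQL Server (it assumes a SQLite build with window-function support, 3.25 or later), and the `orders` table, its columns, and the helper names `read_page` and `read_all` are all made up for the example. The inner query has the same shape you would write in T-SQL, but treat this as a sketch rather than a tuned implementation.

```python
import sqlite3


def read_page(conn, page, page_size):
    """Return one page of rows, numbered with ROW_NUMBER() over a stable order."""
    first = page * page_size + 1
    last = first + page_size - 1
    sql = """
        SELECT id, amount
        FROM (
            SELECT id, amount, ROW_NUMBER() OVER (ORDER BY id) AS rn
            FROM orders
        ) AS numbered
        WHERE rn BETWEEN ? AND ?
        ORDER BY rn
    """
    return conn.execute(sql, (first, last)).fetchall()


def read_all(conn, page_size, start_page=0):
    """Yield (page_number, rows). Persisting page_number gives a restart point."""
    page = start_page
    while True:
        rows = read_page(conn, page, page_size)
        if not rows:
            return
        yield page, rows
        page += 1


# Hypothetical demo data: ten rows in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO orders (id, amount) VALUES (?, ?)",
    [(i, i * 1.5) for i in range(1, 11)],
)

pages = list(read_all(conn, page_size=4))                 # full read, 3 pages
resumed = list(read_all(conn, page_size=4, start_page=2))  # restart at page 2
```

If the reader crashes after finishing page 1, restarting with `start_page=2` picks up at the ninth row. This only holds if the ordering column is stable and the rows are not being inserted or deleted between requests.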
If you are doing data analysis, then I suspect that either you are using the wrong storage (SQL Server is not really designed for processing large data sets), or you need to change your queries so that the analysis runs on the server, in SQL.
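To make "run the analysis on the server" concrete, the sketch below lets the database compute per-group statistics and return only the summary rows, instead of shipping every row to the client. It again uses Python's sqlite3 module as a stand-in for SQL Server, and the `readings` table and its contents are invented for the example.

```python
import sqlite3

# Hypothetical demo data: a handful of sensor readings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings (sensor, value) VALUES (?, ?)",
    [("a", 1.0), ("a", 3.0), ("b", 10.0), ("b", 20.0), ("b", 30.0)],
)

# The aggregation runs inside the database engine; only one row per
# sensor crosses the wire, no matter how many readings exist.
summary = conn.execute(
    """
    SELECT sensor, COUNT(*) AS n, AVG(value) AS mean,
           MIN(value) AS lowest, MAX(value) AS highest
    FROM readings
    GROUP BY sensor
    ORDER BY sensor
    """
).fetchall()
```

With billions of rows the client still receives just one summary row per group, which is the point of pushing the work into SQL.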
Update: I think the last paragraph was somewhat misunderstood.
SQL Server storage is primarily intended for online transaction processing (OLTP): efficiently querying massive datasets in massively concurrent environments (for example, reading or updating a single customer record in a database of billions, while thousands of other users do the same for other records). Typically, the goal is to minimize the amount of data read, which reduces the I/O required and also reduces contention.
The analysis you are talking about is almost the exact opposite of this: a single client is deliberately trying to read every record in order to perform some statistical analysis.
Yes, SQL Server will handle this, but keep in mind that it is optimized for a completely different scenario. For example, data is read from disk one page (8 KB) at a time, even though your statistical processing is probably based on only 2 or 3 columns. Depending on the row density and column width, you may be using only a small fraction of the data stored on each 8 KB page, so most of the data that SQL Server had to read and allocate memory for is never used. (Remember that SQL Server also had to lock this page so that other users could not mess with the data while it was being read.)
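A back-of-the-envelope calculation shows how small that fraction can be. The row width and column sizes below are made-up numbers, and the estimate ignores page headers and per-row overhead, so treat the result as an order of magnitude only.

```python
PAGE_BYTES = 8 * 1024  # a data page is 8 KB


def useful_fraction(row_bytes, needed_bytes):
    """Estimate the share of each page that the analysis actually uses.

    row_bytes:    assumed average width of a full row
    needed_bytes: assumed combined width of the columns the analysis reads
    """
    rows_per_page = PAGE_BYTES // row_bytes
    return rows_per_page * needed_bytes / PAGE_BYTES


# Hypothetical table: 400-byte rows, analysis needs three columns
# totalling 16 bytes (for example two 4-byte ints and one 8-byte float).
fraction = useful_fraction(row_bytes=400, needed_bytes=16)
```

Under these assumptions roughly 4% of every page read from disk contributes to the result; the rest is read, buffered, and locked for nothing.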
If you are serious about processing or analyzing massive datasets, then there are storage formats optimized for exactly this kind of work. SQL Server also has an add-on, Microsoft Analysis Services, which adds online analytical processing (OLAP) and data mining capabilities, using storage modes better suited to this kind of processing.
Justin