"cursor as" reading inside a CLR procedure / function

I need to implement an algorithm on data that (for good reasons) is stored inside SQL Server. The algorithm does not fit SQL very well, so I would like to implement it as a CLR function or procedure. Here is what I want to do:

  • Run a few queries (usually 20-50, but up to 100-200) of the form select a, b, ... from some_table order by xyz . There is an index suitable for this query, so the results should be available more or less without any computation.

  • Consume the results step by step. The exact step depends on the results, so it is not accurately predictable.

  • Compare the results by stepping through the result sets in parallel (a zipper-like merge). I will use only the first part of each result set, but I cannot predict how much I will need. The stopping criterion depends on a threshold inside the algorithm.

My idea was to open multiple SqlDataReaders, but I have two problems with this solution:

  • You can have only one open SqlDataReader per connection, and inside a CLR method I have only one connection, as far as I understand.

  • I do not know how to tell a SqlDataReader to read the data in chunks. I could not find documentation on how SqlDataReader behaves. As far as I understand, it prepares the entire result set and loads it into memory, even if I consume only a small fraction of it.

Any tips on how to solve this as a CLR method? Or is there a lower-level interface to SQL Server that is better suited to my problem?

Update: I had to make two points more explicit:

  • I'm talking about large datasets, so a query can return 1 million records, but my algorithm will consume only the first 100-200. As I said, I do not know the exact number in advance.

  • I know that SQL might not be the best choice for such an algorithm, but due to other restrictions it must be SQL Server. So I am looking for the best possible solution.

+7
3 answers

SqlDataReader does not read the entire result set; you are confusing it with a DataSet . It reads row by row, each time the .Read() method is called. If the client does not consume the result set, the server suspends the query because it has no room to write the output (the selected rows) into. Execution resumes as the client consumes more rows (as SqlDataReader.Read is called). There is even a special command behavior flag, SequentialAccess , which instructs ADO.NET not to pre-load the entire row; this is useful for accessing large BLOB columns in streaming mode (see Download and Upload images from SQL Server via ASP.Net MVC for a practical example).
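A minimal sketch of that behavior (the table, columns, and threshold below are hypothetical, not from the question): rows are fetched as Read() is called, and stopping early leaves the rest of the result set unconsumed on the server.

```csharp
using System.Data;
using System.Data.SqlClient;

// Inside a SQLCLR method, on the context connection.
using (var conn = new SqlConnection("context connection=true"))
{
    conn.Open();
    var cmd = new SqlCommand(
        "select a, b from some_table order by xyz", conn);

    // SequentialAccess streams each row's columns in order;
    // it matters mainly for large BLOB columns.
    using (SqlDataReader reader =
        cmd.ExecuteReader(CommandBehavior.SequentialAccess))
    {
        int threshold = 100;                // assumed stopping criterion
        while (reader.Read())
        {
            int a = reader.GetInt32(0);     // with SequentialAccess, read
            int b = reader.GetInt32(1);     // columns in ordinal order
            if (a > threshold)
                break;                      // stop; remaining rows are never fetched
        }
    }
}
```

Note that with SequentialAccess the columns of each row must be read in ascending ordinal order, as the comments indicate.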

You can have multiple active result sets (SqlDataReaders) on a single connection if MARS is enabled. However, MARS is not compatible with the SQLCLR context connection.

So you can create a CLR streaming TVF to do what you need in the CLR, but only if you have a single source SQL query. Multiple queries would require you to abandon the context connection and use instead a full connection, i.e. connect back to the same instance in a loopback; this would allow MARS and therefore consuming multiple result sets. But loopback connections have problems of their own, as they break the transaction boundaries you have from the context connection. Specifically, with a loopback connection your TVF will not be able to read changes made by the same transaction that called the TVF, because it is a different transaction on a different connection.
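As a sketch of that loopback approach (server, database, table names, and merge logic below are all placeholders): a full connection back to the same instance with MultipleActiveResultSets enabled can keep several readers open at once, at the cost of running in a separate transaction.

```csharp
using System.Collections.Generic;
using System.Data.SqlClient;

// Loopback from inside SQLCLR: a regular connection to the same instance,
// with MARS enabled. Enlist=false keeps it out of a distributed transaction.
string cs = "Data Source=(local);Initial Catalog=MyDb;" +
            "Integrated Security=true;MultipleActiveResultSets=true;" +
            "Enlist=false";

using (var conn = new SqlConnection(cs))
{
    conn.Open();

    string[] queries =
    {
        "select a, b from t1 order by xyz",   // placeholder queries
        "select a, b from t2 order by xyz"
    };

    // One open reader per source query, all on the same MARS connection.
    var readers = new List<SqlDataReader>();
    foreach (string q in queries)
        readers.Add(new SqlCommand(q, conn).ExecuteReader());

    // The algorithm can now advance whichever reader it needs next.
    while (readers[0].Read() && readers[1].Read())
    {
        // compare/merge the current rows; stop when the threshold is hit
    }

    foreach (var r in readers)
        r.Close();
}
```

Without MultipleActiveResultSets=true in the connection string, the second ExecuteReader call would throw, since only one reader may be open per ordinary connection.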

+4

SQL is designed to work with huge datasets and is extremely efficient at it. With set-based logic it is often unnecessary to iterate over the data to perform operations, and there are a number of built-in ways to do this in SQL itself.

1) use set-based updates to modify data without cursors

2) use deterministic user-defined functions with set-based logic (you can do this with the SqlFunction attribute in CLR code). Non-determinism will effectively force the query to be evaluated row by row, as if inside a cursor; deterministic means the output is always the same given the same input.

    [SqlFunction(IsDeterministic = true, IsPrecise = true)]
    public static int algorithm(int value1, int value2)
    {
        int value3 = ... ;
        return value3;
    }

3) use cursors as a last resort. They are a powerful way to execute logic per database row, but have a performance impact. It seems that CLR procedures may not be able to execute SQL cursors, though (thanks Martin).

You've noted that the complexity of using set-based logic is too great. Can you give an example? There are many SQL techniques for solving complex problems - CTEs, views, partitioning, etc.

Of course, you may well be right in your approach, and I don't know exactly what you are trying to do, but my gut says: use the SQL toolset. Hacking around with multiple readers is the wrong approach to a database implementation. You may need multiple threads calling the SP to get parallel processing, but do not do this inside the CLR.

To answer your question directly: with a CLR implementation (and IDataReader ) you really do not need to chunk the output into pages, because you are not loading the whole dataset into memory or transferring it over the network. IDataReader gives you access to the data stream one row at a time. From the sounds of it, your algorithm determines how many records need updating, so when that point is reached, simply stop calling Read() and end there.

    SqlMetaData[] columns = new SqlMetaData[3];
    columns[0] = new SqlMetaData("Value1", SqlDbType.Int);
    columns[1] = new SqlMetaData("Value2", SqlDbType.Int);
    columns[2] = new SqlMetaData("Value3", SqlDbType.Int);

    SqlDataRecord record = new SqlDataRecord(columns);
    SqlContext.Pipe.SendResultsStart(record);

    SqlDataReader reader = comm.ExecuteReader();

    bool flag = true;
    while (reader.Read() && flag)
    {
        int value1 = Convert.ToInt32(reader[0]);
        int value2 = Convert.ToInt32(reader[1]);

        // some algorithm
        int newValue = ...;

        record.SetInt32(0, value1);
        record.SetInt32(1, value2);
        record.SetInt32(2, newValue);   // set on the output record, not the reader
        SqlContext.Pipe.SendResultsRow(record);

        // keep going?
        flag = newValue < 100;
    }
    SqlContext.Pipe.SendResultsEnd();
+1

Cursors are a SQL-only feature. If you want to read chunks of data at a time, you need some kind of paging so that only a certain number of records is returned. If you are using Linq:

    .Skip(Skip)
    .Take(PageSize)

You can use the Skip and Take values to limit the returned results.
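The same paging can also be pushed into the query itself on SQL Server 2012 and later via OFFSET/FETCH; a sketch, with table, columns, and page size as placeholders:

```csharp
using System.Data.SqlClient;

// Hypothetical server-side paging: each round trip returns one page.
// 'conn' is assumed to be an already-open SqlConnection.
int pageSize = 100;
int pageNumber = 0;   // zero-based page index

var cmd = new SqlCommand(
    "select a, b from some_table order by xyz " +
    "offset @skip rows fetch next @take rows only", conn);
cmd.Parameters.AddWithValue("@skip", pageNumber * pageSize);
cmd.Parameters.AddWithValue("@take", pageSize);
```

OFFSET/FETCH requires an ORDER BY clause, which fits the ordered queries described in the question.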

You can simply iterate over the DataReader by doing something like this:

    using (IDataReader reader = Command.ExecuteReader())
    {
        while (reader.Read())
        {
            // Do something with this record
        }
    }

This will iterate through the records one at a time, similar to a cursor in SQL Server.

For multiple recordsets at once, try MARS (if you are on SQL Server):

http://msdn.microsoft.com/en-us/library/ms131686.aspx

0
