Is the BETWEEN operator expensive in SQL Server?

I am trying to join two relatively simple tables, but my query hangs. I'm not sure why, but I think it may have something to do with the BETWEEN operator. My first table looks something like this (it has lots of other columns, but this is the only one I pull out):

    RowNumber
    1
    2
    3
    4
    5
    6
    7
    8

My second table "groups" my rows into "blocks" and has the following layout:

    BlockID  RowNumberStart  RowNumberStop
    1        1               3
    2        4               7
    3        8               8

The result I want associates each RowNumber with a BlockID and has the same number of rows as the first table. It would look like this:

    RowNumber  BlockID
    1          1
    2          1
    3          1
    4          2
    5          2
    6          2
    7          2
    8          3

To get this, I used the following query, writing the results to a temp table:

    select A.RowNumber, B.BlockID
    into TEMP_TABLE
    from TABLE_1 A
    left join TABLE_2 B
        on A.RowNumber between B.RowNumberStart and B.RowNumberStop

TABLE_1 and TABLE_2 are actually very large tables. TABLE_1 has about 122M rows, and TABLE_2 has about 65M rows. In TABLE_1, RowNumber is defined as "bigint", and in TABLE_2, BlockID, RowNumberStart and RowNumberStop are defined as "int". I'm not sure whether this matters, but I wanted to include the information.

The query hung for eight hours. Similar queries on this type and amount of data do not take anywhere near as long. So I am wondering whether the "between" expression is what is hanging this query.

I would welcome any suggestions on how to make this more efficient.

+6
2 answers

BETWEEN is simply shorthand for:

    select A.RowNumber, B.BlockID
    into TEMP_TABLE
    from TABLE_1 A
    left join TABLE_2 B
        on A.RowNumber >= B.RowNumberStart
        and A.RowNumber <= B.RowNumberStop

If the execution plan goes from B to A (although the left join suggests it should really go from A to B), then I assume TABLE_1 is indexed on RowNumber (and that index should be covering for this query). If it has only a clustered index on RowNumber and the table is very wide, I recommend a non-clustered index on RowNumber alone, since you will fit many more rows per page that way.
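As a minimal sketch of that suggestion, a narrow non-clustered index on TABLE_1 might look like the following. The index name is hypothetical, and whether the index helps depends on the plan the optimizer actually chooses.

```sql
-- Hypothetical index name; the key is RowNumber only, so each index page
-- holds far more rows than a page of the wide clustered index would.
CREATE NONCLUSTERED INDEX IX_TABLE_1_RowNumber
    ON TABLE_1 (RowNumber);
```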

Otherwise, you want to index TABLE_2 on RowNumberStart DESC or RowNumberStop ASC, because for a given row in A you need to search downward (DESC) on RowNumberStart to find the match.
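One way to read this advice is sketched below. The index name, the INCLUDE columns and the OUTER APPLY rewrite are assumptions for illustration, not something stated in the answer, and the rewrite only returns the same rows as the original left join if the blocks in TABLE_2 never overlap (as in the sample data).

```sql
-- Hypothetical index: keyed on RowNumberStart DESC, with the other two
-- columns included so the lookup never has to touch the base table.
CREATE NONCLUSTERED INDEX IX_TABLE_2_RowNumberStart
    ON TABLE_2 (RowNumberStart DESC)
    INCLUDE (RowNumberStop, BlockID);

-- For each row in A, take the block with the greatest RowNumberStart that
-- is <= A.RowNumber, then check that the row falls before RowNumberStop.
-- Assumes non-overlapping blocks; unmatched rows get a NULL BlockID.
SELECT A.RowNumber,
       CASE WHEN A.RowNumber <= C.RowNumberStop THEN C.BlockID END AS BlockID
FROM TABLE_1 AS A
OUTER APPLY (
    SELECT TOP (1) B.BlockID, B.RowNumberStop
    FROM TABLE_2 AS B
    WHERE B.RowNumberStart <= A.RowNumber
    ORDER BY B.RowNumberStart DESC
) AS C;
```

The point of the descending key is that the TOP (1) lookup can stop at the first index entry it reaches instead of scanning every block that starts at or before the row.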

I think you might want to change your join to an INNER JOIN, given how your join criteria are set up. (Will you ever have a row in TABLE_1 without a block?)

If you look at the execution plan, you should learn more about why performance is poor, but the Stop criterion is probably not being used in the seek on TABLE_1.
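A sketch of how to get the estimated plan without running the eight-hour query is shown below. It assumes a client such as SSMS or sqlcmd, where GO is the batch separator; SET SHOWPLAN_XML must be the only statement in its batch.

```sql
SET SHOWPLAN_XML ON;
GO

-- With SHOWPLAN_XML on, this statement is compiled but not executed;
-- the estimated plan is returned as XML instead of a result set.
select A.RowNumber, B.BlockID
into TEMP_TABLE
from TABLE_1 A
left join TABLE_2 B
    on A.RowNumber between B.RowNumberStart and B.RowNumberStop;
GO

SET SHOWPLAN_XML OFF;
GO
```

In the plan, check which table drives the join and whether the access to the other table is a seek or a scan, and which of the two range predicates appears as a seek predicate rather than a residual one.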

Unfortunately, SQLMenace's answer about SELECT INTO has been deleted. My comment on it would have been: @Martin SELECT INTO performance is not as bad as it used to be, but I still recommend CREATE TABLE for most production code, because SELECT INTO infers types and nullability. That is fine if you verify that it does what you think it does, but creating a super long varchar or a decimal column with very strange precision can lead not only to odd tables but also to performance problems (especially with some of those big varchars, when you forgot a LEFT or something similar). I think it simply helps to be explicit about what you expect of the table. Often I run the SELECT INTO with WHERE 0 = 1, check the schema, and then script the table out with my own tweaks (e.g. adding an identity column or a timestamp column with a default).
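The workflow described above could be sketched roughly as follows. The table name TEMP_TABLE_SCHEMA_CHECK is hypothetical, and the column types and nullability in the CREATE TABLE are assumptions based on the types given in the question; adjust them to whatever the schema check actually shows.

```sql
-- 1. Let SELECT INTO infer the schema without copying any rows.
select A.RowNumber, B.BlockID
into TEMP_TABLE_SCHEMA_CHECK
from TABLE_1 A
left join TABLE_2 B
    on A.RowNumber between B.RowNumberStart and B.RowNumberStop
where 0 = 1;

-- 2. Inspect the inferred column types and nullability.
EXEC sp_help 'TEMP_TABLE_SCHEMA_CHECK';

-- 3. Create the real target explicitly (assumed types), then load it.
CREATE TABLE TEMP_TABLE (
    RowNumber bigint NOT NULL,  -- assumption: RowNumber is never NULL
    BlockID   int    NULL       -- NULL when the left join finds no block
);

INSERT INTO TEMP_TABLE (RowNumber, BlockID)
select A.RowNumber, B.BlockID
from TABLE_1 A
left join TABLE_2 B
    on A.RowNumber between B.RowNumberStart and B.RowNumberStop;
```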

+6

You have one main problem: you are trying to pull too much data at once. Are you really sure you want to process the result for ALL 122M rows of TABLE_1 in one go? Do you really need that?

+1
