Calculating the cost of combining nested loop blocks

Question

Calculating the cost of combining nested loop blocks

I am trying to calculate the cost of the (most efficient) block nested union of loops in terms of NDPR (number of reads on disk). Suppose you have a form request:

SELECT COUNT(*) FROM county JOIN mcd ON count.state_code = mcd.state_code AND county.fips_code = mcd.fips_code WHERE county.state_code = @NO

where @NO is replaced with a status code each time the request is executed.

(where a smaller table is used as an external one in order to create fewer pages. Ergo: R = graph, S = mcd).

I also know that page size = 2048 bytes

 Pointer = 8 byte Num. rows in mcd table = 35298 Num. rows in county table = 3141 Free memory buffer pages B = 100 Pages(X) = (rowsize)(numrows) / pagesize

What I'm trying to understand is how " WHERE county.state_code = @NO " affects my value?

Thank you for your time.

+7

sql database mysql

Jb2 Nov 22 '12 at 21:58

source share

1 answer

Radu · Answer 1 · 2015-10-20T12:38:12+0000

First, a few notes regarding the formula you wrote:

I'm not sure why you write "B - 2" instead of "B - 1". From a theoretical point of view, you need one buffer page to read in relation to S (you can do this by reading one page at a time).
Make sure you use all brackets. I would write the formula:
NPDR(R x S) = |Pages(R)| + |Pages(R)| / (B-2) * |Pages(S)|
All numbers in the formula should be rounded (but this is nitpicking).
Explanation of the general formula BNLJ:
- You read as many tuples from a smaller ratio (R) as you can store in memory (B-1 or B-2 pages with tuples).
- For each group of B-2 pages with tuples, you need to read the entire S-relation (| Pages (S) |) in order to combine for this particular range of R.
- At the end of the connection, the relation R is read exactly once, and the relation S is read as many times as we fill the memory buffer, namely |Pages(R)| / (B-2) |Pages(R)| / (B-2) times.

Now the answer:

In your example, the selection criteria apply to the ratio R (in this case, the Country table). This is part of the WHERE county.state_code = @NO . Therefore, the general formula is not applied directly.
When reading from the relation R (i.e., the Country table in your example), we can discard all unqualified tuples that do not meet the selection criteria. Assuming that there are 50 states in the US and that all states have the same number of counties, only 2% of the tuples in the Country table are qualified on average and need to be stored in memory. This reduces the number of iterations of the inner loop of the join (i.e., the number of times we need to check the S / table mcs relationship). The number 2%, obviously, is only the expected average and will vary depending on the actual state.
The formula for your problem will therefore be:
NPDR(R x S) = |Pages(County)| + |Pages(County)| / (B - 2) * |Counties in state @NO| / |Rows in table County| * |Pages(Mcd)|

Calculating the cost of combining nested loop blocks

More articles: