Not all TSQL joins are used in parallel

I was able to successfully create parallel connections in R on the TSQL server using the following code:

SQL_retrieve <- function(x){ con <- odbcDriverConnect( 'driver={SQL Server};server=OPTMSLMSOFT02;database=Ad_History;trusted_connection=true' ) odbcGetInfo(con) rawData <- sqlQuery(con, paste( "select * from AD_MDL_R_INPUT a where a.itm_lctn_num = ", facility[x] )) odbcClose(con) return(rawData) } cl <- makeCluster(5) registerDoParallel(cl) outputPar <- foreach(j = 1:facility_count, .packages="RODBC") %dopar% SQL_retrieve(j) stopCluster(cl) 

I would expect all connections to be actively loaded in parallel, but the reality is that usually only one or two connections are active at a time (see image below). This is independent of the number of clusters / connections created. Even with 32 connections, the total download time is reduced a little over 1/2 (should it be closer to 1/32, theoretically, right?). There are also large pauses between connection activity. Why is this? What am I missing here?

Connection utility

Some notes to keep in mind:

  • TSQL server and R are on the same server, so network latency is not a problem.
  • The TSQL server allows connection up to ~ 32 thousand, so we do not encounter the problem of limiting the session.

UPDATE 7/26/17 Another hit on this issue, and now it works (code has not changed). You do not know what happened between the initial and initial publications, but there may be some changes in the settings of the MS SQL server (unlikely).

The stretch time of 7.9 million lines follows the curve in the image below.

Time compared to SQL joins

+7
sql foreach tsql r parallel-foreach
source share
1 answer

SQL Server uses the Connection Pool. It usually takes a long time to connect from scratch. In addition, in practice, applications will perform duplicate identical connections, so combining improves performance. In a way, SQL overrides the connections, so the next connection will start at the foot and be much faster.

There are times when you do not want to use a union, and this is probably one of them. Shutting down the pool by adding "pooling = false"; as mentioned above @ rene-lykke-dahl should solve the problem.

Here you can learn more about the connection pool.

https://docs.microsoft.com/en-us/dotnet/framework/data/adonet/sql-server-connection-pooling

0
source share

All Articles