I was able to successfully create parallel connections in R on the TSQL server using the following code:
SQL_retrieve <- function(x){ con <- odbcDriverConnect( 'driver={SQL Server};server=OPTMSLMSOFT02;database=Ad_History;trusted_connection=true' ) odbcGetInfo(con) rawData <- sqlQuery(con, paste( "select * from AD_MDL_R_INPUT a where a.itm_lctn_num = ", facility[x] )) odbcClose(con) return(rawData) } cl <- makeCluster(5) registerDoParallel(cl) outputPar <- foreach(j = 1:facility_count, .packages="RODBC") %dopar% SQL_retrieve(j) stopCluster(cl)
I would expect all connections to be actively loaded in parallel, but the reality is that usually only one or two connections are active at a time (see image below). This is independent of the number of clusters / connections created. Even with 32 connections, the total download time is reduced a little over 1/2 (should it be closer to 1/32, theoretically, right?). There are also large pauses between connection activity. Why is this? What am I missing here?
Connection utility
Some notes to keep in mind:
- TSQL server and R are on the same server, so network latency is not a problem.
- The TSQL server allows connection up to ~ 32 thousand, so we do not encounter the problem of limiting the session.
UPDATE 7/26/17 Another hit on this issue, and now it works (code has not changed). You do not know what happened between the initial and initial publications, but there may be some changes in the settings of the MS SQL server (unlikely).
The stretch time of 7.9 million lines follows the curve in the image below.

sql foreach tsql r parallel-foreach
Evan larson
source share