Which is faster: the where statement or the where= data set option?

The question is simple: which one is faster?

Suppose we have a DATA step with two data sets in the SET statement, and both data sets have the same variables.

From what I've heard and read, if we filter both of them with the same condition, say date = "10jan2014"d, the result will be exactly the same whether we use a where statement or put the data set option (where=(date="10jan2014"d)) on each of the two data sets, because where is applied before anything enters the PDV.

Is that correct?

To make the question clearer, I wrote the following code:

Suppose we have these two data sets.

data people1;
format birth date9.;
input name $ birth :date9.;
datalines;
John 18jan1980
Mary 20feb1980
;
run;

data people2;
format birth date9.;
input name $ birth :date9.;
datalines;
Peter 18mar1980
Judas 18jan1980
;
run;

I want to select the people born on 18jan1980, using either where as a data set option or the where statement.

With the data set option (in parentheses):

data everybody1;
set people1 (where=(birth="18jan1980"d))
    people2 (where=(birth="18jan1980"d));
run;

With the where statement:

data everybody2;
set people1
    people2;
where birth="18jan1980"d;
run;
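To check both parts of the question on data large enough for timings to mean something, one could run the two variants with the FULLSTIMER system option, which writes detailed CPU, memory and I/O statistics to the log, and then confirm with PROC COMPARE that the outputs are identical. This is a minimal sketch; the generated data and the data set names big1, big2, via_option and via_statement are hypothetical.

```sas
/* Sketch: WHERE= data set option vs. WHERE statement on generated data.
   All data set names here are hypothetical. */
options fullstimer;  /* detailed resource statistics for each step in the log */

/* Two inputs with the same variables, 500,000 rows each */
data big1 big2;
  format birth date9.;
  do i = 1 to 1000000;
    birth = '01jan1980'd + mod(i, 366);
    if mod(i, 2) = 1 then output big1;
    else output big2;
  end;
  drop i;
run;

/* Variant 1: WHERE= data set option on each input */
data via_option;
  set big1 (where=(birth = '18jan1980'd))
      big2 (where=(birth = '18jan1980'd));
run;

/* Variant 2: one WHERE statement for both inputs */
data via_statement;
  set big1 big2;
  where birth = '18jan1980'd;
run;

/* &SYSINFO is 0 after PROC COMPARE when no differences are found */
proc compare base=via_option compare=via_statement;
run;
%put NOTE: PROC COMPARE return code is &sysinfo;
```

Timings differ between systems and runs, so the FULLSTIMER notes from several runs on real data are a better guide than a single comparison.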

Is there a difference? Which one is faster?



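The where statement and the where= data set option work the same way: in both cases the condition is evaluated before an observation is read into the PDV. The difference is scope: the where statement applies to every data set listed in the set statement, while the where= option applies only to the data set it is attached to.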

For details on how where conditions are evaluated, see the SAS documentation section "Where Expression Processing".
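The where statement filters every data set in the set statement, while a where= option filters only the data set it follows. This can be seen on the two small data sets from the question. A minimal sketch; the output names only_first and per_input are hypothetical.

```sas
/* Sketch: scope of WHERE= compared with the WHERE statement */
data people1;
  format birth date9.;
  input name $ birth :date9.;
  datalines;
John 18jan1980
Mary 20feb1980
;
run;

data people2;
  format birth date9.;
  input name $ birth :date9.;
  datalines;
Peter 18mar1980
Judas 18jan1980
;
run;

/* WHERE= only on people1: every row of people2 is still read */
data only_first;
  set people1 (where=(birth = '18jan1980'd))
      people2;
run;
/* Rows: John, Peter, Judas */

/* A different condition per input data set needs WHERE= */
data per_input;
  set people1 (where=(name = 'Mary'))
      people2 (where=(birth = '18jan1980'd));
run;
/* Rows: Mary, Judas */
```

When the same condition applies to all inputs, as in the question, one where statement is shorter; where= is needed only when the inputs must be filtered differently.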



Keep/Drop

I compared two variants: with the option inside the set statement and without it.

The result: without the option in the set statement, CPU time and I/O usage were lower than with it, and I/O usage in particular was about half.

With keep/drop applied both in the set statement and in the DATA step, CPU time dropped further, although I/O usage was higher than in the previous run.

This may be worth knowing, since the keep/drop statements are also compile-time statements.
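A keep/drop comparison like the one described above can be reproduced with a sketch along these lines. It relies on the documented difference between the KEEP= data set option on an input data set, where variables that are not listed are never read into the PDV, and the KEEP statement, which only limits what is written to the output data set. The generated table and the names wide, keep_statement and keep_option are hypothetical, and no specific timings are claimed; compare the FULLSTIMER notes in your own log.

```sas
/* Sketch: KEEP= on the input vs. the KEEP statement */
options fullstimer;  /* detailed resource statistics for each step in the log */

/* Hypothetical wide table: 50 numeric columns x1-x50 plus an id */
data wide;
  array x{50};
  do id = 1 to 100000;
    do j = 1 to 50;
      x{j} = ranuni(1);
    end;
    output;
  end;
  drop j;
run;

/* Variant 1: KEEP statement. All 51 variables are read into the PDV;
   only x1 and id are written to the output */
data keep_statement;
  set wide;
  keep id x1;
run;

/* Variant 2: KEEP= option in SET. Only x1 and id are read */
data keep_option;
  set wide (keep=id x1);
run;
```

Both steps produce the same output data set; only the amount of data moved into the PDV differs.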
