You can test this yourself quite easily. A few caveats:
- Make sure your data set is large enough that the difference isn't lost in the normal random variation of processor activity. 100+ MB is usually a good target.
- Run the test several times (the more, the better) with no pause between runs. A single run is never enough, and it will always show the first data set as faster because it benefits from write caching (essentially, the OS reports the data as written when it isn't yet; it just has the write queued in memory).
Here is an example of my test: a data set of 100 million rows with two 8-byte numeric variables, so about 1.6 GB.
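As a quick check of that size (raw values only; SAS page and header overhead adds a little on top):

$$10^{8}\ \text{rows} \times 2\ \text{variables} \times 8\ \text{bytes} = 1.6 \times 10^{9}\ \text{bytes} \approx 1.6\ \text{GB} \approx 1.49\ \text{GiB}$$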
First, the results: I see a difference of a few seconds. Why? SAS performs several operations when replacing a dataset:
1. Write the dataset to a temporary file
2. Delete the old dataset
3. Rename the temporary file to the dataset's name
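To make those three steps concrete, here is a rough manual equivalent using PROC DATASETS. It is only an illustration under my own assumptions: `demo` and `demo_tmp` are hypothetical names, the data is kept small, and SAS's internal replace logic (including how it names its temporary file) is not necessarily implemented this way.

```sas
/* Illustration only: a manual version of the three replace steps.     */
/* WORK.DEMO and WORK.DEMO_TMP are hypothetical names.                 */

/* The existing dataset that will be "replaced" */
data work.demo;
  do x = 1 to 1e6;
    y = x**2;
    output;
  end;
run;

/* 1. Write the new version under a temporary name */
data work.demo_tmp;
  do x = 1 to 1e6;
    y = x**2;
    output;
  end;
run;

/* 2. Delete the old dataset, then 3. rename the temporary one */
proc datasets library=work nolist;
  delete demo;
  change demo_tmp = demo;
quit;
```

Timing step 2 on its own (for example with `%sysfunc(time())` before and after the PROC DATASETS step) would be one way to see how much the delete costs on a given OS.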
On some operating systems this is faster than on others: I found a Windows desktop machine to be fairly slow compared to Unix, or even Windows Server, which is quite fast. I assume Windows does more work on delete than just updating a file-system pointer, but I really don't know. It is certainly not copying the entire file from the utility directory (there isn't enough time for that). I also suspect that write caching is still favoring the new datasets a little, especially since the times for all datasets grow as the test goes on. The real difference is probably only about a second or so; comparing _REP in iteration 2 (11.08 s) with _NEW in iteration 3 (10.14 s) seems the most reasonable to me.
- Iteration 1: _NEW = 7.27 s, _REP = 12.91 s
- Iteration 2: _NEW = 10.01 s, _REP = 11.08 s
- Iteration 3: _NEW = 10.14 s, _REP = 15.38 s
- Iteration 4: _NEW = 14.77 s, _REP = 17.46 s
- Iteration 5: _NEW = 16.26 s, _REP = 19.20 s
Note that the first _NEW iteration is much faster than the others, and the times increase as the test goes on (as write caching is less and less able to keep up). I suspect that if you let it run longer (or used an even larger file, which I don't have time for right now), you would see more consistent times. I'm also not sure what happens to write caching when a file that is still being cached is deleted; perhaps the OS has to wait until the cached writes are flushed to disk before it can carry out the delete, or something like that. You could run a test that waits 30 seconds between _NEW and _REP to check this.
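A sketch of that 30-second test, assuming a SAS release where the SLEEP function accepts a unit argument (SAS 9.4 does). `test_wait`, `wait=` and `rows=` are hypothetical names for this variant. The _REP timer restarts after the pause, so the wait itself is not counted.

```sas
/* Variant of test_me that pauses between the _NEW and _REP steps.    */
/* Hypothetical macro/parameter names; assumes SLEEP(n, unit) exists, */
/* with unit=1 meaning seconds.                                       */
%macro test_wait(iter=1, wait=30, rows=1e8);
  %local _i start mid restart end rc _new _rep;
  %do _i=1 %to &iter.;
    %let start = %sysfunc(time());
    data test&_i.;                  /* _NEW: first write */
      do x = 1 to &rows.;
        y = x**2;
        output;
      end;
    run;
    %let mid = %sysfunc(time());

    /* Pause, giving the OS a chance to flush cached writes to disk */
    %let rc = %sysfunc(sleep(&wait., 1));

    %let restart = %sysfunc(time());
    data test&_i.;                  /* _REP: replace the same dataset */
      do x = 1 to &rows.;
        y = x**2;
        output;
      end;
    run;
    %let end = %sysfunc(time());
    %let _new = %sysevalf(&mid.-&start.);
    %let _rep = %sysevalf(&end.-&restart.);
    %put Iteration &_i. &=_new. &=_rep.;
  %end;
%mend test_wait;
```

Call it like the original, e.g. `%test_wait(iter=5);`, and compare its _REP times with those from `test_me`. Unlike `test_me`, it does not clear WORK at the end, so run the same `proc datasets nolist kill; quit;` afterwards if you want to free the space.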
The code:
```sas
%macro test_me(iter=1);
  %do _i=1 %to &iter.;
    %let start = %sysfunc(time());
    /* _NEW: create the dataset for the first time */
    data test&_i.;
      do x = 1 to 1e8;
        y=x**2;
        output;
      end;
    run;
    %let mid=%sysfunc(time());
    /* _REP: write the same dataset again, replacing the existing one */
    data test&_i.;
      do x = 1 to 1e8;
        y=x**2;
        output;
      end;
    run;
    %let end=%sysfunc(time());
    %let _new = %sysevalf(&mid.-&start.);
    %let _rep = %sysevalf(&end.-&mid.);
    %put Iteration &_i. &=_new. &=_rep.;
  %end;
  /* Clean up: delete everything in WORK */
  proc datasets nolist kill;
  quit;
%mend test_me;

options nosource nonotes nomprint nosymbolgen;
%test_me(iter=5);
```
Joe