I think DomPazz still has the better answer because of its simplicity, but if you cannot conveniently define a unique index on history, or you really want to avoid any warning messages, then the following is a more complex data step approach. It should be almost as fast as the proc append, while avoiding the memory and processor requirements of the hash object solution posted by Joe.
NB: although this does not require a unique index on history, it will add unwanted rows from temp if, for any matching id, temp has more rows than history.
    data history;
      input id var1 $;
      cards;
    1 a
    2 b
    3 c
    4 d
    5 e
    5 f
    ;
    run;

    data temp;
      input id var1 $;
      cards;
    3 d
    4 e
    5 f
    6 g
    6 h
    ;
    run;

    proc datasets lib = work nolist;
      modify history;
      index create id;
    run;
    quit;

    data history;
      set temp;
      modify history key = id;
      if _iorc_ ne 0 then do;
        _ERROR_ = 0;
        output;
      end;
    run;
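The caveat about duplicate ids can be checked up front. The query below is only a sketch, assuming the same temp and history data sets as above; the aliases t and h and the column names n_temp and n_history are made up for illustration. It is meant to be run before the final data step, because that step changes history.

```sas
/* List ids that have more rows in temp than in history.        */
/* Any id returned here could get extra rows added to history.  */
proc sql;
  select t.id, t.n_temp, h.n_history
  from (select id, count(*) as n_temp
        from temp
        group by id) as t
       inner join
       (select id, count(*) as n_history
        from history
        group by id) as h
       on t.id = h.id
  where t.n_temp > h.n_history;
quit;
```

If the query returns no rows, none of the matching ids should trigger the unwanted-rows case described in the note.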
How it works:
- Read in a record from temp (the set statement).
- Try to read in the first record from history with a matching id value.
- If no match was found, output a new record.
- Since we never read in rows from history for any of the non-matching ids from temp, the values of all the other variables are still present in the PDV from when we read them from temp in the first step.
- The index on history is not updated until the data step has finished adding / modifying / deleting rows. So for the last row of temp, although we have already added one row with id = 6 to history, we do not find it through the index in subsequent iterations of the same data step, and both rows are added.
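The `_iorc_ ne 0` test treats every non-zero return code as "no match". A slightly more defensive variant is sketched below, assuming the %sysrc autocall macro is available in your SAS session. It uses the mnemonics _sok (success) and _dsenom (no matching observation) and stops on anything unexpected. This is an illustration of the same logic, not a replacement for the original step.

```sas
data history;
  set temp;
  modify history key = id;
  select (_iorc_);
    /* Match found: leave the existing history row untouched. */
    when (%sysrc(_sok)) ;
    /* No match: clear the error flag and append the temp row. */
    when (%sysrc(_dsenom)) do;
      _error_ = 0;
      output;
    end;
    /* Any other return code is unexpected: report it and stop. */
    otherwise do;
      put 'ERROR: unexpected _IORC_ value ' _iorc_;
      stop;
    end;
  end;
run;
```

The explicit output statement matters here: without any output, replace or remove statement in the step, a modify data step would do an implicit replace at the end of each iteration.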
Edit: an alternative version that also updates the records in history that have matching ids:
    data history;
      set temp(rename = (var1 = new_var1));
      do _n_ = 1 by 1 until(eof);
        modify history key = id end = eof;
        if _iorc_ = 0 then do;
          var1 = new_var1;
          replace;
        end;
        else do;
          _ERROR_ = 0;
          if not(eof and _n_ > 1) then output;
        end;
      end;
    run;
One downside is that you need to rename all the non-id variables in temp, because when the modify statement reads in a row from history, it overwrites any same-named variables in the PDV. If you have unique indexes on id for both temp and history, you can avoid this as follows:
    data history;
      set temp(keep = id);
      modify history key = id;
      if _iorc_ = 0 then do;
        set temp key = id;
        replace;
      end;
      else do;
        _ERROR_ = 0;
        output;
      end;
    run;
The additional set statement reads in the matching record from temp a second time if a matching record was read in from history, since the record from history overwrote the values read the first time.
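This last version relies on unique indexes on id for both data sets. A minimal sketch of creating them follows, assuming both data sets live in work and that id really is unique in each. The sample data above does not meet that condition (id = 5 is repeated in history and id = 6 in temp), so index creation would fail there until the duplicates are dealt with.

```sas
proc datasets lib = work nolist;
  modify history;
    /* Only needed if a non-unique index named id already exists, */
    /* as it does after the first example above.                  */
    index delete id;
    index create id / unique;
  modify temp;
    index create id / unique;
run;
quit;
```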