Update history table with delete and insert

If the identifier in temp matches an id in hist, I want to remove that row from hist and insert the row from temp in its place; if the id does not match any row in hist, I want to add the row to hist. I have two datasets with the same columns:

    data hist;
        input id1 id2 var1 $;
        cards;
    1 10 a
    2 20 b
    3 30 c
    4 40 d
    5 50 e
    ;
    run;

    data temp;
        input id1 id2 var1 $;
        cards;
    2 20 b
    3 30 d
    4 40 e
    5 50 f
    6 60 g
    ;
    run;

temp will hold the current rows and history will hold all historical rows.

I want to delete and re-insert a row in the history dataset if it exists in temp (an update), and add a row to the history dataset if the row from temp does not exist in history. The history dataset will contain at least 100 records. From the above inputs, I want this result:

    1 10 a
    2 20 b
    3 30 d
    4 40 e
    5 50 f
    6 60 g

Rows 1, 2, 3 and 4 from temp match rows in history, so they will be updated; row 5 from temp does not match, so it will be added to history.

Sorry for the misunderstanding before. It should be clear now, I think. Thanks, Sam.

3 answers

There is a way to let SAS and PROC APPEND do this for you.

Since I don't know your data columns, I will speak in general terms. I assume that you have one or more fields that define uniqueness.

First, create a unique index on HISTORY:

    proc sql;
        create unique index hist_unq on HISTORY(col1, col2, ...);
    quit;

Then use PROC APPEND:

    proc append base=history data=temp force;
    run;

You will see a warning in the log, and a note that fewer observations were added than were read. Something like:

    NOTE: Appending WORK.TEMP to WORK.HISTORY.
    WARNING: Duplicate values not allowed on index hist_unq for file HISTORY, 36 observations rejected.
    NOTE: There were 70 observations read from the data set WORK.TEMP.
    NOTE: 34 observations added.
    NOTE: The data set WORK.HISTORY has 144 observations and 2 variables.
    NOTE: PROCEDURE APPEND used (Total process time):
          real time           0.00 seconds
          cpu time            0.00 seconds

I think DomPazz's answer is still the better one because of its simplicity, but if you are in a situation where you cannot conveniently define a unique index on history, or you really want to avoid any warning messages, then the following is a more complex data step approach. It should be almost as fast as the proc append, while avoiding the memory and processor requirements of the hash object approach suggested by Joe.

NB: although history does not require a unique index for this, it will add unwanted rows from temp if temp has more rows than history for any matching id.

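    data history;
        input id var1 $;
        cards;
    1 a
    2 b
    3 c
    4 d
    5 e
    5 f
    ;
    run;

    data temp;
        input id var1 $;
        cards;
    3 d
    4 e
    5 f
    6 g
    6 h
    ;
    run;

    /* Simple (non-unique) index on id, used by the key= lookup below */
    proc datasets lib = work nolist;
        modify history;
        index create id;
    run;
    quit;

    data history;
        set temp;
        modify history key = id;
        /* Non-zero _iorc_ means no matching id was found in history */
        if _iorc_ ne 0 then do;
            _ERROR_ = 0;   /* suppress the error raised by the failed lookup */
            output;        /* append the temp row to history */
        end;
    run;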

How it works:

  • Read in a record from temp (the set statement).
  • Try to read in the first record from history with the corresponding id value.
  • If we did not find a match, output a new record.
  • Since we never read a row from history for any non-matching id from temp, the values of all the other variables are still in the PDV as they were read from temp in step 1.
  • The index for history is not updated until the data step finishes adding / changing / deleting rows. So for the last row of temp, although we have already added one row with id = 6 to history, we do not find it through the index in later iterations of the same data step, and both rows are added.

Edit: an alternative version that updates history entries with the corresponding identifiers:

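    data history;
        /* Rename so that modify does not overwrite the temp value in the PDV */
        set temp(rename = (var1 = new_var1));
        /* Loop over every history row that matches this id */
        do _n_ = 1 by 1 until(eof);
            modify history key = id end = eof;
            if _iorc_ = 0 then do;
                var1 = new_var1;
                replace;
            end;
            else do;
                _ERROR_ = 0;
                if not(eof and _n_ > 1) then output;
            end;
        end;
    run;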

One of the drawbacks is that you need to rename all the non-id variables in temp, because when the modify statement reads in a row from history, it overwrites the variables with the same name in the PDV. If you have unique indexes on id for both temp and history, you can avoid this as follows:

    data history;
        set temp(keep = id);
        modify history key = id;
        if _iorc_ = 0 then do;
            /* Match found: re-read the temp row by key, then update history */
            set temp key = id;
            replace;
        end;
        else do;
            _ERROR_ = 0;
            output;
        end;
    run;

The additional set statement reads the corresponding record from temp a second time if a matching record was read from history, because that read overwrote the values from the first read.


One way to do what you are describing is a union in SQL. union does not add duplicate rows by default. However, this takes some time (as it has to identify those rows).

    proc sql;
        create table history_new as
            select * from history
            union
            select * from temp;
    quit;

If you have enough memory to load the keys in history into an in-memory hash table, that is most likely the fastest option. Load history into a hash, set temp, find() the current row and, if it is not found, add that row to the hash. Then, at the end, output the hash back to history.
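The hash steps described here could look roughly like the sketch below. It assumes the key variables are id1 and id2 and the only other column is var1, as in the question's sample data; history_new is a hypothetical output name, chosen so that the data set being read is not also being written.

```sas
data _null_;
    /* Put the history variables in the PDV without reading any rows */
    if 0 then set history;

    /* Load all of history; ordered: 'a' writes the output in key order */
    declare hash h(dataset: 'history', ordered: 'a');
    h.defineKey('id1', 'id2');            /* assumed key variables */
    h.defineData('id1', 'id2', 'var1');   /* assumed full column list */
    h.defineDone();

    /* Add only those temp rows whose key is not already in the hash */
    do until (eof);
        set temp end = eof;
        if h.check() ne 0 then h.add();
    end;

    /* Write the combined contents to a new data set */
    h.output(dataset: 'history_new');
    stop;
run;
```

To overwrite matching rows with the values from temp instead of skipping them (the update the question asks for), calling h.replace() for every temp row in place of the check()/add() pair should work, since replace() adds the key when it is not already present.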

Depending on the relative sizes of temp and history, you can instead output only the rows to be added to a separate data set, rather than adding them to the hash, and then proc append that data set.

If temp is less than a quarter or so of the size of history, this is probably the best option.

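    data temp_to_Add;
        set temp;
        /* Load the keys of history into a hash once, on the first iteration */
        if _n_ = 1 then do;
            declare hash h(dataset: 'history');
            h.defineKey('keyvars');   /* placeholder: your key variable(s) */
            h.defineDone();
        end;
        /* Keep only the temp rows whose key is not in history */
        rc = h.find();
        if rc ne 0 then output;
        drop rc;   /* so the output has the same columns as history */
    run;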

If you need to check temp against itself (for duplicate keys within temp), add the row to the hash when rc ne 0.
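A minimal sketch of that variant, followed by the proc append step mentioned earlier. It assumes the key variables are id1 and id2, as in the question's sample data, and uses check() rather than find() so that no return-code variable ends up in the output.

```sas
data temp_to_add;
    set temp;
    if _n_ = 1 then do;
        declare hash h(dataset: 'history');
        h.defineKey('id1', 'id2');   /* assumed key variables */
        h.defineDone();
    end;
    /* Key not seen yet, either in history or earlier in temp */
    if h.check() ne 0 then do;
        h.add();    /* remember the key so a later duplicate in temp is skipped */
        output;
    end;
run;

/* Append only the new rows to history */
proc append base = history data = temp_to_add;
run;
```

With this version only the first temp row for each new key is appended; later rows with the same key are dropped.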
