Comparing the merge command between R and Stata

As an R user, I am learning Stata using this resource, and I am puzzled by the merge command.

In R, I don't need to worry about merging data because merge handles every case. I need not worry whether the shared columns contain duplicates, because each matching row of the Y data frame is merged with every duplicated row in the X data frame (using all=FALSE in merge).

But in Stata, I need to remove duplicate rows from X before I can go on to the merge.

Does Stata assume that, for the merge to proceed, the common column in the master table must be unique?

+8
merge r stata
2 answers

The answer to your question is no. I will try to explain why.

The link you mention covers only one of the types of merge that are possible in Stata, namely the one-to-many merge:

merge 1:m varlist using filename

Other types of merging are possible:

One-to-one merge on specified key variables

merge 1:1 varlist using filename

Many-to-one merge on specified key variables

merge m:1 varlist using filename

Many-to-many merge on specified key variables

merge m:m varlist using filename

One-to-one merge by observation

merge 1:1 _n using filename

Details, explanations, and examples can be found in help merge.
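To tie this back to the question, here is a minimal sketch of a many-to-one merge in which the master data contains duplicate keys. The toy data, the variable names id, x, and y, and the tempfile names are all assumptions made up for illustration. Run it as a single do-file so the temporary files stay available.

```stata
* Hypothetical toy data: the master has repeated ids,
* the using data has exactly one row per id
clear
input id x
1 10
1 11
2 20
end
tempfile master
save `master'

clear
input id y
1 100
2 200
3 300
end
tempfile using_data
save `using_data'

* Keys repeat in the master but are unique in the using data,
* so the merge type is m:1; no duplicates need to be dropped
use `master', clear
merge m:1 id using `using_data'

* _merge records whether each observation came from the master,
* the using data, or both
tabulate _merge
list, sepby(id)
```

Both master rows with id 1 pick up the same y value, and id 3, which exists only in the using data, is kept with _merge equal to 2.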

If you do not know if the observations are unique in the data set, you can perform the following check:

bysort idvar: gen N = _N

ta N

If you find values of N that are greater than 1, you know that idvar does not uniquely identify the observations.
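As an alternative sketch of the same check, Stata's built-in isid and duplicates commands can be used. Here idvar is the same placeholder key name as above, and dup is a hypothetical variable name chosen for illustration.

```stata
* isid exits with an error when idvar does not uniquely identify
* the observations; capture lets the do-file continue
capture isid idvar
if _rc display "idvar does not uniquely identify the observations"

* Report how many copies of each key value exist
duplicates report idvar

* Flag observations whose key is repeated (dup is a hypothetical name)
duplicates tag idvar, generate(dup)
list if dup > 0
```

Unlike the bysort approach, isid and duplicates report do not add a variable to the data; only duplicates tag creates one.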

The 1:1, 1:m, m:1, and m:m forms shown above are actually the new syntax for the merge command, which was introduced with Stata 11. Before Stata 11, the merge command was a bit simpler. You just needed to sort both data sets on the key variables, and then you could run:

merge varlist using filename

By the way, you can still use this old syntax in Stata 11 or higher.
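A minimal sketch of the old-style workflow, assuming hypothetical toy data with a key variable named id. Both data sets are sorted on the key before the merge; run it as a single do-file.

```stata
* Hypothetical using data, sorted on the key and saved
clear
input id y
2 200
1 100
end
sort id
tempfile other
save `other'

* Hypothetical master data, also sorted on the key
clear
input id x
2 20
1 10
end
sort id

* Old syntax: no 1:1 / m:1 / 1:m / m:m specifier
merge id using `other'
tabulate _merge
```

Because the old syntax does not state a merge type, it will not stop you when keys repeat unexpectedly, which is one reason to prefer the explicit forms.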

+6

joinby, unmatched(both) is the command that corresponds to R's merge command.

In particular, merge m:m does NOT perform a many-to-many merge (i.e., a full join), contrary to what the documentation implies.
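The difference can be seen in a small sketch, assuming hypothetical toy data in which the key id = 1 appears twice on each side. The variable and tempfile names are made up for illustration; run it as a single do-file.

```stata
* Hypothetical right-hand data: id 1 appears twice
clear
input id y
1 100
1 101
end
tempfile right
save `right'

* Hypothetical left-hand data: id 1 appears twice
clear
input id x
1 10
1 11
end
tempfile left
save `left'

* joinby forms every pairwise combination within id: 2 x 2 = 4 observations
use `left', clear
joinby id using `right', unmatched(both)
count

* merge m:m pairs observations in order within id: only 2 observations
use `left', clear
merge m:m id using `right'
count
```

joinby returns four observations, one per combination of x and y. merge m:m returns two, pairing the first x with the first y and the second x with the second y.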

0
