Trouble with `agrep(..., fixed=F)`

?agrep (grep with fuzzy matching) mentions that I can set the argument fixed=FALSE so that my pattern is interpreted as a regular expression.

However, I can’t get it to work!

 agrep('(asdf|fdsa)', 'asdf', fixed=F) # integer(0) 

The above should match, since the regex '(asdf|fdsa)' exactly matches the test string 'asdf' in this case.

To confirm:

 grep('(asdf|fdsa)', 'asdf', fixed=F) # 1 : it does match with grep 

Even more confusingly, adist correctly gives the distance between the pattern and the string as 0, which means agrep should definitely return 1 rather than integer(0) (there is no way 0 could be greater than the default max.distance = 0.1).

 adist('(asdf|fdsa)', 'asdf', fixed=F)
 #      [,1]
 # [1,]    0

Why is this not working? Is there something I don't understand? As a workaround I am happy to use adist, but I am not quite sure how to convert agrep's default max.distance=0.1 into the corresponding adist threshold.
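For what it's worth, ?agrep says that a fractional max.distance is taken as a fraction of the pattern length times the maximal transformation cost, rounded up to the next integer. Assuming default costs (all 1), a rough sketch of the conversion for adist might look like the following. fuzzy_which is a made-up helper name, not a base R function, and counting the regex metacharacters in nchar(pattern) is my assumption about how the pattern length is measured.

```r
# Sketch only: emulate agrep's fractional max.distance on top of adist.
# `fuzzy_which` is a hypothetical helper, not part of base R.
fuzzy_which <- function(pattern, x, max.distance = 0.1, fixed = FALSE) {
  # Per ?agrep, a fraction is scaled by the pattern length (times the
  # maximal cost, assumed to be 1 here) and rounded up to an integer.
  limit <- if (max.distance < 1) {
    ceiling(max.distance * nchar(pattern))
  } else {
    max.distance
  }
  # partial = TRUE asks for substring-style distances, as agrep uses.
  d <- adist(pattern, x, fixed = fixed, partial = TRUE)
  # d has one row (the pattern) and one column per element of x.
  which(d[1, ] <= limit)
}

fuzzy_which('(asdf|fdsa)', c('asdf', 'zzzz'))
```

For the 11-character pattern above, the default 0.1 would translate to a limit of ceiling(1.1) = 2 edits under this reading.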

(yes, I am stuck on an old computer that cannot do better than R 2.15.2)

 > sessionInfo()
 R version 2.15.2 (2012-10-26)
 Platform: i686-redhat-linux-gnu (32-bit)

 locale:
  [1] LC_CTYPE=en_AU.utf8       LC_NUMERIC=C
  [3] LC_TIME=en_AU.utf8        LC_COLLATE=en_AU.utf8
  [5] LC_MONETARY=en_AU.utf8    LC_MESSAGES=en_AU.utf8
  [7] LC_PAPER=C                LC_NAME=C
  [9] LC_ADDRESS=C              LC_TELEPHONE=C
 [11] LC_MEASUREMENT=en_AU.utf8 LC_IDENTIFICATION=C

 attached base packages:
 [1] stats     graphics  grDevices utils     datasets  methods   base
1 answer

tl;dr: agrep(..., fixed=F) doesn't seem to work with the '|' character. Use aregexec.

On further investigation, I think this is a bug: agrep(..., fixed=F) does not work with '|' regexes (although adist(..., fixed=F) does).

To elaborate, note that

 adist('(asdf|fdsa)', 'asdf', fixed=T) # 7
 nchar('(asdf|fdsa)') # 11

If 'asdf' were agrep'd against the literal (non-regex) string '(asdf|fdsa)', it would have a distance of 7.

With that in mind:

 agrep('(asdf|fdsa)', 'asdf', fixed=F, max.distance=7) # 1
 agrep('(asdf|fdsa)', 'asdf', fixed=F, max.distance=6) # integer(0)

These are the results I would expect if fixed=T. If fixed=F, my regex would exactly match 'asdf' and the distance would be 0, so I should always get a result of 1 from agrep.

So it looks like agrep(pattern, x, fixed=F) doesn't work, i.e. it actually treats fixed as TRUE for this kind of pattern.

As @Arun mentions, it could just be '|' regular expressions that don't work. For example, agrep('la[sb]y', 'lazy', fixed=FALSE) works as expected.


EDIT: Workaround (thanks @Arun)

The aregexec function seems to work.

 > aregexec('(asdf|fdsa)', 'asdf', fixed=F)
 [[1]]
 [1] 1 1
 attr(,"match.length")
 [1] 4 4
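To get agrep-style indices back out of aregexec, one option is to check each list element for the -1 that marks a non-match. A small sketch, where x is made-up test data and hits is just an illustrative variable name:

```r
# Sketch: recover agrep-like match indices from aregexec's list output.
x <- c('asdf', 'zzzz', 'fdsa')   # made-up test strings
m <- aregexec('(asdf|fdsa)', x, fixed = FALSE)

# Each element of m holds match positions, or -1 when nothing matched
# within the default max.distance = 0.1.
hits <- which(vapply(m, function(mi) mi[1] != -1, logical(1)))
hits
```

This keeps the default max.distance, so it should behave the way agrep(..., fixed=F) was expected to, at least for the patterns tried here.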