Does anyone have an efficient R3 function that mimics the behavior of find/any in R2?

Question

Rebol2 has an /ANY refinement on the FIND function that performs wildcard pattern matching:

    >> find/any "here is a string" "s?r"
    == "string"

I use this in tight loops that need to perform well. But the refinement has been removed in Rebol3.

What is the most efficient way to do this in Rebol3? (I'm guessing some kind of PARSE solution.)

parsing rebol rebol3

Ashley Jul 24 '15 at 13:43

3 answers

Ashley · Answer 1 · 2015-07-25T01:23:10+0000

Here is a stab at handling the "*" case:

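    like: funct [
        series [series!]
        search [series!]
    ][
        rule: copy []
        ; split the search value on "*" and drop the empty pieces
        remove-each s b: parse/all search "*" [empty? s]
        ; each remaining piece must appear, in order
        foreach s b [
            append rule reduce ['to s]
        ]
        append rule [to end]
        ; on a match, return the series at the first piece
        all [
            parse series rule
            find series first b
        ]
    ]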

used as follows:

    >> like "abcde" "b*d"
    == "bcde"

Hostilefork · Answer 2 · 2015-07-25T15:10:47+0000

I edited your question for "clarity" and changed it to say "has been removed." That makes it sound like a deliberate decision. In reality, it may simply not have been implemented.

BUT, if anyone asks me, I don't think it should be in the box... and not only because it is a lousy use of the word "ANY". Here's why:

You are looking for patterns in strings... so if you are limited to using strings to specify the pattern, you run into a "meta" problem. Say I want to match the literal text *Rebol* or ?Red?. Now there has to be escaping, and everything gets ugly all over again. Back to RegEx. :-/

So what you might actually want is not a STRING! pattern like s?r, but a BLOCK! pattern like ["s" ? "r"]. This would allow constructs such as ["?" ? "?"] or [{?} ? {?}]. That is better than rehashing the string hackery every other language uses.

And this is what PARSE does, albeit in a slightly less declarative way. It also uses words instead of symbols, as Rebol likes to do. [{?} skip {?}] is a match rule in which skip is an instruction that moves the parse position past any single element of the input series between the question marks. It would work the same way when parsing a block as input, where it would match [{?} 12-Dec-2012 {?}].
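As a minimal sketch of the point above, the same rule can be applied to a string and to a block. The input values are made up for illustration, and whether each call succeeds should be checked in your own Rebol build:

```rebol
; String input: SKIP stands in for the "?" wildcard and consumes one character.
parse "?x?" [{?} skip {?}]

; Block input: the same rule, where SKIP consumes one value of any datatype
; (here a date!) between the two literal "?" strings.
parse [{?} 12-Dec-2012 {?}] [{?} skip {?}]
```

Because the question marks in the rule are ordinary string values and the wildcard is the word skip, nothing needs to be escaped.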

I don't know exactly what the behavior of /ANY would or should be with something like "ab??cd e?*f"... whether it would provide alternate pattern logic or something else. I assume the Rebol2 implementation is brief? It probably matches only one pattern.

To establish a baseline, here's a possibly lame PARSE solution for the s?r intent:

    >> parse "here is a string" [
        some [              ; match rule repeatedly
            to "s"          ; advance to *before* "s"
            pos:            ; save position as potential match
            skip            ; now skip the "s"
            [               ; [sub-rule]
                skip        ; ignore any single character (the "?")
                "r"         ; match the "r", and if we do...
                return pos  ; return the position we saved
            |               ; | (otherwise)
                none        ; no-op, keep trying to match
            ]
        ]
        fail                ; have PARSE return NONE
    ]
    == "string"

If you wanted it to be s*r, you would change the skip "r" return pos into to "r" return pos.
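Spelled out, the s*r variant described above might look like the sketch below. It is the same rule with only the sub-rule changed, and it relies on RETURN behaving in PARSE the way the s?r example assumes:

```rebol
parse "here is a string" [
    some [
        to "s"              ; advance to *before* "s"
        pos:                ; save position as potential match
        skip                ; now skip the "s"
        [
            to "r"          ; the "*": advance to the next "r", however far
            return pos      ; return the position we saved
        |
            none            ; no "r" ahead, keep trying
        ]
    ]
    fail
]
```

Note that to "r" stops before the "r" rather than consuming it, which is fine here because the rule returns as soon as the "r" is found.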

As a performance note, it is indeed the case that characters are matched against string input faster than strings are. So using to #"s" and #"r" makes a measurable difference in speed when parsing strings in general. Beyond that, I'm sure others can do better.
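One way to check the claim above on your own machine is a rough timing comparison. This is only a sketch: it assumes DT (short for DELTA-TIME) is available in your Rebol3 build, and the word text and the iteration count are arbitrary choices:

```rebol
text: "here is a string"

; Rule written with string values
print ["string rule:" dt [
    loop 100000 [parse text [to "s" skip "r" to end]]
]]

; Same rule written with char values
print ["char rule:  " dt [
    loop 100000 [parse text [to #"s" skip #"r" to end]]
]]
```

Timings vary between builds and runs, so repeat the measurement a few times before drawing conclusions.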

The rule is certainly longer than "s?r". But it is not that long once the comments are taken out:

 [some [to #"s" pos: skip [skip #"r" return pos | none]] fail]

(Note: as written, it leaks pos: into the enclosing context. Is there a USE in PARSE, either implemented or planned?)

However, the nice thing is that it offers hook points at every moment of decision, without the escaping defects a naive string solution has. (I am tempted to give my usual "Bad LEGO alligator vs. Good LEGO alligator" speech.)

But if you do not want to code in PARSE directly, it seems the real answer would be some kind of "Glob Expression"-to-PARSE compiler. That might be the best interpretation of glob Rebol could have, because you could do a one-off:

    >> parse "here is a string" glob "s?r"
    == "string"

Or, if you are going to run the match often, cache the compiled expression. Also, let's imagine our block form uses words for literacy:

    s?r-rule: glob ["s" one "r"]

    pos-1: parse "here is a string" s?r-rule
    pos-2: parse "reuse compiled RegEx string" s?r-rule

It might be interesting to see such a compiler for regex as well. These might also accept not only string input but also block input, so that both "s.r" and ["s" . "r"] would be legal... and if you used the block form, you would not need escaping and could write ["." . "."] to match ".A."

Fairly interesting things would be possible. Given that in RegEx:

    (abc|def)=\g{1}
    matches abc=abc or def=def
    but not abc=def or def=abc

Rebol could be made to accept either the string form, or a block form that compiles into a PARSE rule, for example:

 regex [("abc" | "def") "=" (1)]

Then you get a dialect variation that does not need escaping. Designing and writing such compilers is left as an exercise for the reader. :-)

rgchris · Answer 3 · 2015-07-28T03:06:56+0000

I've broken this into two functions: one that creates a rule to match the given search value, and another that performs the search. Separating the two lets you reuse the same generated PARSE block when one search value is applied over multiple iterations:

    expand-wildcards: use [literal][
        literal: complement charset "*?"

        func [
            {Creates a PARSE rule matching VALUE expanding * (any characters) and ? (any one character)}
            value [any-string!] "Value to expand"
            /local part
        ][
            collect [
                parse value [
                    ; empty search string FAIL
                    end (keep [return (none)])
                    |

                    ; only wildcard return HEAD
                    some #"*" end (keep [to end])
                    |

                    ; everything else...
                    some [
                        ; single char matches
                        #"?" (keep 'skip)
                        |

                        ; textual match
                        copy part some literal (keep part)
                        |

                        ; indicates the use of THRU for the next string
                        some #"*"

                        ; but first we're going to match single chars
                        any [#"?" (keep 'skip)]

                        ; it's optional in case there's a "*?*" sequence
                        ; in which case, we're going to ignore the first "*"
                        opt [
                            copy part some literal (
                                keep 'thru keep part
                            )
                        ]
                    ]
                ]
            ]
        ]
    ]

    like: func [
        {Finds a value in a series and returns the series at the start of it.}
        series [any-string!] "Series to search"
        value [any-string! block!] "Value to find"
        /local skips result
    ][
        ; shortens the search a little where the search starts with a regular char
        skips: switch/default first value [
            #[none] #"*" #"?" ['skip]
        ][
            reduce ['skip 'to first value]
        ]

        any [
            block? value
            value: expand-wildcards value
        ]

        parse series [
            some [
                ; we have our match
                result: value

                ; and return it
                return (result)
                |

                ; step through the string until we get a match
                skips
            ]

            ; at the end of the string, no matches
            fail
        ]
    ]

Separating the functions also gives you a basis for optimizing two different concerns: finding the start and matching the value.
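A short usage sketch of that reuse, assuming the two functions above are already defined. The sample strings and the words rule and text are invented for illustration:

```rebol
; Build the PARSE rule once...
rule: expand-wildcards "s?r"

; ...then apply it to several inputs without re-expanding the pattern.
; LIKE accepts a block!, so the generated rule is passed through as-is.
foreach text ["here is a string" "no match here" "a stray line"] [
    probe like text rule
]
```

Passing a string such as "s?r" directly to LIKE also works, but then the pattern is expanded again on every call.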

I went with PARSE because, even though * and ? are seemingly simple rules, there is nothing quite as expressive and quick as PARSE for implementing such a search effectively.

As @HostileFork suggests, it may be worth considering a dialect instead of strings with wildcards, even to the point where Regex is replaced by a compile-to-PARSE dialect, but that is perhaps beyond the scope of the question.
