What does std::match_results::size return?

I am a bit confused by the following C++11 code:

    #include <iostream>
    #include <string>
    #include <regex>

    int main() {
        std::string haystack("abcdefabcghiabc");
        std::regex needle("abc");
        std::smatch matches;
        std::regex_search(haystack, matches, needle);
        std::cout << matches.size() << std::endl;
    }

I expected it to print 3, but instead I get 1. Am I missing something?

3 answers

You get 1 because regex_search returns only one match, and size() returns the number of capture groups plus one for the whole match.

Your matches object is:

An object of a match_results type (such as cmatch or smatch) that is filled by this function with information about the match results and any sub-matches found.

If [the regex search is] successful, it is not empty and contains a series of sub_match objects: the first sub_match element corresponds to the entire match, and, if the regular expression contains sub-expressions to be matched (i.e., parentheses-delimited groups), their corresponding sub-matches are stored as successive sub_match elements in the match_results object.

Here is code that finds multiple matches:

    #include <string>
    #include <iostream>
    #include <regex>
    using namespace std;

    int main() {
        string str("abcdefabcghiabc");
        int i = 0;
        regex rgx1("abc");
        smatch smtch;
        while (regex_search(str, smtch, rgx1)) {
            std::cout << i << ": " << smtch[0] << std::endl;
            i += 1;
            // Continue searching in the text after the current match.
            str = smtch.suffix().str();
        }
        return 0;
    }

See the IDEONE demo returning abc 3 times.

Since this method destroys the input string, here is an alternative based on std::sregex_iterator (use std::wsregex_iterator when the subject is a std::wstring):

    #include <iostream>
    #include <regex>
    #include <string>

    int main() {
        std::regex r("ab(c)");
        std::string s = "abcdefabcghiabc";
        for (std::sregex_iterator i = std::sregex_iterator(s.begin(), s.end(), r);
             i != std::sregex_iterator();
             ++i) {
            std::smatch m = *i;
            std::cout << "Match value: " << m.str() << " at Position " << m.position() << '\n';
            std::cout << " Capture: " << m[1].str() << " at Position " << m.position(1) << '\n';
        }
        return 0;
    }

See the IDEONE demo, which prints:

    Match value: abc at Position 0
     Capture: c at Position 2
    Match value: abc at Position 6
     Capture: c at Position 8
    Match value: abc at Position 12
     Capture: c at Position 14

What you are missing is that matches is populated with one entry for each capture group (including the entire matched substring, which counts as the 0th capture).

If you write

 std::regex needle("a(b)c"); 

then you will get matches.size()==2, with matches[0]=="abc" and matches[1]=="b".


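The regex_search loop in @stribizhev's answer has quadratic worst-case complexity for sane regular expressions, because every iteration copies the remaining suffix back into str. For insane ones (such as "y*") it does not terminate: an empty match leaves the suffix unchanged, so the loop never advances. In some applications, these problems are DoS attacks waiting to happen. Here's a fixed version: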

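    #include <iostream>
    #include <regex>
    #include <string>
    using namespace std;

    int main() {
        string str("abcdefabcghiabc");
        int i = 0;
        regex rgx1("abc");
        smatch smtch;
        auto beg = str.cbegin();
        while (regex_search(beg, str.cend(), smtch, rgx1)) {
            std::cout << i << ": " << smtch[0] << std::endl;
            i += 1;
            // Resume at the end of the match. The match may start after beg,
            // so advancing beg by length(0) alone could land inside the text
            // that was just matched and report it again.
            beg = smtch[0].second;
            if (smtch.length(0) == 0) {
                // Empty match: step one character forward, or stop at the end.
                if (beg != str.cend())
                    ++beg;
                else
                    break;
            }
        }
        return 0;
    }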

As a matter of personal preference, this finds n + 1 matches of the empty regular expression in a string of length n. You could also simply exit the loop after an empty match.

If you want to compare performance for a string with millions of matches, add the following lines after defining str (and don't forget to turn on optimization), once for each version:

 for (int j = 0; j < 20; ++j) str = str + str; 