Have you tried the regular expression given in RFC 3986 (Appendix B)? If you can use GCC 4.9 or later, you can apply it directly with the standard <regex> header.
The RFC states that by matching a URI against

    ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?

you get the components as the following submatches:

    scheme    = $2
    authority = $4
    path      = $5
    query     = $7
    fragment  = $9
For example:
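    #include <cstdlib>
    #include <iostream>
    #include <regex>
    #include <string>

    int main(int argc, char *argv[])
    {
        // Expect the URL to check as the only argument.
        if (argc < 2) {
            std::cerr << "Usage: " << argv[0] << " <url>" << std::endl;
            return EXIT_FAILURE;
        }

        std::string url(argv[1]);
        unsigned counter = 0;

        // Regular expression from RFC 3986, Appendix B. The slashes are left
        // unescaped: in the POSIX extended grammar a backslash inside a
        // bracket expression is taken literally.
        std::regex url_regex(
            R"(^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?)",
            std::regex::extended
        );
        std::smatch url_match_result;

        std::cout << "Checking: " << url << std::endl;

        if (std::regex_match(url, url_match_result, url_regex)) {
            // Index 0 is the whole match; the rest are the capture groups.
            for (const auto& res : url_match_result) {
                std::cout << counter++ << ": " << res << std::endl;
            }
        } else {
            std::cerr << "Malformed url." << std::endl;
        }

        return EXIT_SUCCESS;
    }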
Then:
    ./url-matcher http://localhost.com/path\?hue\=br\#cool
    Checking: http://localhost.com/path?hue=br#cool
    0: http://localhost.com/path?hue=br#cool
    1: http:
    2: http
    3: //localhost.com
    4: localhost.com
    5: /path
    6: ?hue=br
    7: hue=br
    8:
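If you want the components by name rather than by index, the group numbers listed above can be wrapped in a small helper. This is only a sketch: the `UrlParts` struct and the `split_url` function are names made up for illustration, not part of the standard library or the RFC, and optional groups that did not take part in the match simply come back as empty strings.

```cpp
#include <regex>
#include <string>

// Hypothetical holder for the five components named in RFC 3986, Appendix B.
struct UrlParts {
    std::string scheme;
    std::string authority;
    std::string path;
    std::string query;
    std::string fragment;
};

// Hypothetical helper: fills `out` from the RFC's capture groups.
// Returns false only if the expression does not match the whole input.
bool split_url(const std::string& url, UrlParts& out)
{
    static const std::regex url_regex(
        R"(^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?)",
        std::regex::extended);

    std::smatch m;
    if (!std::regex_match(url, m, url_regex)) {
        return false;
    }

    // Group numbers follow the RFC: $2, $4, $5, $7 and $9.
    // str() yields an empty string for a group that did not participate.
    out.scheme    = m[2].str();
    out.authority = m[4].str();
    out.path      = m[5].str();
    out.query     = m[7].str();
    out.fragment  = m[9].str();
    return true;
}
```

Calling `split_url("http://localhost.com/path?hue=br#cool", parts)` should leave `parts.scheme` as `http` and `parts.authority` as `localhost.com`. Keep in mind that the expression only splits a reference into components; it accepts almost any string, so it is not a validator.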
Ciro Costa, Jul 24 '15 at 14:33