Appendix B of RFC 2396 provides a regular expression for splitting a URI into its components, and we can adapt it to your case:

    ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*/([^.]+)[^?#]*)(\?([^#]*))?(#(.*))?
                                         #######

This leaves The_Token_I_Want in $6, the subexpression underlined with hashes above. (Note that the hashes are not part of the pattern.) See it in action:
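
    #! /usr/bin/perl

    $_ = "http://domain.com/133742/The_Token_I_Want.zip";
    # Same pattern as above; group 6 holds the token.
    if (m!^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*/([^.]+)[^?#]*)(\?([^#]*))?(#(.*))?!) {
      print "$6\n";
    }
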
Output:

    $ ./prog.pl
    The_Token_I_Want

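To check which part of the URL each numbered group captures, one option is to collect all of them and inspect the result. The sketch below is not part of the original program: it uses std::regex from the C++11 standard library as a stand-in so that it needs no external dependency, and the helper name uri_groups is made up for illustration.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns every capture group of the adapted RFC 2396
// pattern for the given URL (index 0 is the whole match), or an empty vector
// if the URL does not match at all.
std::vector<std::string> uri_groups(const std::string& url) {
    // Raw string literal, so the regex escape \? needs no extra backslash.
    static const std::regex uri(
        R"re(^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*/([^.]+)[^?#]*)(\?([^#]*))?(#(.*))?)re");

    std::vector<std::string> groups;
    std::smatch m;
    if (std::regex_match(url, m, uri)) {
        for (std::size_t i = 0; i < m.size(); ++i)
            groups.push_back(m[i].str());  // groups that did not participate yield ""
    }
    return groups;
}
```

For http://domain.com/133742/The_Token_I_Want.zip this should put http in element 2, domain.com in element 4, and The_Token_I_Want in element 6. The query and fragment groups (7 through 10) do not participate in the match, so they come back as empty strings.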
UPDATE: I see from your comment that you are using boost::regex, so remember to escape the backslash in your C++ program.

    #include <boost/foreach.hpp>
    #include <boost/regex.hpp>

    #include <iostream>
    #include <string>

    int main()
    {
      boost::regex token("^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*"
                         "/([^.]+)"
                      //   ####### I CAN HAZ HASHDERLINE PLZ
                         "[^?#]*)(\\?([^#]*))?(#(.*))?");

      const char * const urls[] = {
        "http://domain.com/133742/The_Token_I_Want.zip",
        "http://domain.com/12345/another_token.zip",
        "http://domain.com/0981723/YET_ANOTHER_TOKEN.zip",
      };

      BOOST_FOREACH(const char *url, urls) {
        std::cout << url << ":\n";

        // Group 6 is the token; fall back to a marker when the URL does not match.
        std::string t;
        boost::cmatch m;
        if (boost::regex_match(url, m, token))
          t = m[6];
        else
          t = "<no match>";

        // Print t rather than m[6], so the fallback is shown on a failed match.
        std::cout << " - " << t << '\n';
      }

      return 0;
    }

Output:

    http://domain.com/133742/The_Token_I_Want.zip:
     - The_Token_I_Want
    http://domain.com/12345/another_token.zip:
     - another_token
    http://domain.com/0981723/YET_ANOTHER_TOKEN.zip:
     - YET_ANOTHER_TOKEN

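The backslash escaping mentioned in the update can be checked directly by writing the pattern both ways and comparing the resulting strings. This is a minimal sketch, assuming a C++11 compiler. It uses std::regex instead of boost::regex only to stay dependency-free, and the function names escaped_pattern, raw_pattern, and extract_token are hypothetical.

```cpp
#include <regex>
#include <string>

// Ordinary string literal: the regex escape \? has to be written as \\?.
std::string escaped_pattern() {
    return "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*/([^.]+)[^?#]*)(\\?([^#]*))?(#(.*))?";
}

// Raw string literal (C++11): backslashes are taken literally, so \? stays \?.
std::string raw_pattern() {
    return R"re(^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*/([^.]+)[^?#]*)(\?([^#]*))?(#(.*))?)re";
}

// Hypothetical helper: returns capture group 6, or "<no match>" when the
// URL does not match the pattern.
std::string extract_token(const std::string& url) {
    static const std::regex token(raw_pattern());
    std::smatch m;
    return std::regex_match(url, m, token) ? m[6].str()
                                           : std::string("<no match>");
}
```

Both functions return the same string, so the choice is purely about readability. Since a raw literal only changes how the string is spelled in the source, it should work with boost::regex too, provided the compiler accepts C++11.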