Using escaped_list_separator with boost split

I'm playing with the Boost string algorithms library and just stumbled upon the amazing simplicity of the split method.

    string delimiters = ",";
    string str = "string, with, comma, delimited, tokens, \"and delimiters, inside a quote\"";
    // If we didn't care about delimiter characters within a quoted section we could use
    vector<string> tokens;
    boost::split(tokens, str, boost::is_any_of(delimiters));
    // gives the wrong result: tokens = {"string", " with", " comma", " delimited", " tokens", " \"and delimiters", " inside a quote\""}

This would be nice and concise, but it doesn't handle quotes, so instead I have to do something like the following:

    string delimiters = ",";
    string str = "string, with, comma, delimited, tokens, \"and delimiters, inside a quote\"";
    vector<string> tokens;
    escaped_list_separator<char> separator("\\", delimiters, "\"");
    typedef tokenizer<escaped_list_separator<char> > Tokeniser;
    Tokeniser t(str, separator);
    for (Tokeniser::iterator it = t.begin(); it != t.end(); ++it)
        tokens.push_back(*it);
    // gives the correct result (the quote characters themselves are removed):
    // tokens = {"string", " with", " comma", " delimited", " tokens", " and delimiters, inside a quote"}

My question is: can split, or another standard algorithm, be used when the delimiters may appear inside quotes? Thanks to purpledog, but I already have a working way to achieve the desired result; I just think it is rather cumbersome, and unless I can replace it with a simpler, more elegant solution, I wouldn't use it at all without first wrapping it in a function of my own.

EDIT: Updated code to show results and clarify the question.

3 answers

There seems to be no easy way to do this using the boost::split method. The shortest piece of code I can find for this is:

    vector<string> tokens;
    tokenizer<escaped_list_separator<char> > t(str, escaped_list_separator<char>("\\", ",", "\""));
    BOOST_FOREACH(string s, t)
        tokens.push_back(s);

which is only a little more verbose than the original snippet:

    vector<string> tokens;
    boost::split(tokens, str, boost::is_any_of(","));

I don't know about the Boost string algorithms library, but with Boost's regex_token_iterator you can express delimiters in terms of regular expressions. So yes, you can use quoted separators and much more complex things.

Note that this used to be done with regex_split, which is now deprecated.
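As a minimal sketch of that idea applied to the question's string, the pattern below describes a whole field instead of the separator: either an optionally space-prefixed quoted section, or a run of non-comma characters. It uses C++11 std::regex, whose sregex_token_iterator has the same interface as boost::sregex_token_iterator; that substitution, the function name match_fields, and the pattern itself are my own illustration, not taken from the Boost documentation.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns every substring of s that matches the
// "field" pattern. The first alternative (optional whitespace, then a
// quoted run) is tried before the second (any run of non-comma chars),
// so commas inside quotes stay within a single field.
std::vector<std::string> match_fields(const std::string& s) {
    static const std::regex field("\\s*\"[^\"]*\"|[^,]+");
    // Submatch 0 means "give me each whole match", not the text between them.
    std::sregex_token_iterator first(s.begin(), s.end(), field, 0);
    std::sregex_token_iterator last;  // default-constructed end-of-sequence
    return std::vector<std::string>(first, last);
}
```

For the question's string this yields six tokens, the last one being " \"and delimiters, inside a quote\"". Unlike escaped_list_separator, the quote characters are kept and backslash escapes are not interpreted.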

Here's an example taken from the Boost documentation:

    #include <iostream>
    #include <string>
    #include <boost/regex.hpp>
    using namespace std;

    int main(int argc, char* argv[])
    {
        string s;
        do {
            if (argc == 1)
            {
                cout << "Enter text to split (or \"quit\" to exit): ";
                // stop on end of input as well as on "quit"
                if (!getline(cin, s) || s == "quit") break;
            }
            else
                s = "This is a string of tokens";

            boost::regex re("\\s+");
            // -1 selects the text between matches, i.e. the tokens
            boost::sregex_token_iterator i(s.begin(), s.end(), re, -1);
            boost::sregex_token_iterator j;

            unsigned count = 0;
            while (i != j)
            {
                cout << *i++ << endl;
                count++;
            }
            cout << "There were " << count << " tokens found." << endl;
        } while (argc == 1);
        return 0;
    }

If you run the program without arguments and type hello world at the prompt, the output is:

    hello
    world
    There were 2 tokens found.

Changing boost::regex re("\\s+"); to boost::regex re("\",\""); would split on quoted delimiters instead. Entering hello","world at the prompt would then also lead to:

    hello
    world
    There were 2 tokens found.

But I suspect you want to handle input such as "hello", "world", in which case one solution is to:

  • split on the comma only
  • then remove the quotes (possibly using boost/algorithm/string/trim.hpp or the regex library).
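A rough sketch of those two steps, assuming the fields themselves contain no commas (a comma inside quotes would still split the field). It uses C++11 std::regex for the split and plain std::string member functions for the trimming instead of trim.hpp; with Boost, boost::trim_if(piece, boost::is_any_of(" \"")) should do the same stripping. The function name split_then_unquote is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: step 1 splits on commas, step 2 strips surrounding
// whitespace and double quotes from each piece.
std::vector<std::string> split_then_unquote(const std::string& s) {
    static const std::regex comma(",");
    const char* junk = " \t\"";  // characters trimmed from both ends
    std::vector<std::string> tokens;
    // Submatch -1 yields the text between matches, i.e. the pieces.
    for (std::sregex_token_iterator it(s.begin(), s.end(), comma, -1), end;
         it != end; ++it) {
        std::string piece = *it;
        std::string::size_type b = piece.find_first_not_of(junk);
        if (b == std::string::npos) {  // piece was only spaces/quotes
            tokens.push_back("");
            continue;
        }
        std::string::size_type e = piece.find_last_not_of(junk);
        tokens.push_back(piece.substr(b, e - b + 1));
    }
    return tokens;
}
```

Called on "hello", "world" (with the quotes), this returns the two tokens hello and world.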

EDIT: added program output.


This will produce the same result as Jamie Cook's answer without an explicit loop.

    tokenizer<escaped_list_separator<char> > tok(str);
    vector<string> tokens(tok.begin(), tok.end());

The tokenizer constructor's second parameter defaults to escaped_list_separator<char>("\\", ",", "\""), so it is not needed unless you have different requirements for escapes, commas or quotation marks.
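To make concrete what those three default characters do, here is a rough standard-C++ approximation of the splitting rules, written without Boost. The function name split_quoted is hypothetical and this is not Boost's implementation: it does not translate \n into a newline, does not throw on unknown escape sequences the way escaped_list_separator does, and may differ on edge cases such as empty input.

```cpp
#include <string>
#include <vector>

// Hypothetical approximation of escaped_list_separator's rules:
//  - `escape` makes the next character literal,
//  - `quote` toggles "inside quotes" and is itself dropped,
//  - `sep` ends a field only when not inside quotes.
std::vector<std::string> split_quoted(const std::string& s,
                                      char escape = '\\',
                                      char sep = ',',
                                      char quote = '"') {
    std::vector<std::string> tokens;
    std::string current;
    bool in_quote = false;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        char c = s[i];
        if (c == escape && i + 1 < s.size()) {
            current += s[++i];          // keep the escaped character as-is
        } else if (c == quote) {
            in_quote = !in_quote;       // drop the quote, flip the state
        } else if (c == sep && !in_quote) {
            tokens.push_back(current);  // field boundary outside quotes
            current.clear();
        } else {
            current += c;               // ordinary character (spaces kept)
        }
    }
    tokens.push_back(current);          // last field
    return tokens;
}
```

On the question's string this gives the same six tokens as the tokenizer version, with the last one being " and delimiters, inside a quote" (leading space kept, quotes removed).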

