Parsing strings in C++

I have the output of another program, which was designed to be read by a person rather than by a machine, but I'm going to parse it anyway. The format is not too complicated.

However, I'm wondering what the best way to do this in C++ is. It is more of a general-practice question.

I looked into Boost.Spirit and even got it working a little. It is madness! If I were designing the language that I'm reading, it could be the right tool for the job. But as it is, given its extreme compile times and the several pages of errors from g++ whenever I do something wrong, it's just not what I need. (I don't have much need for run-time performance either.)

I thought about using the C++ extraction operator >>, but that seems futile. If my file has lines such as “John has 5 widgets” and others such as “Mary works on Ramsay Street 459”, how can I make sure that I have a line of the first type in my program and not the second type? I would have to read the whole string and then use things like string::find and string::substr, I suppose.
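A minimal sketch of that read-the-whole-line approach, assuming the line shape "<name> has <number> widgets" from the example above. The struct `WidgetLine` and the function `parse_widget_line` are names made up for illustration.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical result type for lines like "John has 5 widgets".
struct WidgetLine {
    std::string name;
    int count;
};

// Returns true and fills `out` only if `line` has the shape
// "<name> has <number> widgets"; any other line is rejected.
bool parse_widget_line(const std::string& line, WidgetLine& out) {
    const std::string mid = " has ";
    const std::string tail = " widgets";

    std::string::size_type m = line.find(mid);
    if (m == std::string::npos || m == 0) return false;

    // The line must be long enough and must end with " widgets".
    if (line.size() < m + mid.size() + tail.size()) return false;
    std::string::size_type t = line.size() - tail.size();
    if (line.compare(t, tail.size(), tail) != 0) return false;

    // Whatever sits between the two markers must be all digits.
    std::string num = line.substr(m + mid.size(), t - (m + mid.size()));
    if (num.empty() || num.find_first_not_of("0123456789") != std::string::npos)
        return false;

    out.name = line.substr(0, m);
    out.count = std::atoi(num.c_str());
    return true;
}
```

It works, but every new line shape needs its own hand-written bookkeeping of positions and lengths.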

And that leaves sscanf. It would handle the above cases well.

    if( sscanf( str, "%s has %d widgets", chararr, & intvar ) == 2 )
        // then I know I matched "foo has bar" type of string,
        // and I now have the parameters too
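One caveat worth sketching: the return value of sscanf counts only the conversions, so `== 2` is also true for "John has 5 gadgets", where the trailing literal fails after both fields have been stored. The variant below is a sketch, with `match_widgets` as a made-up helper name. It bounds the `%s` width and uses `%n` to confirm that the whole line matched.

```cpp
#include <cstdio>

// Hypothetical helper: returns true only if `str` is exactly
// "<word> has <int> widgets", storing the word and the count.
bool match_widgets(const char* str, char (&name)[32], int& count) {
    int consumed = -1;
    // %31s leaves room for the terminating '\0' in the 32-byte buffer.
    // %n stores the number of characters consumed so far; it is reached
    // only if the literal " widgets" before it matched.
    int fields = std::sscanf(str, "%31s has %d widgets%n",
                             name, &count, &consumed);
    return fields == 2 && consumed >= 0 && str[consumed] == '\0';
}
```

The final check on `str[consumed]` also rejects lines with extra text after "widgets".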

So I'm just wondering whether I'm missing something, or whether C++ really doesn't have a built-in alternative.

+6
7 answers

sscanf really sounds very good for your requirements:

  • it may do some redundant parsing, but you have no performance requirements that prohibit that
  • it localizes the requirements on the various input words and lets you parse non-string values directly into typed variables, which makes the different input formats easy to understand.

A potential problem is that it is error-prone, and if you have many parsing phrases that keep changing, the testing effort and risk can become worrying. Keeping the spirit of sscanf, but using istream for type safety:

    #include <cstring>
    #include <iostream>
    #include <sstream>
    #include <string>

    // Str captures a string literal and consumes the same from an istream...
    // (for non-literals, better to have `std::string` member to guarantee lifetime)
    class Str
    {
      public:
        Str(const char* p) : p_(p) { }
        const char* c_str() const { return p_; }
      private:
        const char* p_;
    };

    bool operator!=(const Str& lhs, const Str& rhs)
    {
        return std::strcmp(lhs.c_str(), rhs.c_str()) != 0;
    }

    // Reads one whitespace-delimited word and fails the stream if it differs
    std::istream& operator>>(std::istream& is, const Str& str)
    {
        std::string s;
        if (is >> s)
            if (s.c_str() != str)
                is.setstate(std::ios_base::failbit);
        return is;
    }

    // sample usage...
    int main()
    {
        std::stringstream is("Mary has 4 cats");
        int num_dogs, num_cats;
        if (is >> Str("Mary") >> Str("has") >> num_dogs >> Str("dogs"))
        {
            std::cout << num_dogs << " dogs\n";
        }
        else if (is.clear(), is.seekg(0), // "reset" the stream...
                 (is >> Str("Mary") >> Str("has") >> num_cats >> Str("cats")))
        {
            std::cout << num_cats << " cats\n";
        }
    }

+3

GNU flex and bison are very powerful tools that work along the same lines as Spirit but are (according to some people) easier to use, partly because error reporting is somewhat better since the tools have their own compilers. That, or Spirit, or some other parser generator, is the “right” way to go about this, because it gives you the most flexibility in your approach.

If you are thinking about using strtok, you may want to look at std::stringstream instead, which splits on whitespace and lets you do formatted conversions between strings, primitives, and so on. It can also be plugged into STL algorithms, and it avoids all the messy details of raw C-style string memory management.
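As a rough illustration of the stringstream suggestion, the sketch below splits a line on whitespace through `std::istream_iterator` and converts a single token to `int`. The names `split_words` and `to_int` are invented here and do not come from any library.

```cpp
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Splits a line on whitespace by feeding a stream to the vector's
// iterator-range constructor.
std::vector<std::string> split_words(const std::string& line) {
    std::istringstream in(line);
    return std::vector<std::string>(std::istream_iterator<std::string>(in),
                                    std::istream_iterator<std::string>());
}

// Converts one token to int; fails on non-numbers and on trailing
// junk such as "5x".
bool to_int(const std::string& token, int& value) {
    std::istringstream in(token);
    char extra;
    return (in >> value) && !(in >> extra);
}
```

With the words in a vector, telling the line types apart becomes a matter of comparing a few elements, for example checking that the second word is "has" and the last is "widgets".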

+2

I have written a great deal of parsing code in C++. The language works well for it, but I wrote the code myself and did not rely on more general code written by someone else. C++ does not come with extensive parsing code already written, but it is a great language for writing such code.

I'm not sure what your question is, beyond wanting to find code that someone has already written that will do what you need. Part of the problem is that you have not actually described what you need, or asked a question about it.

If you can make the question more specific, I would be happy to try to offer a more specific answer.

+1

I used Boost.Regex (which I think is also tr1::regex). Easy to use.
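A minimal sketch of the regex approach for the "John has 5 widgets" line. It assumes a C++11 compiler and uses `std::regex`, the standardized successor of the TR1 interface, rather than Boost.Regex itself. The helper name `match_widgets_regex` and the pattern are illustrative only.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: matches "<name> has <number> widgets" against the
// whole line and pulls out the two captured fields.
bool match_widgets_regex(const std::string& line, std::string& name, int& count) {
    static const std::regex pattern("(\\w+) has (\\d+) widgets");
    std::smatch m;
    if (!std::regex_match(line, m, pattern)) return false;
    name = m[1].str();
    // Note: std::stoi throws std::out_of_range if the digits do not fit in an int.
    count = std::stoi(m[2].str());
    return true;
}
```

Because `regex_match` has to match the entire line, a line of the other type simply returns false and can be tried against the next pattern.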

+1

There is always strtok(), I suppose.

0

Take a look at strtok.
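For reference, a small sketch of what strtok-based splitting could look like. The wrapper name `tokenize` is made up. Since strtok overwrites delimiters in the buffer it scans, the sketch works on a private copy of the line.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Splits on spaces and tabs with strtok. strtok writes '\0' into the
// buffer and keeps hidden static state between calls, so it is not safe
// to use from several threads at once or to nest.
std::vector<std::string> tokenize(const std::string& line) {
    std::vector<char> buf(line.begin(), line.end());
    buf.push_back('\0');  // strtok needs a NUL-terminated, writable buffer

    std::vector<std::string> tokens;
    for (char* tok = std::strtok(buf.data(), " \t"); tok != nullptr;
         tok = std::strtok(nullptr, " \t")) {
        tokens.push_back(tok);
    }
    return tokens;
}
```

This only gives you the words; deciding which kind of line you have and converting the numbers is still up to you.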

0

Depending on what you want to parse, you may want a regular expression library. See msdn or earlier.

Personally, again depending on the exact format, I would consider using Perl for an initial conversion into a more machine-readable format (for example, CSV records), after which importing into C++ is much easier.

If you stick with C++, you need to:

  • Identify a record (hopefully just a line)
  • Identify the record type (use a regular expression)
  • Parse the record (sscanf is fine)

A base class along these lines:

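    #include <regex>
    #include <string>

    class Handler
    {
      public:
        Handler(const std::string& regexExpr) : regex_(regexExpr) {}
        virtual ~Handler() {}
        // True if the whole record matches this handler's pattern
        bool match(const std::string& s) const { return std::regex_match(s, regex_); }
        virtual bool process(const std::string& s) = 0;
      private:
        std::regex regex_;  // was std::tr1::basic_regex<char> before C++11
    };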

Define a derived class for each record type, insert an instance of each into a set, and search for the one that matches.

    #include <algorithm>
    #include <cstdio>
    #include <iostream>
    #include <set>

    class WidgetOwner : public Handler
    {
      public:
        WidgetOwner() : Handler(".* has .* widgets") {}
        virtual bool process(const std::string& s)
        {
            char name[32];
            int widgets = 0;
            // %31s, not %32s: the width does not include the terminating '\0'
            int fieldsRead = sscanf(s.c_str(), "%31s has %d widgets", name, &widgets);
            if (fieldsRead == 2)
            {
                std::cout << "Found widgets in " << s << std::endl;
            }
            return fieldsRead == 2;
        }
    };

    // WorkLocation is another Handler subclass, written the same way (not shown)

    struct Pred
    {
        Pred(const std::string& record) : record_(record) {}
        bool operator()(Handler* handler) { return handler->match(record_); }
        std::string record_;
    };

    // Inside the function that reads each record into `line`:
    std::set<Handler*> handlers_;
    handlers_.insert(new WidgetOwner);
    handlers_.insert(new WorkLocation);

    Pred pred(line);
    std::set<Handler*>::iterator handlerIt =
        std::find_if(handlers_.begin(), handlers_.end(), pred);
    if (handlerIt != handlers_.end())
        (*handlerIt)->process(line);

0
