Remove comments from a string

I am writing a text parser that needs to strip comments from lines. The language is fairly simple: every comment starts with the # character, so it would be easy to delete everything after it, but I have to handle the possibility that a # appears inside a string.

So my question is: given a string such as
Value="String#1";"String#2"; # This is an array of "-delimited strings, "Like this"
what is the best way to get the substring
Value="String#1";"String#2"; (note the trailing space)?

Note that the comment may contain quotation marks, and a line may use either ' or " as its string delimiter, although the choice is consistent within a line. The delimiter is known in advance, if that matters. Quotes inside strings are escaped with \

1 answer
#include <string>

std::string stripComment(std::string str) {
    bool escaped = false;
    bool inSingleQuote = false;
    bool inDoubleQuote = false;
    for(std::string::const_iterator it = str.begin(); it != str.end(); it++) {
         if(escaped) {
             escaped = false;
         } else if(*it == '\\' && (inSingleQuote || inDoubleQuote)) {
             escaped = true;
         } else if(inSingleQuote) {
             if(*it == '\'') {
                 inSingleQuote = false;
             }
         } else if(inDoubleQuote) {
             if(*it == '"') {
                 inDoubleQuote = false;
             }
         } else if(*it == '\'') {
             inSingleQuote = true;
         } else if(*it == '"') {
             inDoubleQuote = true;
         } else if(*it == '#') {
             return std::string(str.begin(), it);
         }
    }
    return str;
}

EDIT: or a more textbook FSM:

std::string stripComment(std::string str) {
    int states[5][4] = {
    //  columns: other, backslash, ', "
        {0, 0, 1, 2,},  //outside any string
        {1, 3, 0, 1,},  //single quoted string
        {2, 4, 2, 0,},  //double quoted string
        {1, 1, 1, 1,},  //escape in single quoted string
        {2, 2, 2, 2,},  //escape in double quoted string
    };
    int state = 0;
    for(std::string::const_iterator it = str.begin(); it != str.end(); it++) {
        switch(*it) {
            case '\\':
                state = states[state][1];
                break;
            case '\'':
                state = states[state][2];
                break;
            case '"':
                state = states[state][3];
                break;
            case '#':
                if(!state) {
                    return std::string(str.begin(), it);
                }
                // inside a string: deliberately fall through and treat # as any other character
            default:
                state = states[state][0];
        }
    }
    return str;
}

The array states defines the transitions between FSM states.

The first index is the current state: 0, 1, 2, 3 or 4.

The second index corresponds to the character: any other character, \, ' or ", in that order.

The array reports the next state based on the current state and symbol.
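
To make the table lookup concrete, here is a minimal sketch that records the state after each character. The helper names columnFor and traceStates are hypothetical and not part of the answer. It assumes the same table as above, and it treats # as an ordinary character so that the whole input is traced instead of being cut at the comment.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: maps a character to its column in the table.
// 0 = any other character, 1 = backslash, 2 = single quote, 3 = double quote.
int columnFor(char c) {
    switch (c) {
        case '\\': return 1;
        case '\'': return 2;
        case '"':  return 3;
        default:   return 0;
    }
}

// Hypothetical helper: returns the FSM state reached after each character.
std::vector<int> traceStates(const std::string& str) {
    static const int states[5][4] = {
        {0, 0, 1, 2},  // 0: outside any string
        {1, 3, 0, 1},  // 1: single quoted string
        {2, 4, 2, 0},  // 2: double quoted string
        {1, 1, 1, 1},  // 3: escape in single quoted string
        {2, 2, 2, 2},  // 4: escape in double quoted string
    };
    std::vector<int> trace;
    int state = 0;
    for (char c : str) {
        state = states[state][columnFor(c)];
        trace.push_back(state);
    }
    return trace;
}
```

For the input a"b\"c"# the trace is 0, 2, 2, 4, 2, 2, 0, 0: the escaped quote moves the machine to state 4 and back to 2 without closing the string, so the final # is seen in state 0 and would start a comment.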

FYI, both versions assume that a backslash escapes any character inside a string. You at least need to escape backslashes themselves, so that a string can end with a backslash.
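
The effect of that assumption can be checked with a small sketch. This is a compact variant of the first function, not the answer's exact code: it tracks the opening quote in a single char instead of two booleans, which is an assumption made here for brevity.

```cpp
#include <string>

// Compact variant of stripComment: 'quote' holds the opening quote character,
// or 0 when the scanner is outside any string.
std::string stripComment(const std::string& str) {
    bool escaped = false;
    char quote = 0;
    for (std::string::const_iterator it = str.begin(); it != str.end(); ++it) {
        if (escaped) {
            escaped = false;                 // character after a backslash is taken literally
        } else if (*it == '\\' && quote) {
            escaped = true;                  // backslashes only matter inside strings
        } else if (quote) {
            if (*it == quote) quote = 0;     // matching quote closes the string
        } else if (*it == '\'' || *it == '"') {
            quote = *it;                     // opening quote
        } else if (*it == '#') {
            return std::string(str.begin(), it);  // comment starts here
        }
    }
    return str;
}
```

With the input Path="C:\\"; # c (backslash escaped) the result is Path="C:\\"; with the trailing space kept. With Path="C:\"; # c (a single backslash) the closing quote is consumed as an escaped character, the string never ends, and the line comes back unchanged with its comment still attached.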
