If your own code checks one character at a time, you want to use a sentinel to mark the end of the buffer or the end of the file, so that you have only one test in your inner loop. In your case, that one test will be for end of line, so you want to temporarily stick a newline at the end of each buffer, for example.
The Wikipedia article on sentinels is no help at all; it does not describe this case. You can find a description in any of Robert Sedgewick's algorithms textbooks.
You might also look at re2c, which can generate very fast code for scanning text data. It generates C code, but you may be able to adapt it, and of course you can learn the technique by reading the authors' paper on re2c.
Norman Ramsey