I work with some log files that are very poorly formatted: the column separator (often) appears inside a field and is not escaped. For instance:
sam,male,september,brown,blue,i like cats, and i like dogs
Where:
name,gender,month,hair,eyes,about
So, as you can see, the about field contains the column separator, which means that simply splitting on the separator will not work: it would split about into two separate columns. Now imagine this with a chat system ... I'm sure you can picture the problems.
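To make the failure concrete, here is a minimal Python sketch (the language is arbitrary, and `HEADER` and `split_trailing_free_text` are names made up for illustration). It shows the naive split producing seven fields for six columns. It also shows a bounded split, which only helps under the assumption that the last column is the only free-text one.

```python
# Column names taken from the example above.
HEADER = ["name", "gender", "month", "hair", "eyes", "about"]
line = "sam,male,september,brown,blue,i like cats, and i like dogs"

# Naive approach: split on every separator.
naive = line.split(",")
print(len(naive))  # 7 fields for 6 columns: "about" was cut in two

def split_trailing_free_text(line, n_columns, sep=","):
    """Split at most n_columns - 1 times, so everything after the
    last expected separator stays in the final column.

    Assumption: only the LAST column may contain the separator.
    """
    return line.split(sep, n_columns - 1)

bounded = split_trailing_free_text(line, len(HEADER))
record = dict(zip(HEADER, bounded))
print(record["about"])  # i like cats, and i like dogs
```

This breaks down as soon as any column other than the last one can contain an unescaped separator.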
So, theoretically, what's the best approach to solving this? I'm not looking for a language-specific implementation, but rather a general pointer in the right direction, or some ideas on how others have solved it ... without doing it manually.
Edit:
I should clarify: my actual logs are in much worse condition. There are fields with separator characters all over the place, and there is no pattern that I can find.