I am trying to handle some delimited text files that use non-standard delimiters (not comma/quote or tab delimited). The delimiters are arbitrary ASCII characters that rarely appear inside the field values. After searching around, I have found no built-in .NET solution that fits my needs, and the custom libraries people have written for this seem to have problems with gigantic input (a 4 GB file in which some field values can easily run to several million characters).
While this seems a bit extreme, it is actually an industry standard in Electronic Document Discovery (EDD) for some review software to have field values that contain the full contents of a document. For reference, I previously did this in Python using the csv module without any problems.
Here is an example input:
Field delimiter =
quote character = þ
þFieldName1þþFieldName2þþFieldName3þþFieldName4þ
þValue1þþValue2þþValue3þþSomeVery,Very,Very,Large value(5MB or so)þ
...etc...
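For reference, this is roughly what the Python approach mentioned above looks like. It is a minimal sketch, not the exact script that was used. The field delimiter is not visible in the example, so `FIELD_DELIMITER = "\x14"` is an assumption, and `read_rows` is a hypothetical helper name. The one non-obvious step is raising `csv.field_size_limit`, since the default limit of 131072 characters per field is far too small for multi-megabyte values.

```python
import csv
import io
import sys

# Assumption: the field delimiter is ASCII 0x14. The example above does not
# show the actual character, so substitute whatever the file really uses.
FIELD_DELIMITER = "\x14"
QUOTE_CHAR = "þ"


def read_rows(stream, delimiter=FIELD_DELIMITER, quotechar=QUOTE_CHAR):
    """Yield one list of field values per record, streaming from `stream`."""
    # The default per-field limit is 131072 characters. Raise it, capped so
    # the value also fits in a C long on platforms where that is 32 bits.
    csv.field_size_limit(min(sys.maxsize, 2**31 - 1))
    reader = csv.reader(stream, delimiter=delimiter, quotechar=quotechar)
    for row in reader:
        yield row


# Small in-memory stand-in for the real file.
sample = (
    "þFieldName1þ\x14þFieldName2þ\x14þFieldName3þ\n"
    "þValue1þ\x14þValue2þ\x14þSome,Very,Large valueþ\n"
)
rows = list(read_rows(io.StringIO(sample, newline="")))
```

For a real file, the stream would come from something like `open(path, newline="", encoding=...)`, with the encoding chosen to match the export. Because `read_rows` is a generator, only one record is held in memory at a time.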
Edit: So I went ahead and wrote a delimited file parser from scratch. I am a little wary of using this solution, as it may be bug-prone. It also does not feel "elegant" or right to have to write my own parser for a task like this, and I have a feeling that I probably did not need to write one from scratch anyway.
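To illustrate what a from-scratch parser for this format involves, here is a minimal sketch of a chunked state machine. It is not the parser described in the edit above. The name `parse_delimited`, the chunk size, and the rule that a doubled quote character inside a quoted field stands for one literal quote are all assumptions. It reads the stream in fixed-size chunks, so file size does not matter, although each field is still assembled in memory and the per-character loop would be slow on a 4 GB file.

```python
import io


def parse_delimited(stream, delimiter, quote, chunk_size=1 << 16):
    """Yield records (lists of strings) from a quoted, delimited text stream."""
    fields = []            # completed fields of the current record
    parts = []             # characters of the field being built
    in_quotes = False      # True while inside a quoted field
    pending_quote = False  # saw a quote inside a quoted field, meaning unknown yet
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        for ch in chunk:
            if pending_quote:
                pending_quote = False
                if ch == quote:
                    parts.append(quote)  # doubled quote: keep one literal quote
                    continue
                in_quotes = False        # the earlier quote closed the field
            if in_quotes:
                if ch == quote:
                    pending_quote = True
                else:
                    parts.append(ch)     # delimiters and newlines are plain data here
            elif ch == quote:
                in_quotes = True
            elif ch == delimiter:
                fields.append("".join(parts))
                parts = []
            elif ch == "\n":
                fields.append("".join(parts))
                parts = []
                yield fields
                fields = []
            elif ch != "\r":
                parts.append(ch)
    # Flush a final record that was not terminated by a newline.
    if parts or fields:
        fields.append("".join(parts))
        yield fields


# Assumption: "\x14" stands in for the field delimiter, which the example omits.
sample = "þaþ\x14þb\nline2þ\r\nþcþþdþ\x14þeþ\n"
records = list(parse_delimited(io.StringIO(sample, newline=""), "\x14", "þ"))
```

All parser state lives outside the chunk loop, so a quote or line ending that straddles a chunk boundary is handled the same way as one in the middle of a chunk.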