First of all, I know this is a bad idea and that I should not be doing it.
Background: feel free to skip
However, I need a quick fix for a live system. We currently have a data structure that is serialized into a string by building "xml" fragments with a number of string builders. I rather doubt the result is valid XML. After this XML is built, and before it is sent on, some cleanup code searches the input string for XML declarations and deletes them.
The way this is done (iterating over every character and calling IndexOf for "<?xml" at each one) is so slow that it causes thread timeouts and brings our systems down. Eventually I will fix it properly (build the XML with an XML document API or something similar), but for now I need a quick fix to replace what is there.
Please bear in mind that I know this is far from ideal, but I need a quick solution to get us back up and running.
Question
I thought of using a regular expression to find the declarations. My plan was to use <\?xml.*?> and then call Regex.Replace(input, pattern, string.Empty) to delete them.
Could you tell me whether there are any obvious problems with this regular expression, or whether it would be better to write it in code, calling string.IndexOf("<?xml") and string.IndexOf("?>") in a (much more reasonable) loop?
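To see what the planned pattern does, here is a minimal sketch in Python rather than C#, assuming re.sub behaves like Regex.Replace for this pattern (in both, '.' matches anything except a newline by default, and *? is lazy). The sample strings and the strip_planned name are made up for illustration. It shows that the lazy *? stops at the first '>', which is what you want, that a greedy .* would run on to the last '>' on the line, and that a declaration broken across lines is not matched at all.

```python
import re

# Pattern from the question: "<?xml", as few characters as possible, then ">".
# No flags are passed, so '.' does NOT match "\n" (the same default as .NET).
PLANNED = re.compile(r"<\?xml.*?>")

# Greedy variant for comparison: '.*' runs on to the LAST '>' on the line.
GREEDY = re.compile(r"<\?xml.*>")


def strip_planned(text: str) -> str:
    """Delete every match of the planned pattern (stand-in for Regex.Replace)."""
    return PLANNED.sub("", text)


one_line = '<a><?xml version="1.0"?><b/></a>'
two_lines = '<a><?xml version="1.0"\n encoding="utf-8"?><b/></a>'

print(strip_planned(one_line))                # <a><b/></a>
print(GREEDY.sub("", one_line))               # <a>  (the rest of the line is gone)
print(strip_planned(two_lines) == two_lines)  # True: '.' cannot cross the newline
```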
EDIT: I also need to handle newlines.
Will <\?xml[^>]*?> achieve that?
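Assuming the declarations never contain a '>' of their own (a well-formed XML declaration cannot), it should. A negated class such as [^>] matches any character except '>', including a newline, so the match can cross line breaks while still stopping at the first '>'. The sketch below, again in Python with hypothetical sample input, compares it with the other common fix of keeping .*? and letting '.' match newlines (re.DOTALL, the Python counterpart of RegexOptions.Singleline in .NET).

```python
import re

# Pattern from the EDIT: the negated class [^>] also matches "\n", so the match
# can span lines. The lazy '?' is redundant: [^>]* can never pass a '>' anyway.
NEGATED = re.compile(r"<\?xml[^>]*?>")

# Alternative: keep '.*?' and let '.' match newlines.
# re.DOTALL corresponds to RegexOptions.Singleline in .NET.
DOTALL = re.compile(r"<\?xml.*?>", re.DOTALL)

two_lines = '<a><?xml version="1.0"\n encoding="utf-8"?><b/></a>'

print(NEGATED.sub("", two_lines))  # <a><b/></a>
print(DOTALL.sub("", two_lines))   # <a><b/></a>
```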
EDIT2
Thanks for the help. Regex-wise, <\?xml.*?\?> worked fine. In the end I wrote some throwaway code and tested both a regex and IndexOf(). For our simplest use case, just deleting the declarations took:
- almost a second the original way
- 0.01 seconds with the regex
- too little to measure with a loop and IndexOf()
So I went with IndexOf(), since it is a very simple loop.
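A minimal sketch of such a loop, written in Python with str.find(sub, start) standing in for .NET's string.IndexOf(value, startIndex). The function name strip_xml_declarations and the sample input are hypothetical. Unlike the per-character approach described in the background, each search resumes where the previous one stopped, so the input is only ever scanned forward. The EDIT2 regex is included purely as a cross-check; adding re.DOTALL to it is an assumption so that declarations split across lines also match.

```python
import re


def strip_xml_declarations(text: str) -> str:
    """Remove every "<?xml ... ?>" span using forward-only find() calls.

    str.find(sub, start) stands in for .NET's string.IndexOf(value, startIndex).
    """
    open_tag, close_tag = "<?xml", "?>"
    parts = []  # slices of the input to keep
    pos = 0     # index where the next search begins
    while True:
        start = text.find(open_tag, pos)
        if start == -1:
            break  # no more declarations
        end = text.find(close_tag, start + len(open_tag))
        if end == -1:
            break  # unterminated declaration: keep the rest as-is
        parts.append(text[pos:start])  # keep text before the declaration
        pos = end + len(close_tag)     # resume right after "?>"
    parts.append(text[pos:])           # keep everything after the last match
    return "".join(parts)


# EDIT2 regex, used only as a cross-check. re.DOTALL (RegexOptions.Singleline
# in .NET) is an assumption, so declarations split across lines also match.
FINAL = re.compile(r"<\?xml.*?\?>", re.DOTALL)

# Hypothetical input: two fragments, each carrying its own declaration.
doc = '<?xml version="1.0"?><a>1</a><?xml version="1.0"\n?><b>2</b>'

print(strip_xml_declarations(doc))                        # <a>1</a><b>2</b>
print(strip_xml_declarations(doc) == FINAL.sub("", doc))  # True
```

Collecting the kept slices and joining them once avoids building a new string on every iteration; in C#, appending to a StringBuilder would serve the same purpose.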