How to remove the BOM from a byte array

I have XML data in byte[] byteArray, which may or may not contain a BOM (byte order mark). Is there a standard way in C# to remove the BOM from it? If not, what is the best way to do this that handles all cases, including all encodings?

I am actually fixing a bug in existing code and do not want to change much of it, so it would be best if someone could give me code that removes the BOM.

I know that I could look for the byte 60, which is the ASCII value of '<', and ignore every byte before it, but I don't want to do that.

+4
3 answers

All C# XML parsers will handle the BOM for you automatically. I would recommend using XDocument; in my opinion, it provides the cleanest abstraction of XML data.

Using XDocument as an example:

    // Requires: using System.IO; using System.Xml.Linq;
    using (var stream = new MemoryStream(bytes))
    {
        var document = XDocument.Load(stream);
        ...
    }

Once you have an XDocument, you can use it to write out the bytes without a BOM:

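    // Requires: using System.IO; using System.Text; using System.Xml;
    // The encoding must be supplied when the writer is created;
    // writer.Settings cannot be changed afterwards.
    var settings = new XmlWriterSettings { Encoding = new UTF8Encoding(false) };
    using (var stream = new MemoryStream())
    {
        using (var writer = XmlWriter.Create(stream, settings))
        {
            document.WriteTo(writer);
        } // disposing the writer flushes it to the stream
        var bytesWithoutBOM = stream.ToArray();
    }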
+5

You will need to detect the byte order mark at the start of the byte array. There are several different byte sequences, as described at http://www.unicode.org/faq/utf_bom.html#bom1 .

Just create a small state machine that starts at the beginning of the byte array and looks for these sequences.

I don’t know how your array is used or what other parameters you pass along with it, so I can’t say exactly how you would “remove” the sequence. Your options are as follows:

  • If you have start and count parameters, you can simply change them so that they reflect the starting point of the data in the array (just past the BOM).
  • If you have only a count parameter (separate from the array's Length property), you can shift the data in the array to overwrite the BOM, and adjust count accordingly.
  • If you have neither start nor count parameters, you need to create a new array, the size of the old array minus the BOM, and copy the data into the new array.
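
The first and third options above can be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: the class and method names (BomStripper, GetBomLength, RemoveBom) are hypothetical, and a plain prefix comparison against a small table stands in for the state machine. It assumes the data is XML, so the first character after a UTF-16 little-endian BOM is never U+0000 and cannot be mistaken for the UTF-32 little-endian BOM.

```csharp
using System;

// Hypothetical helper; the names are illustrative only.
public static class BomStripper
{
    // UTF-32 LE is listed before UTF-16 LE because its mark
    // begins with the same two bytes (FF FE).
    private static readonly byte[][] Boms =
    {
        new byte[] { 0x00, 0x00, 0xFE, 0xFF }, // UTF-32, big-endian
        new byte[] { 0xFF, 0xFE, 0x00, 0x00 }, // UTF-32, little-endian
        new byte[] { 0xEF, 0xBB, 0xBF },       // UTF-8
        new byte[] { 0xFE, 0xFF },             // UTF-16, big-endian
        new byte[] { 0xFF, 0xFE },             // UTF-16, little-endian
    };

    // Returns the number of BOM bytes at the start of data, or 0 if
    // no known BOM is found. Usable directly as a "start" offset.
    public static int GetBomLength(byte[] data)
    {
        foreach (var bom in Boms)
        {
            if (StartsWith(data, bom)) return bom.Length;
        }
        return 0;
    }

    // Returns a copy of data without the leading BOM
    // (the "create a new array" option).
    public static byte[] RemoveBom(byte[] data)
    {
        int start = GetBomLength(data);
        var result = new byte[data.Length - start];
        Buffer.BlockCopy(data, start, result, 0, result.Length);
        return result;
    }

    private static bool StartsWith(byte[] data, byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

If the consuming code already accepts a start offset, GetBomLength(byteArray) alone avoids the copy; otherwise RemoveBom(byteArray) returns a fresh array and leaves the original untouched.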

To “remove” the sequence, you probably want to identify the mark, if there is one, and then copy the remaining bytes into a new byte array. Or, if you maintain a count of characters (separate from the array's Length property)

0

You can do something like this to skip the BOM bytes when reading from a stream. You will need to extend Bom.cs to cover additional encodings; however, as far as I know, the UTF encodings are the only ones that use a BOM ... although I may (most likely) be wrong about that.

I got the information about the encoding types from here

    using System.IO;
    using System.Linq;

    using (var stream = File.OpenRead("path_to_file"))
    {
        // Position the stream just past the BOM, if there is one.
        stream.Position = Bom.GetCursor(stream);
    }

    public static class Bom
    {
        // Returns the length of the BOM at the start of the stream, or 0 if none is found.
        public static int GetCursor(Stream stream)
        {
            // UTF-32, big-endian
            if (IsMatch(stream, new byte[] { 0x00, 0x00, 0xFE, 0xFF })) return 4;
            // UTF-32, little-endian (checked before UTF-16 LE, which shares its first two bytes)
            if (IsMatch(stream, new byte[] { 0xFF, 0xFE, 0x00, 0x00 })) return 4;
            // UTF-16, big-endian
            if (IsMatch(stream, new byte[] { 0xFE, 0xFF })) return 2;
            // UTF-16, little-endian
            if (IsMatch(stream, new byte[] { 0xFF, 0xFE })) return 2;
            // UTF-8
            if (IsMatch(stream, new byte[] { 0xEF, 0xBB, 0xBF })) return 3;
            return 0;
        }

        private static bool IsMatch(Stream stream, byte[] match)
        {
            stream.Position = 0;
            var buffer = new byte[match.Length];
            // A short read must not match: the unread tail of the buffer stays zero
            // and could otherwise be mistaken for the zero bytes of a UTF-32 BOM.
            int read = stream.Read(buffer, 0, buffer.Length);
            return read == match.Length && buffer.SequenceEqual(match);
        }
    }
0