Source bytes of files from StreamReader, magic number detection

Question

I am trying to distinguish between "text files" and "binary" files, since I would really like to ignore files with "unreadable" content.

I have a file that I believe is a gzip archive. I am trying to ignore this file by detecting magic numbers / file signatures. If I open the file with the Hex Editor plugin in Notepad++, I see that the first three bytes are 1f 8b 08.

However, if I read the file using StreamReader, I am not sure how to get to the original bytes.

    using (var streamReader = new StreamReader(@"C:\file"))
    {
        char[] buffer = new char[10];
        streamReader.Read(buffer, 0, 10);
        var s = new String(buffer);

        byte[] bytes = new byte[6];
        System.Buffer.BlockCopy(s.ToCharArray(), 0, bytes, 0, 6);
        var hex = BitConverter.ToString(bytes);

        var otherhex = BitConverter.ToString(System.Text.Encoding.UTF8.GetBytes(s.ToCharArray()));
    }

At the end of the using statement, I have the following variable values:

    hex: "1F-00-FD-FF-08-00"
    otherhex: "1F-EF-BF-BD-08-00-EF-BF-BD-EF-BF-BD-0A-51-02-03"

Neither of them begins with the hexadecimal values shown in Notepad++.

Is it possible to get the source bytes from the result of reading a file through StreamReader?

+6

c# .net encoding .net-4.0 magic-numbers

Tom hunter Feb 10 '13 at 12:22

3 answers

You cannot. StreamReader is made to read text, not binary data. Use a Stream directly to read bytes; in your case, a FileStream.

To guess whether a file is text or binary, you can read the first 4 KB into a byte[] and inspect those bytes.
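A minimal sketch of that idea follows. The check it uses (any NUL byte in the sample means "binary") is an assumption chosen for illustration, not something prescribed above, and `BinarySniffer` and `LooksBinary` are hypothetical names.

```csharp
using System;
using System.IO;

public static class BinarySniffer
{
    // Hypothetical helper: reads up to sampleSize bytes from the start of the
    // file and reports whether any of them is a NUL byte (0x00).
    public static bool LooksBinary(string path, int sampleSize = 4096)
    {
        byte[] buffer = new byte[sampleSize];
        int total = 0;

        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            // Stream.Read may return fewer bytes than requested, so keep reading
            // until the buffer is full or the end of the file is reached.
            int read;
            while (total < sampleSize &&
                   (read = fs.Read(buffer, total, sampleSize - total)) > 0)
            {
                total += read;
            }
        }

        // Assumed heuristic: a NUL byte in the sample means "binary".
        // UTF-16 or UTF-32 text contains NUL bytes and would be misclassified.
        return Array.IndexOf(buffer, (byte)0, 0, total) >= 0;
    }
}
```

A check like this is only a guess. If the files you care about have known signatures, comparing the leading bytes against those signatures is more reliable.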

By the way, you tried to force characters into bytes. That is not valid in principle. I suggest you familiarize yourself with what an Encoding is: it is the only way to convert between characters and bytes in a semantically correct way.
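To illustrate the point about Encoding, here is a small sketch (`EncodingDemo` and `RoundTripUtf8` are hypothetical names). It assumes the bytes are decoded as UTF-8, which is what StreamReader uses when no encoding is given, and shows that a byte such as 8b does not survive the trip through a string.

```csharp
using System;
using System.Text;

public static class EncodingDemo
{
    // Decodes raw bytes as UTF-8, encodes the resulting string back to bytes,
    // and returns the round-tripped bytes in hex form.
    public static string RoundTripUtf8(byte[] raw)
    {
        // Bytes that are not valid UTF-8 are replaced with U+FFFD while decoding.
        string decoded = Encoding.UTF8.GetString(raw);

        // U+FFFD encodes as EF BF BD, so the original byte value is lost.
        byte[] reencoded = Encoding.UTF8.GetBytes(decoded);

        return BitConverter.ToString(reencoded);
    }
}
```

`EncodingDemo.RoundTripUtf8(new byte[] { 0x1F, 0x8B, 0x08 })` returns "1F-EF-BF-BD-08", the same prefix as the otherhex value in the question. Once a byte has been replaced by U+FFFD, there is no way to tell what it was.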

+2

usr Feb 10 '13 at 12:27

Usage (for a PDF file):

    Assert.AreEqual("25504446", GetMagicNumbers(filePath, 4));

GetMagicNumbers method:

    private static string GetMagicNumbers(string filepath, int bytesCount)
    {
        // https://en.wikipedia.org/wiki/List_of_file_signatures
        byte[] buffer;
        using (var fs = new FileStream(filepath, FileMode.Open, FileAccess.Read))
        using (var reader = new BinaryReader(fs))
            buffer = reader.ReadBytes(bytesCount);

        var hex = BitConverter.ToString(buffer);
        return hex.Replace("-", String.Empty).ToLower();
    }

+2

dh_cgn Oct 14 '15 at 12:28

Steve · Accepted Answer · 2013-02-10T12:36:33+0000

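Your code is trying to turn binary data into a string. Strings in .NET are Unicode (UTF-16), so each character takes two bytes, and bytes that are not valid text in the reader's encoding are replaced during decoding. The result no longer matches the bytes in the file, as you can see.

Just use a BinaryReader and its ReadBytes method: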

    using (var fs = new FileStream(@"C:\file", FileMode.Open, FileAccess.Read))
    using (var reader = new BinaryReader(fs))
    {
        // ReadBytes returns raw bytes; the array is shorter than requested
        // if the file has fewer than 10 bytes.
        byte[] buffer = reader.ReadBytes(10);

        // gzip signature: 1f 8b 08
        if (buffer.Length >= 3 && buffer[0] == 0x1F && buffer[1] == 0x8B && buffer[2] == 0x08)
        {
            // you have a signature match....
        }
    }
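If you need to test for more than one file type, the same check can be generalized. The following is only a sketch: `SignatureCheck`, `StartsWith`, `Gzip` and `Pdf` are hypothetical names, and the two signatures are the ones mentioned in this thread (1f 8b 08 for the gzip file in the question, 25 50 44 46 for PDF).

```csharp
using System;
using System.IO;

public static class SignatureCheck
{
    // Signatures taken from this thread; extend the list as needed.
    public static readonly byte[] Gzip = { 0x1F, 0x8B, 0x08 };
    public static readonly byte[] Pdf = { 0x25, 0x50, 0x44, 0x46 };

    // Hypothetical helper: true if the file begins with the given signature bytes.
    public static bool StartsWith(string path, byte[] signature)
    {
        byte[] header;
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        using (var reader = new BinaryReader(fs))
        {
            // ReadBytes returns a shorter array if the file ends early.
            header = reader.ReadBytes(signature.Length);
        }

        // A file shorter than the signature cannot match it.
        if (header.Length < signature.Length) return false;

        for (int i = 0; i < signature.Length; i++)
        {
            if (header[i] != signature[i]) return false;
        }
        return true;
    }
}
```

With this in place, `SignatureCheck.StartsWith(@"C:\file", SignatureCheck.Gzip)` would tell you whether to skip the file. Comparing byte arrays directly avoids the hex-string conversion and also copes with files that are shorter than the signature.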
