Getting the source bytes of a file from StreamReader for magic number detection

I am trying to distinguish between "text files" and "binary" files, since I would really like to ignore files with "unreadable" content.

I have a file that I believe is a gzip archive. I am trying to ignore this file by detecting its magic number / file signature. If I open the file with the Hex Editor plugin in Notepad++, I see that the first three bytes are 1f 8b 08.

However, if I read the file using StreamReader, I am not sure how to get to the original bytes.

    using (var streamReader = new StreamReader(@"C:\file"))
    {
        char[] buffer = new char[10];
        streamReader.Read(buffer, 0, 10);
        var s = new String(buffer);
        byte[] bytes = new byte[6];
        System.Buffer.BlockCopy(s.ToCharArray(), 0, bytes, 0, 6);
        var hex = BitConverter.ToString(bytes);
        var otherhex = BitConverter.ToString(System.Text.Encoding.UTF8.GetBytes(s.ToCharArray()));
    }

At the end of the using statement, I have the following variable values:

    hex: "1F-00-FD-FF-08-00"
    otherhex: "1F-EF-BF-BD-08-00-EF-BF-BD-EF-BF-BD-0A-51-02-03"

Neither of them begins with the hexadecimal values shown in Notepad++.

Is it possible to get the source bytes from the result of reading a file through StreamReader?

+6
3 answers

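Your code pushes binary data through a text decoder and then tries to turn the resulting string back into bytes. StreamReader decodes the file as text (UTF-8 by default), and any byte that is not valid in that encoding, such as 8b here, is replaced with the Unicode replacement character U+FFFD. On top of that, strings in .NET are UTF-16, so each char takes two bytes. That is why the original bytes are lost, as you can see.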
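To see where the unexpected values in the question come from, here is a minimal sketch. It assumes StreamReader's default UTF-8 decoding; the class and method names (DecodeDemo, RoundTripHex) are hypothetical and only serve the illustration.

```csharp
using System;
using System.Text;

static class DecodeDemo
{
    // Decodes raw bytes as UTF-8 (assumed to match what StreamReader does by
    // default) and returns the UTF-16 bytes of the resulting string as hex.
    public static string RoundTripHex(byte[] raw)
    {
        // Bytes that are invalid in UTF-8 become U+FFFD during decoding.
        string decoded = Encoding.UTF8.GetString(raw);

        // Encoding.Unicode is UTF-16 little-endian, the in-memory layout
        // that the BlockCopy call in the question exposes.
        return BitConverter.ToString(Encoding.Unicode.GetBytes(decoded));
    }
}
```

Calling DecodeDemo.RoundTripHex(new byte[] { 0x1F, 0x8B, 0x08 }) returns "1F-00-FD-FF-08-00", the same value as hex in the question: the byte 8b was replaced by U+FFFD (FD-FF in little-endian order) during decoding, so it cannot be recovered from the string.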

Just use BinaryReader and its ReadBytes method:

    using (var fs = new FileStream(@"C:\file", FileMode.Open, FileAccess.Read))
    using (var reader = new BinaryReader(fs, new ASCIIEncoding()))
    {
        // ReadBytes returns fewer than 10 bytes if the file is shorter.
        byte[] buffer = reader.ReadBytes(10);

        // 1f 8b is the gzip signature; 08 is the deflate compression method.
        if (buffer.Length >= 3 && buffer[0] == 31 && buffer[1] == 139 && buffer[2] == 8)
        {
            // you have a signature match....
        }
    }
+5

You cannot. StreamReader is made to read text, not binary data. Use a Stream directly to read bytes; in your case, FileStream.

To guess whether the file is text or binary, you can read the first 4 KB into a byte[] and inspect it.

By the way, you tried to force characters into bytes. That is invalid in principle. I suggest you familiarize yourself with Encoding: it is the only semantically correct way to convert between characters and bytes.
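The 4 KB check could be sketched roughly as follows. The answer does not say how to interpret the bytes, so the heuristic here (gzip signature or any NUL byte means binary) is an assumption, and BinarySniffer and LooksBinary are hypothetical names.

```csharp
using System;
using System.IO;

static class BinarySniffer
{
    // Assumed heuristic, not from the answer: the first `count` bytes look
    // binary if they start with the gzip signature 1f 8b or contain a NUL byte.
    public static bool LooksBinary(byte[] data, int count)
    {
        if (count >= 2 && data[0] == 0x1F && data[1] == 0x8B)
            return true;

        return Array.IndexOf(data, (byte)0, 0, count) >= 0;
    }

    // Reads up to the first 4 KB of the file and applies the heuristic.
    public static bool LooksBinary(string path)
    {
        var buffer = new byte[4096];
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            // Read may return fewer bytes than requested, e.g. for short files.
            int read = fs.Read(buffer, 0, buffer.Length);
            return LooksBinary(buffer, read);
        }
    }
}
```

Note that this would also flag UTF-16 text files as binary, because they usually contain NUL bytes; whether that is acceptable depends on which files you expect to see.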

+2

Usage (for a PDF file):

 Assert.AreEqual("25504446", GetMagicNumbers(filePath, 4)); 

GetMagicNumbers method:

    private static string GetMagicNumbers(string filepath, int bytesCount)
    {
        // https://en.wikipedia.org/wiki/List_of_file_signatures
        byte[] buffer;
        using (var fs = new FileStream(filepath, FileMode.Open, FileAccess.Read))
        using (var reader = new BinaryReader(fs))
            buffer = reader.ReadBytes(bytesCount);

        var hex = BitConverter.ToString(buffer);
        return hex.Replace("-", String.Empty).ToLower();
    }
+2