Cannot return the correct index

First of all, I want to thank everyone for helping me so much over the last couple of weeks. Here's another one!

I have a file and I use Regex to find out how many times the term "TamedName" appears. This is the easy part :)

I originally set it up this way

    StreamReader ff = new StreamReader(fileName);
    String D = ff.ReadToEnd();
    Regex rx = new Regex("TamedName");
    foreach (Match Dino in rx.Matches(D))
    {
        if (richTextBox2.Text == "")
            richTextBox2.Text += string.Format("{0} - {1:X} - {2}", Dino.Value, Dino.Index, ReadString(fileName, (uint)Dino.Index));
        else
            richTextBox2.Text += string.Format("\n{0} - {1:X} - {2}", Dino.Value, Dino.Index, ReadString(fileName, (uint)Dino.Index));
    }

and it returned completely wrong index positions, as pictured here:

[screenshot of the output]

I'm pretty sure I know why this happens: the whole binary file is converted to a string, and not every byte translates to exactly one character, so the match index no longer corresponds to the actual offset in the file and mapping it back does not work at all. The problem is that I do NOT know how to use Regex with a binary file so that the indices come out right :(

I use Regex rather than a simple search function because the differences between the occurrences of "TamedName" are too complex to handle in a hand-written search function.

Hope you guys can help me with this :( I'm running out of ideas!

+1
regex streamreader
Jul 12 '15 at 3:33
1 answer

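The problem is that you are reading a binary file, and the StreamReader decodes it into a Unicode string as it reads. By default it decodes as UTF-8, so a multi-byte sequence collapses into a single character and an invalid byte is replaced, which means a character index in the string no longer equals a byte offset in the file. The file needs to be handled as bytes.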
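If Regex itself is still needed, one possible workaround (a sketch, not part of the original answer's code) is to decode the raw bytes with ISO-8859-1, which maps every byte value to exactly one character, so `Match.Index` equals the byte offset. The class name `Latin1RegexDemo`, the method `FindOffsets`, and the sample bytes are hypothetical; it also assumes the pattern only needs to match single-byte text such as "TamedName".

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class Latin1RegexDemo
{
    // ISO-8859-1 maps each byte value 0..255 to exactly one char,
    // so string indices and byte offsets stay identical.
    static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // Hypothetical helper: byte offsets of every regex match in the data.
    public static int[] FindOffsets(byte[] data, string pattern)
    {
        string text = Latin1.GetString(data);
        MatchCollection matches = Regex.Matches(text, pattern);
        int[] offsets = new int[matches.Count];
        for (int i = 0; i < matches.Count; i++)
            offsets[i] = matches[i].Index;
        return offsets;
    }

    public static void Main()
    {
        // Made-up sample: three non-ASCII bytes, then the ASCII marker.
        byte[] prefix = { 0xC3, 0xA9, 0xFF };
        byte[] marker = Encoding.ASCII.GetBytes("TamedName");
        byte[] data = new byte[prefix.Length + marker.Length];
        prefix.CopyTo(data, 0);
        marker.CopyTo(data, prefix.Length);

        // Decoding as UTF-8 shifts the index because the prefix bytes
        // do not map one-to-one onto characters.
        int utf8Index = Encoding.UTF8.GetString(data).IndexOf("TamedName", StringComparison.Ordinal);
        Console.WriteLine("UTF-8 index:    " + utf8Index);

        // Latin-1 keeps the real byte offset, which is 3 here.
        Console.WriteLine("Latin-1 offset: " + FindOffsets(data, "TamedName")[0]);
    }
}
```

The trade-off is memory: the string holds two bytes per character on top of the byte array, so for very large files a direct byte search may be the better fit.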

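My code is below. (FYI, you need to enable unsafe compilation to build it, either with the /unsafe compiler option or the "Allow unsafe code" setting in the project's build properties. The unsafe code is there to search the byte array quickly.)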

For proper attribution: I took the byte version of IndexOf from this SO answer by Dylan Nicholson.

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Text;

    namespace ArkIndex
    {
        class Program
        {
            static void Main(string[] args)
            {
                string fileName = "TheIsland.ark";
                string searchString = "TamedName";

                byte[] bytes = LoadBytesFromFile(fileName);
                // The search term is plain ASCII, so encode it as ASCII bytes.
                byte[] searchBytes = Encoding.ASCII.GetBytes(searchString);

                // Byte offsets of every occurrence of the search term.
                List<long> allNeedles = FindAllBytes(bytes, searchBytes);
            }

            static byte[] LoadBytesFromFile(string fileName)
            {
                // Read the raw bytes; no text decoding takes place.
                using (FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read))
                using (MemoryStream ms = new MemoryStream())
                {
                    fs.CopyTo(ms);
                    return ms.ToArray();
                }
            }

            public static List<long> FindAllBytes(byte[] haystack, byte[] needle)
            {
                long currentOffset = 0;
                long offsetStep = needle.Length;
                long index = 0;
                List<long> allNeedleOffsets = new List<long>();

                // Resume each search just past the previous match.
                while ((index = IndexOf(haystack, needle, currentOffset)) != -1L)
                {
                    allNeedleOffsets.Add(index);
                    currentOffset = index + offsetStep;
                }
                return allNeedleOffsets;
            }

            // Returns the offset of the first match at or after startOffset, or -1.
            public static unsafe long IndexOf(byte[] haystack, byte[] needle, long startOffset = 0)
            {
                fixed (byte* h = haystack)
                fixed (byte* n = needle)
                {
                    for (byte* hNext = h + startOffset,
                               hEnd = h + haystack.LongLength + 1 - needle.LongLength,
                               nEnd = n + needle.LongLength;
                         hNext < hEnd; hNext++)
                    {
                        for (byte* hInc = hNext, nInc = n; *nInc == *hInc; hInc++)
                        {
                            if (++nInc == nEnd)
                                return hNext - h;
                        }
                    }
                    return -1;
                }
            }
        }
    }
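If enabling unsafe compilation is not an option, the same byte search can be written with plain array indexing. This is a minimal sketch under that assumption, not the code from the attributed answer; the names `SafeByteSearch` and `FindAll` are hypothetical, and an empty needle is treated as "not found" by choice.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class SafeByteSearch
{
    // Offset of the first match at or after startOffset, or -1 if there is none.
    public static long IndexOf(byte[] haystack, byte[] needle, long startOffset = 0)
    {
        if (needle.Length == 0) return -1; // assumption: empty needle never matches
        long last = haystack.LongLength - needle.LongLength;
        for (long i = startOffset; i <= last; i++)
        {
            long j = 0;
            while (j < needle.LongLength && haystack[i + j] == needle[j]) j++;
            if (j == needle.LongLength) return i;
        }
        return -1;
    }

    // Offsets of all non-overlapping matches, in ascending order.
    public static List<long> FindAll(byte[] haystack, byte[] needle)
    {
        List<long> offsets = new List<long>();
        long index;
        long current = 0;
        while ((index = IndexOf(haystack, needle, current)) != -1)
        {
            offsets.Add(index);
            current = index + needle.Length;
        }
        return offsets;
    }

    public static void Main()
    {
        // Made-up sample data standing in for the file contents.
        byte[] data = Encoding.ASCII.GetBytes("xxTamedNamexTamedName");
        byte[] needle = Encoding.ASCII.GetBytes("TamedName");
        Console.WriteLine(string.Join(", ", FindAll(data, needle))); // prints "2, 12"
    }
}
```

The bounds-checked loop is likely slower than the pointer version on large files, so whether the difference matters depends on the file size.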
+2
Jul 13 '15 at 2:38