Byte [] array pattern search

Question

Byte [] array pattern search

Anyone knows a good and efficient way to find / match a byte pattern in the byte [] array, and then return the positions.

for example

byte[] pattern = new byte[] {12,3,5,76,8,0,6,125}; byte[] toBeSearched = new byte[] {23,36,43,76,125,56,34,234,12,3,5,76,8,0,6,125,234,56,211,122,22,4,7,89,76,64,12,3,5,76,8,0,6,125}

+67

c # pattern-matching

Anders R Nov 12 '08 at 9:52

source share

27 answers

Jb Evain · Answer 1 · 2008-11-12 11:21

Can I suggest something that is not related to creating strings, copying arrays, or unsafe code:

 using System; using System.Collections.Generic; static class ByteArrayRocks { static readonly int [] Empty = new int [0]; public static int [] Locate (this byte [] self, byte [] candidate) { if (IsEmptyLocate (self, candidate)) return Empty; var list = new List<int> (); for (int i = 0; i < self.Length; i++) { if (!IsMatch (self, i, candidate)) continue; list.Add (i); } return list.Count == 0 ? Empty : list.ToArray (); } static bool IsMatch (byte [] array, int position, byte [] candidate) { if (candidate.Length > (array.Length - position)) return false; for (int i = 0; i < candidate.Length; i++) if (array [position + i] != candidate [i]) return false; return true; } static bool IsEmptyLocate (byte [] array, byte [] candidate) { return array == null || candidate == null || array.Length == 0 || candidate.Length == 0 || candidate.Length > array.Length; } static void Main () { var data = new byte [] { 23, 36, 43, 76, 125, 56, 34, 234, 12, 3, 5, 76, 8, 0, 6, 125, 234, 56, 211, 122, 22, 4, 7, 89, 76, 64, 12, 3, 5, 76, 8, 0, 6, 125 }; var pattern = new byte [] { 12, 3, 5, 76, 8, 0, 6, 125 }; foreach (var position in data.Locate (pattern)) Console.WriteLine (position); } }

Edit (by IAbstract) - moving post content here as this is not an answer

Out of curiosity, I created a small test with different answers.

Here are the results for millions of iterations:

 solution [Locate]: 00:00:00.7714027 solution [FindAll]: 00:00:03.5404399 solution [SearchBytePattern]: 00:00:01.1105190 solution [MatchBytePattern]: 00:00:03.0658212

YujiSoftware · Answer 2 · 2013-02-05 16:28

Use LINQ methods.

 public static IEnumerable<int> PatternAt(byte[] source, byte[] pattern) { for (int i = 0; i < source.Length; i++) { if (source.Skip(i).Take(pattern.Length).SequenceEqual(pattern)) { yield return i; } } }

Very simple!

VVS · Answer 3 · 2008-11-12 13:21

Use the efficient Boyer-Moore algorithm .

It is designed to look for strings containing strings, but you need a little imagination to project this onto byte arrays.

In general, the best answer: use any string search algorithm that you like :).

GoClimbColorado · Answer 4

I originally posted some old code that I used, but I was interested to learn about Jb Evain tests . I found that my decision was stupidly slow. It looks like bruno conde SearchBytePattern is the fastest. I could not understand why, especially since it uses Array.Copy and the Extension method. But there is evidence in Jb tests, so Bruno's fame.

I simplified the bits even further, so hopefully this will be the clearest and easiest solution. (All the hard work done by Bruno Conde). Improvements:

Buffer.BlockCopy
Array.indexOf <byte>
while loop instead of for loop
start index parameter

converted to extension method

 public static List<int> IndexOfSequence(this byte[] buffer, byte[] pattern, int startIndex) { List<int> positions = new List<int>(); int i = Array.IndexOf<byte>(buffer, pattern[0], startIndex); while (i >= 0 && i <= buffer.Length - pattern.Length) { byte[] segment = new byte[pattern.Length]; Buffer.BlockCopy(buffer, i, segment, 0, pattern.Length); if (segment.SequenceEqual<byte>(pattern)) positions.Add(i); i = Array.IndexOf<byte>(buffer, pattern[0], i + 1); } return positions; }

Note that the last statement in the while block should be i = Array.IndexOf<byte>(buffer, pattern[0], + 1); instead of i = Array.IndexOf<byte>(buffer, pattern[0], + pattern.Length); Look at Johan's comment. A simple test can prove that:

 byte[] pattern = new byte[] {1, 2}; byte[] toBeSearched = new byte[] { 1, 1, 2, 1, 12 };

With i = Array.IndexOf<byte>(buffer, pattern[0], + pattern.Length); nothing came back. i = Array.IndexOf<byte>(buffer, pattern[0], + 1); returns the correct result.

bruno conde · Answer 5 · 2008-11-12 10:55

My decision:

 class Program { public static void Main() { byte[] pattern = new byte[] {12,3,5,76,8,0,6,125}; byte[] toBeSearched = new byte[] { 23, 36, 43, 76, 125, 56, 34, 234, 12, 3, 5, 76, 8, 0, 6, 125, 234, 56, 211, 122, 22, 4, 7, 89, 76, 64, 12, 3, 5, 76, 8, 0, 6, 125}; List<int> positions = SearchBytePattern(pattern, toBeSearched); foreach (var item in positions) { Console.WriteLine("Pattern matched at pos {0}", item); } } static public List<int> SearchBytePattern(byte[] pattern, byte[] bytes) { List<int> positions = new List<int>(); int patternLength = pattern.Length; int totalLength = bytes.Length; byte firstMatchByte = pattern[0]; for (int i = 0; i < totalLength; i++) { if (firstMatchByte == bytes[i] && totalLength - i >= patternLength) { byte[] match = new byte[patternLength]; Array.Copy(bytes, i, match, 0, patternLength); if (match.SequenceEqual<byte>(pattern)) { positions.Add(i); i += patternLength - 1; } } } return positions; } }

Ing. · Answer 6 · 2016-07-28 01:29

This is my suggestion, simpler and faster:

 int Search(byte[] src, byte[] pattern) { int c = src.Length - pattern.Length + 1; int j; for (int i = 0; i < c; i++) { if (src[i] != pattern[0]) continue; for (j = pattern.Length - 1; j >= 1 && src[i + j] == pattern[j]; j--) ; if (j == 0) return i; } return -1; }

Matten · Answer 7 · 2012-08-08 14:41

I could not find the LINQ method / answer :-)

 /// <summary> /// Searches in the haystack array for the given needle using the default equality operator and returns the index at which the needle starts. /// </summary> /// <typeparam name="T">Type of the arrays.</typeparam> /// <param name="haystack">Sequence to operate on.</param> /// <param name="needle">Sequence to search for.</param> /// <returns>Index of the needle within the haystack or -1 if the needle isn't contained.</returns> public static IEnumerable<int> IndexOf<T>(this T[] haystack, T[] needle) { if ((needle != null) && (haystack.Length >= needle.Length)) { for (int l = 0; l < haystack.Length - needle.Length + 1; l++) { if (!needle.Where((data, index) => !haystack[l + index].Equals(data)).Any()) { yield return l; } } } }

Dylan Nicholson · Answer 8 · 2015-06-29 04:44

My version of Foubar answers above, which avoids searching at the end of the haystack and allows you to specify the initial offset. Suppose the needle is not empty or longer than a haystack.

 public static unsafe long IndexOf(this byte[] haystack, byte[] needle, long startOffset = 0) { fixed (byte* h = haystack) fixed (byte* n = needle) { for (byte* hNext = h + startOffset, hEnd = h + haystack.LongLength + 1 - needle.LongLength, nEnd = n + needle.LongLength; hNext < hEnd; hNext++) for (byte* hInc = hNext, nInc = n; *nInc == *hInc; hInc++) if (++nInc == nEnd) return hNext - h; return -1; } }

Alnitak · Answer 9 · 2008-11-12 13:34

Jb Evain's answer has:

  for (int i = 0; i < self.Length; i++) { if (!IsMatch (self, i, candidate)) continue; list.Add (i); }

and then the IsMatch function first checks to see if candidate goes beyond the length of the search array.

This would be more efficient if the for loop were encoded:

  for (int i = 0, n = self.Length - candidate.Length + 1; i < n; ++i) { if (!IsMatch (self, i, candidate)) continue; list.Add (i); }

at this point, you can also exclude the test from the very beginning of IsMatch , if you IsMatch into a contract with the prerequisites so that you never call it with "invalid" parameters. Note: fixed one-on-one error in 2019.

Foubar · Answer 10 · 2011-03-08 00:55

These are the simplest and fastest methods you can use, and there would be nothing faster than them. This is unsafe, but for this we use pointers for speed. Therefore, here I offer you my extension methods, which I use to search for a single, and a list of event indexes. I would like to say that here is the cleanest code.

  public static unsafe long IndexOf(this byte[] Haystack, byte[] Needle) { fixed (byte* H = Haystack) fixed (byte* N = Needle) { long i = 0; for (byte* hNext = H, hEnd = H + Haystack.LongLength; hNext < hEnd; i++, hNext++) { bool Found = true; for (byte* hInc = hNext, nInc = N, nEnd = N + Needle.LongLength; Found && nInc < nEnd; Found = *nInc == *hInc, nInc++, hInc++) ; if (Found) return i; } return -1; } } public static unsafe List<long> IndexesOf(this byte[] Haystack, byte[] Needle) { List<long> Indexes = new List<long>(); fixed (byte* H = Haystack) fixed (byte* N = Needle) { long i = 0; for (byte* hNext = H, hEnd = H + Haystack.LongLength; hNext < hEnd; i++, hNext++) { bool Found = true; for (byte* hInc = hNext, nInc = N, nEnd = N + Needle.LongLength; Found && nInc < nEnd; Found = *nInc == *hInc, nInc++, hInc++) ; if (Found) Indexes.Add(i); } return Indexes; } }

Compared with Locate, it is 1.2-1.4 times faster

Davy Landman · Answer 11 · 2008-11-12 13:14

I would use a solution that does the mapping by converting to a string ...

You should write a simple function that implements the Knuth-Morris-Pratt search algorithm . This will be the fastest simple algorithm you can use to find the right indexes. (You can use Boyer-Moore , but this will require more settings.

After you have optimized the algorithm, you can try to find other types of optimization. But you must start with the basics.

For example, the current “fastest” is the Locate solution from Jb Evian.

if you look at the core

  for (int i = 0; i < self.Length; i++) { if (!IsMatch (self, i, candidate)) continue; list.Add (i); }

After matching the algorithm, it will begin to find a match with i + 1, but you already know that the first possible match will be i + candidate. Length. Therefore, if you add,

 i += candidate.Length -2; // -2 instead of -1 because the i++ will add the last index

it will be much faster if you expect many occurrences of a subset in a subset. (Bruno Conde already does this in his decision)

But this is only half the KNP algorithm, you also need to add an additional parameter to the IsMatch method called numberOfValidMatches, which will be the output parameter.

this will result in the following:

 int validMatches = 0; if (!IsMatch (self, i, candidate, out validMatches)) { i += validMatches - 1; // -1 because the i++ will do the last one continue; }

and

 static bool IsMatch (byte [] array, int position, byte [] candidate, out int numberOfValidMatches) { numberOfValidMatches = 0; if (candidate.Length > (array.Length - position)) return false; for (i = 0; i < candidate.Length; i++) { if (array [position + i] != candidate [i]) return false; numberOfValidMatches++; } return true; }

A little refactoring, and you can use numberOfValidMatches as a loop variable and rewrite the Locate loop with while while to avoid -2 and -1. But I just wanted to clarify how you can add the KMP algorithm.

Davy Landman · Answer 12 · 2008-11-12 14:05

I created a new feature using the tips from my answer and the answer from Alnitak.

 public static List<Int32> LocateSubset(Byte[] superSet, Byte[] subSet) { if ((superSet == null) || (subSet == null)) { throw new ArgumentNullException(); } if ((superSet.Length < subSet.Length) || (superSet.Length == 0) || (subSet.Length == 0)) { return new List<Int32>(); } var result = new List<Int32>(); Int32 currentIndex = 0; Int32 maxIndex = superSet.Length - subSet.Length; while (currentIndex < maxIndex) { Int32 matchCount = CountMatches(superSet, currentIndex, subSet); if (matchCount == subSet.Length) { result.Add(currentIndex); } currentIndex++; if (matchCount > 0) { currentIndex += matchCount - 1; } } return result; } private static Int32 CountMatches(Byte[] superSet, int startIndex, Byte[] subSet) { Int32 currentOffset = 0; while (currentOffset < subSet.Length) { if (superSet[startIndex + currentOffset] != subSet[currentOffset]) { break; } currentOffset++; } return currentOffset; }

The only part I'm not very happy about is

  currentIndex++; if (matchCount > 0) { currentIndex += matchCount - 1; }

part ... I would like to use if if else to avoid -1, but this leads to better branch prediction (although I'm not sure if this will make a big difference).

Erik Brandstadmoen · Answer 13 · 2008-11-12 14:31

Why make simple difficult? This can be done in any language used for loops. Here is one from C #:

 using System;
 using System.Collections.Generic;

 namespace BinarySearch
 {
     class program
     {
         static void Main (string [] args)
         {
             byte [] pattern = new byte [] {12,3,5,76,8,0,6,125};
             byte [] toBeSearched = new byte [] {23,36,43,76,125,56,34,234,12,3,5,76,8,0,6,125,234,56,211, 
  122.22.4.7.89.76.64,12.3.5.76.8,0,6.125};

             List <int> occurences = findOccurences (toBeSearched, pattern);

             foreach (int occurence in occurences) {
                 Console.WriteLine ("Found match starting at 0-based index:" + occurence);
             }


         }

         static List <int> findOccurences (byte [] haystack, byte [] needle)
         {
             List <int> occurences = new List <int> ();

             for (int i = 0; i <haystack.Length; i ++)
             {
                 if (needle [0] == haystack [i])
                 {
                     bool found = true;
                     int j, k;
                     for (j = 0, k = i; j <needle.Length; j ++, k ++)
                     {
                         if (k> = haystack.Length || needle [j]! = haystack [k])
                         {
                             found = false;
                             break;
                         }
                     }
                     if (found)
                     {
                         occurences.Add (i - 1);
                         i = k;
                     }
                 }
             }
             return occurences;
         }
     }
 }

Anders R · Answer 14 · 2008-11-12 14:44

thanks for taking the time ...

This is the code I used / tested before I asked my question ... The reason I ask this question was because I am sure that I am not using the optimal code for this ... so thanks again for taking the time!

  private static int CountPatternMatches(byte[] pattern, byte[] bytes) { int counter = 0; for (int i = 0; i < bytes.Length; i++) { if (bytes[i] == pattern[0] && (i + pattern.Length) < bytes.Length) { for (int x = 1; x < pattern.Length; x++) { if (pattern[x] != bytes[x+i]) { break; } if (x == pattern.Length -1) { counter++; i = i + pattern.Length; } } } } return counter; }

Anyone who sees errors in my code? Is this considered a hacker approach? I tried almost every sample that you guys posted, and I seem to be getting some changes in the match results. I ran my tests with a ~ 10 MB byte array as a toBeSearched array.

Steve Hiner · Answer 15 · 2009-08-13 09:51

Speed is not everything. Did you check them for consistency?

I have not tested all the code listed here. I tested my own code (which was not completely consistent, I admit) and IndexOfSequence. I found that for many tests, IndexOfSequence was pretty fast than my code, but with repeated testing, I found that it was less consistent. In particular, it seems to have the most difficult problem finding patterns at the end of the array, but sometimes they will skip them in the middle of the array.

My test code is not designed to improve performance, I just wanted to have a bunch of random data with some known lines inside. This test pattern is approximately similar to the border marker in the download stream of the http form. This is what I was looking for when I came across this code, so I decided that I would check it for the data that I would be looking for. It seems like the longer the template, the more often IndexOfSequence will skip the value.

 private static void TestMethod() { Random rnd = new Random(DateTime.Now.Millisecond); string Pattern = "-------------------------------65498495198498"; byte[] pattern = Encoding.ASCII.GetBytes(Pattern); byte[] testBytes; int count = 3; for (int i = 0; i < 100; i++) { StringBuilder TestString = new StringBuilder(2500); TestString.Append(Pattern); byte[] buf = new byte[1000]; rnd.NextBytes(buf); TestString.Append(Encoding.ASCII.GetString(buf)); TestString.Append(Pattern); rnd.NextBytes(buf); TestString.Append(Encoding.ASCII.GetString(buf)); TestString.Append(Pattern); testBytes = Encoding.ASCII.GetBytes(TestString.ToString()); List<int> idx = IndexOfSequence(ref testBytes, pattern, 0); if (idx.Count != count) { Console.Write("change from {0} to {1} on iteration {2}: ", count, idx.Count, i); foreach (int ix in idx) { Console.Write("{0}, ", ix); } Console.WriteLine(); count = idx.Count; } } Console.WriteLine("Press ENTER to exit"); Console.ReadLine(); }

(obviously, I converted IndexOfSequence from the extension back to the regular method for this testing)

Here's an example of doing my output:

 change from 3 to 2 on iteration 1: 0, 2090, change from 2 to 3 on iteration 2: 0, 1045, 2090, change from 3 to 2 on iteration 3: 0, 1045, change from 2 to 3 on iteration 4: 0, 1045, 2090, change from 3 to 2 on iteration 6: 0, 2090, change from 2 to 3 on iteration 7: 0, 1045, 2090, change from 3 to 2 on iteration 11: 0, 2090, change from 2 to 3 on iteration 12: 0, 1045, 2090, change from 3 to 2 on iteration 14: 0, 2090, change from 2 to 3 on iteration 16: 0, 1045, 2090, change from 3 to 2 on iteration 17: 0, 1045, change from 2 to 3 on iteration 18: 0, 1045, 2090, change from 3 to 1 on iteration 20: 0, change from 1 to 3 on iteration 21: 0, 1045, 2090, change from 3 to 2 on iteration 22: 0, 2090, change from 2 to 3 on iteration 23: 0, 1045, 2090, change from 3 to 2 on iteration 24: 0, 2090, change from 2 to 3 on iteration 25: 0, 1045, 2090, change from 3 to 2 on iteration 26: 0, 2090, change from 2 to 3 on iteration 27: 0, 1045, 2090, change from 3 to 2 on iteration 43: 0, 1045, change from 2 to 3 on iteration 44: 0, 1045, 2090, change from 3 to 2 on iteration 48: 0, 1045, change from 2 to 3 on iteration 49: 0, 1045, 2090, change from 3 to 2 on iteration 50: 0, 2090, change from 2 to 3 on iteration 52: 0, 1045, 2090, change from 3 to 2 on iteration 54: 0, 1045, change from 2 to 3 on iteration 57: 0, 1045, 2090, change from 3 to 2 on iteration 62: 0, 1045, change from 2 to 3 on iteration 63: 0, 1045, 2090, change from 3 to 2 on iteration 72: 0, 2090, change from 2 to 3 on iteration 73: 0, 1045, 2090, change from 3 to 2 on iteration 75: 0, 2090, change from 2 to 3 on iteration 76: 0, 1045, 2090, change from 3 to 2 on iteration 78: 0, 1045, change from 2 to 3 on iteration 79: 0, 1045, 2090, change from 3 to 2 on iteration 81: 0, 2090, change from 2 to 3 on iteration 82: 0, 1045, 2090, change from 3 to 2 on iteration 85: 0, 2090, change from 2 to 3 on iteration 86: 0, 1045, 2090, change from 3 to 2 on iteration 89: 0, 2090, change from 2 to 3 on iteration 90: 0, 1045, 2090, change from 3 to 2 on iteration 91: 0, 2090, change from 2 to 1 on iteration 92: 0, change from 1 to 3 on iteration 93: 0, 1045, 2090, change from 3 to 1 on iteration 99: 0,

I'm not going to choose IndexOfSequence, it just happened that I started working with today. At the end of the day, I noticed that there were apparently no templates in the data, so today I wrote my own template. It's not that fast though. I'm going to tweak it a bit more to see if I can get it 100% before posting it.

I just wanted to remind everyone that they should check such things to make sure they give good, repeatable results before entrusting them with production code.

Sorin S. · Answer 16 · 2009-11-27 09:46

I tried various solutions and ended up modifying SearchBytePattern. I tested the 30k sequence and it is fast :)

  static public int SearchBytePattern(byte[] pattern, byte[] bytes) { int matches = 0; for (int i = 0; i < bytes.Length; i++) { if (pattern[0] == bytes[i] && bytes.Length - i >= pattern.Length) { bool ismatch = true; for (int j = 1; j < pattern.Length && ismatch == true; j++) { if (bytes[i + j] != pattern[j]) ismatch = false; } if (ismatch) { matches++; i += pattern.Length - 1; } } } return matches; }

Let me know your thoughts.

Jay · Answer 17 · 2011-07-04 07:10

Here is the solution I came up with. I included notes that I found along the way with the implementation. It can coincide forward, backward and with various (w / d) amounts, for example, direction; starting at any offset in the haystack.

Any input would be awesome!

  /// <summary> /// Matches a byte array to another byte array /// forwards or reverse /// </summary> /// <param name="a">byte array</param> /// <param name="offset">start offset</param> /// <param name="len">max length</param> /// <param name="b">byte array</param> /// <param name="direction">to move each iteration</param> /// <returns>true if all bytes match, otherwise false</returns> internal static bool Matches(ref byte[] a, int offset, int len, ref byte[] b, int direction = 1) { #region Only Matched from offset Within a and b, could not differ, eg if you wanted to mach in reverse for only part of a in some of b that would not work //if (direction == 0) throw new ArgumentException("direction"); //for (; offset < len; offset += direction) if (a[offset] != b[offset]) return false; //return true; #endregion //Will match if b contains len of a and return aa index of positive value return IndexOfBytes(ref a, ref offset, len, ref b, len) != -1; } ///Here is the Implementation code /// <summary> /// Swaps two integers without using a temporary variable /// </summary> /// <param name="a"></param> /// <param name="b"></param> internal static void Swap(ref int a, ref int b) { a ^= b; b ^= a; a ^= b; } /// <summary> /// Swaps two bytes without using a temporary variable /// </summary> /// <param name="a"></param> /// <param name="b"></param> internal static void Swap(ref byte a, ref byte b) { a ^= b; b ^= a; a ^= b; } /// <summary> /// Can be used to find if a array starts, ends spot Matches or compltely contains a sub byte array /// Set checkLength to the amount of bytes from the needle you want to match, start at 0 for forward searches start at hayStack.Lenght -1 for reverse matches /// </summary> /// <param name="a">Needle</param> /// <param name="offset">Start in Haystack</param> /// <param name="len">Length of required match</param> /// <param name="b">Haystack</param> /// <param name="direction">Which way to move the iterator</param> /// <returns>Index if found, otherwise -1</returns> internal static int IndexOfBytes(ref byte[] needle, ref int offset, int checkLength, ref byte[] haystack, int direction = 1) { //If the direction is == 0 we would spin forever making no progress if (direction == 0) throw new ArgumentException("direction"); //Cache the length of the needle and the haystack, setup the endIndex for a reverse search int needleLength = needle.Length, haystackLength = haystack.Length, endIndex = 0, workingOffset = offset; //Allocate a value for the endIndex and workingOffset //If we are going forward then the bound is the haystackLength if (direction >= 1) endIndex = haystackLength; #region [Optomization - Not Required] //{ //I though this was required for partial matching but it seems it is not needed in this form //workingOffset = needleLength - checkLength; //} #endregion else Swap(ref workingOffset, ref endIndex); #region [Optomization - Not Required] //{ //Otherwise we are going in reverse and the endIndex is the needleLength - checkLength //I though the length had to be adjusted but it seems it is not needed in this form //endIndex = needleLength - checkLength; //} #endregion #region [Optomized to above] //Allocate a value for the endIndex //endIndex = direction >= 1 ? haystackLength : needleLength - checkLength, //Determine the workingOffset //workingOffset = offset > needleLength ? offset : needleLength; //If we are doing in reverse swap the two //if (workingOffset > endIndex) Swap(ref workingOffset, ref endIndex); //Else we are going in forward direction do the offset is adjusted by the length of the check //else workingOffset -= checkLength; //Start at the checkIndex (workingOffset) every search attempt #endregion //Save the checkIndex (used after the for loop is done with it to determine if the match was checkLength long) int checkIndex = workingOffset; #region [For Loop Version] ///Optomized with while (single op) ///for (int checkIndex = workingOffset; checkIndex < endIndex; offset += direction, checkIndex = workingOffset) ///{ ///Start at the checkIndex /// While the checkIndex < checkLength move forward /// If NOT (the needle at the checkIndex matched the haystack at the offset + checkIndex) BREAK ELSE we have a match continue the search /// for (; checkIndex < checkLength; ++checkIndex) if (needle[checkIndex] != haystack[offset + checkIndex]) break; else continue; /// If the match was the length of the check /// if (checkIndex == checkLength) return offset; //We are done matching ///} #endregion //While the checkIndex < endIndex while (checkIndex < endIndex) { for (; checkIndex < checkLength; ++checkIndex) if (needle[checkIndex] != haystack[offset + checkIndex]) break; else continue; //If the match was the length of the check if (checkIndex == checkLength) return offset; //We are done matching //Move the offset by the direction, reset the checkIndex to the workingOffset offset += direction; checkIndex = workingOffset; } //We did not have a match with the given options return -1; }

eocron · Answer 18 · 2016-06-27 07:48

You can use ORegex:

 var oregex = new ORegex<byte>("{0}{1}{2}", x=> x==12, x=> x==3, x=> x==5); var toSearch = new byte[]{1,1,12,3,5,1,12,3,5,5,5,5}; var found = oregex.Matches(toSearch);

Two matches will be found:

 i:2;l:3 i:6;l:3

Difficulty: O (n * m) in the worst case, in real life it is O (n) due to the internal finite machine. In some cases, it is faster than .NET Regex. , .

Constantin · Answer 19 · 2008-11-12 11:27

( ) . , /-1 , /ASCII /UTF 8.

, It Works (tm) ( 0x80-0xff) .

 using System; using System.Collections.Generic; using System.Text; using System.Text.RegularExpressions; class C { public static void Main() { byte[] data = {0, 100, 0, 255, 100, 0, 100, 0, 255}; byte[] pattern = {0, 255}; foreach (int i in FindAll(data, pattern)) { Console.WriteLine(i); } } public static IEnumerable<int> FindAll( byte[] haystack, byte[] needle ) { // bytes <-> latin-1 conversion is lossless Encoding latin1 = Encoding.GetEncoding("iso-8859-1"); string sHaystack = latin1.GetString(haystack); string sNeedle = latin1.GetString(needle); for (Match m = Regex.Match(sHaystack, Regex.Escape(sNeedle)); m.Success; m = m.NextMatch()) { yield return m.Index; } } }

Victor · Answer 20 · 2011-08-06 03:15

Boyer Moore, . # .

EyeCode Inc.

 class Program { static void Main(string[] args) { byte[] text = new byte[] {12,3,5,76,8,0,6,125,23,36,43,76,125,56,34,234,12,4,5,76,8,0,6,125,234,56,211,122,22,4,7,89,76,64,12,3,5,76,8,0,6,123}; byte[] pattern = new byte[] {12,3,5,76,8,0,6,125}; BoyerMoore tmpSearch = new BoyerMoore(pattern,text); Console.WriteLine(tmpSearch.Match()); Console.ReadKey(); } public class BoyerMoore { private static int ALPHABET_SIZE = 256; private byte[] text; private byte[] pattern; private int[] last; private int[] match; private int[] suffix; public BoyerMoore(byte[] pattern, byte[] text) { this.text = text; this.pattern = pattern; last = new int[ALPHABET_SIZE]; match = new int[pattern.Length]; suffix = new int[pattern.Length]; } /** * Searches the pattern in the text. * returns the position of the first occurrence, if found and -1 otherwise. */ public int Match() { // Preprocessing ComputeLast(); ComputeMatch(); // Searching int i = pattern.Length - 1; int j = pattern.Length - 1; while (i < text.Length) { if (pattern[j] == text[i]) { if (j == 0) { return i; } j--; i--; } else { i += pattern.Length - j - 1 + Math.Max(j - last[text[i]], match[j]); j = pattern.Length - 1; } } return -1; } /** * Computes the function last and stores its values in the array last. * last(Char ch) = the index of the right-most occurrence of the character ch * in the pattern; * -1 if ch does not occur in the pattern. */ private void ComputeLast() { for (int k = 0; k < last.Length; k++) { last[k] = -1; } for (int j = pattern.Length-1; j >= 0; j--) { if (last[pattern[j]] < 0) { last[pattern[j]] = j; } } } /** * Computes the function match and stores its values in the array match. * match(j) = min{ s | 0 < s <= j && p[js]!=p[j] * && p[j-s+1]..p[ms-1] is suffix of p[j+1]..p[m-1] }, * if such s exists, else * min{ s | j+1 <= s <= m * && p[0]..p[ms-1] is suffix of p[j+1]..p[m-1] }, * if such s exists, * m, otherwise, * where p is the pattern and m is its length. */ private void ComputeMatch() { /* Phase 1 */ for (int j = 0; j < match.Length; j++) { match[j] = match.Length; } //O(m) ComputeSuffix(); //O(m) /* Phase 2 */ //Uses an auxiliary array, backwards version of the KMP failure function. //suffix[i] = the smallest j > i st p[j..m-1] is a prefix of p[i..m-1], //if there is no such j, suffix[i] = m //Compute the smallest shift s, such that 0 < s <= j and //p[js]!=p[j] and p[j-s+1..ms-1] is suffix of p[j+1..m-1] or j == m-1}, // if such s exists, for (int i = 0; i < match.Length - 1; i++) { int j = suffix[i + 1] - 1; // suffix[i+1] <= suffix[i] + 1 if (suffix[i] > j) { // therefore pattern[i] != pattern[j] match[j] = j - i; } else {// j == suffix[i] match[j] = Math.Min(j - i + match[i], match[j]); } } /* Phase 3 */ //Uses the suffix array to compute each shift s such that //p[0..ms-1] is a suffix of p[j+1..m-1] with j < s < m //and stores the minimum of this shift and the previously computed one. if (suffix[0] < pattern.Length) { for (int j = suffix[0] - 1; j >= 0; j--) { if (suffix[0] < match[j]) { match[j] = suffix[0]; } } { int j = suffix[0]; for (int k = suffix[j]; k < pattern.Length; k = suffix[k]) { while (j < k) { if (match[j] > k) { match[j] = k; } j++; } } } } } /** * Computes the values of suffix, which is an auxiliary array, * backwards version of the KMP failure function. * * suffix[i] = the smallest j > i st p[j..m-1] is a prefix of p[i..m-1], * if there is no such j, suffix[i] = m, ie * p[suffix[i]..m-1] is the longest prefix of p[i..m-1], if suffix[i] < m. */ private void ComputeSuffix() { suffix[suffix.Length-1] = suffix.Length; int j = suffix.Length - 1; for (int i = suffix.Length - 2; i >= 0; i--) { while (j < suffix.Length - 1 && !pattern[j].Equals(pattern[i])) { j = suffix[j + 1] - 1; } if (pattern[j] == pattern[i]) { j--; } suffix[i] = j + 1; } } } }

Ravi · Answer 21 · 2012-04-13 11:47

, , : ( )

 private static int findMatch(byte[] data, byte[] pattern) { if(pattern.length > data.length){ return -1; } for(int i = 0; i<data.length ;){ int j; for(j=0;j<pattern.length;j++){ if(pattern[j]!=data[i]) break; i++; } if(j==pattern.length){ System.out.println("Pattern found at : "+(i - pattern.length )); return i - pattern.length ; } if(j!=0)continue; i++; } return -1; }

ApeShoes · Answer 22 · 2013-12-13 00:29

, O (n) .

. , , .

  static void Main(string[] args) { // 1 1 1 1 1 1 1 1 1 1 2 2 2 // 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 byte[] buffer = new byte[] { 1, 0, 2, 3, 4, 5, 6, 7, 8, 9, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 5, 5, 0, 5, 5, 1, 2 }; byte[] beginPattern = new byte[] { 1, 0, 2 }; byte[] middlePattern = new byte[] { 8, 9, 10 }; byte[] endPattern = new byte[] { 9, 10, 11 }; byte[] wholePattern = new byte[] { 1, 0, 2, 3, 4, 5, 6, 7, 8, 9, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 }; byte[] noMatchPattern = new byte[] { 7, 7, 7 }; int beginIndex = ByteArrayPatternIndex(buffer, beginPattern); int middleIndex = ByteArrayPatternIndex(buffer, middlePattern); int endIndex = ByteArrayPatternIndex(buffer, endPattern); int wholeIndex = ByteArrayPatternIndex(buffer, wholePattern); int noMatchIndex = ByteArrayPatternIndex(buffer, noMatchPattern); } /// <summary> /// Returns the index of the first occurrence of a byte array within another byte array /// </summary> /// <param name="buffer">The byte array to be searched</param> /// <param name="pattern">The byte array that contains the pattern to be found</param> /// <returns>If buffer contains pattern then the index of the first occurrence of pattern within buffer; otherwise, -1</returns> public static int ByteArrayPatternIndex(byte[] buffer, byte[] pattern) { if (buffer != null && pattern != null && pattern.Length <= buffer.Length) { int resumeIndex; for (int i = 0; i <= buffer.Length - pattern.Length; i++) { if (buffer[i] == pattern[0]) // Current byte equals first byte of pattern { resumeIndex = 0; for (int x = 1; x < pattern.Length; x++) { if (buffer[i + x] == pattern[x]) { if (x == pattern.Length - 1) // Matched the entire pattern return i; else if (resumeIndex == 0 && buffer[i + x] == pattern[0]) // The current byte equals the first byte of the pattern so start here on the next outer loop iteration resumeIndex = i + x; } else { if (resumeIndex > 0) i = resumeIndex - 1; // The outer loop iterator will increment so subtract one else if (x > 1) i += (x - 1); // Advance the outer loop variable since we already checked these bytes break; } } } } } return -1; } /// <summary> /// Returns the indexes of each occurrence of a byte array within another byte array /// </summary> /// <param name="buffer">The byte array to be searched</param> /// <param name="pattern">The byte array that contains the pattern to be found</param> /// <returns>If buffer contains pattern then the indexes of the occurrences of pattern within buffer; otherwise, null</returns> /// <remarks>A single byte in the buffer array can only be part of one match. For example, if searching for 1,2,1 in 1,2,1,2,1 only zero would be returned.</remarks> public static int[] ByteArrayPatternIndex(byte[] buffer, byte[] pattern) { if (buffer != null && pattern != null && pattern.Length <= buffer.Length) { List<int> indexes = new List<int>(); int resumeIndex; for (int i = 0; i <= buffer.Length - pattern.Length; i++) { if (buffer[i] == pattern[0]) // Current byte equals first byte of pattern { resumeIndex = 0; for (int x = 1; x < pattern.Length; x++) { if (buffer[i + x] == pattern[x]) { if (x == pattern.Length - 1) // Matched the entire pattern indexes.Add(i); else if (resumeIndex == 0 && buffer[i + x] == pattern[0]) // The current byte equals the first byte of the pattern so start here on the next outer loop iteration resumeIndex = i + x; } else { if (resumeIndex > 0) i = resumeIndex - 1; // The outer loop iterator will increment so subtract one else if (x > 1) i += (x - 1); // Advance the outer loop variable since we already checked these bytes break; } } } } if (indexes.Count > 0) return indexes.ToArray(); } return null; }

Mehmet · Answer 23 · 2017-01-01 08:19

. . .

 public int Search3(byte[] src, byte[] pattern) { int index = -1; for (int i = 0; i < src.Length; i++) { if (src[i] != pattern[0]) { continue; } else { bool isContinoue = true; for (int j = 1; j < pattern.Length; j++) { if (src[++i] != pattern[j]) { isContinoue = true; break; } if(j == pattern.Length - 1) { isContinoue = false; } } if ( ! isContinoue) { index = i-( pattern.Length-1) ; break; } } } return index; }

Codehack · Answer 24 · 2018-08-17 09:37

. , . ( , ).

, , .

, , . ( ), (). , , . , , , ( ), , , .

, :

  public unsafe int IndexOfPattern(byte[] src, byte[] pattern) { fixed(byte *srcPtr = &src[0]) fixed (byte* patternPtr = &pattern[0]) { for (int x = 0; x < src.Length; x++) { byte currentValue = *(srcPtr + x); if (currentValue != *patternPtr) continue; bool match = false; for (int y = 0; y < pattern.Length; y++) { byte tempValue = *(srcPtr + x + y); if (tempValue != *(patternPtr + y)) { match = false; break; } match = true; } if (match) return x; } } return -1; }

:

  public int IndexOfPatternSafe(byte[] src, byte[] pattern) { for (int x = 0; x < src.Length; x++) { byte currentValue = src[x]; if (currentValue != pattern[0]) continue; bool match = false; for (int y = 0; y < pattern.Length; y++) { byte tempValue = src[x + y]; if (tempValue != pattern[y]) { match = false; break; } match = true; } if (match) return x; } return -1; }

spludlow · Answer 25 · 2019-05-14 07:39

, :

  public static long FindBinaryPattern(byte[] data, byte[] pattern) { using (MemoryStream stream = new MemoryStream(data)) { return FindBinaryPattern(stream, pattern); } } public static long FindBinaryPattern(string filename, byte[] pattern) { using (FileStream stream = new FileStream(filename, FileMode.Open)) { return FindBinaryPattern(stream, pattern); } } public static long FindBinaryPattern(Stream stream, byte[] pattern) { byte[] buffer = new byte[1024 * 1024]; int patternIndex = 0; int read; while ((read = stream.Read(buffer, 0, buffer.Length)) > 0) { for (int bufferIndex = 0; bufferIndex < read; ++bufferIndex) { if (buffer[bufferIndex] == pattern[patternIndex]) { ++patternIndex; if (patternIndex == pattern.Length) return stream.Position - (read - bufferIndex) - pattern.Length + 1; } else { patternIndex = 0; } } } return -1; }

, .

Eugene Yokota · Answer 26 · 2008-11-12 10:00

You can put an array of bytes in a String and match with IndexOf. Or you can at least reuse existing algorithms when matching strings.

  [STAThread] static void Main(string[] args) { byte[] pattern = new byte[] {12,3,5,76,8,0,6,125}; byte[] toBeSearched = new byte[] {23,36,43,76,125,56,34,234,12,3,5,76,8,0,6,125,234,56,211,122,22,4,7,89,76,64,12,3,5,76,8,0,6,125}; string needle, haystack; unsafe { fixed(byte * p = pattern) { needle = new string((SByte *) p, 0, pattern.Length); } // fixed fixed (byte * p2 = toBeSearched) { haystack = new string((SByte *) p2, 0, toBeSearched.Length); } // fixed int i = haystack.IndexOf(needle, 0); System.Console.Out.WriteLine(i); } }

Tamir · Answer 27 · 2008-11-12 10:22

toBeSearched.Except (pattern) will return you the differences toBeSearched.Intersect (pattern) will create a set of intersections As a rule, you should study the advanced methods in Linq extensions

Byte [] array pattern search

More articles: