Efficient algorithm for finding the largest mileage of zeros in a binary string?

I am looking for an efficient algorithm to find the longest path of zeros in a binary string. My implementation is in Python 2.7, but all I need is the idea of โ€‹โ€‹an algorithm.

For example, when specifying '0010011010000', the function should return 4.

+5
source share
7 answers

I donโ€™t think there is anything better than a single pass on a line, counting the current length of the sequence (and updating the maximum) as you move.

If by โ€œbinary stringโ€ you mean unprocessed bits, you can read them one byte at a time and extract eight bits there (using shift or mask bits). This does not change the general algorithm or its complexity.

+8
source

Beat the obvious algorithm. The idea is that if you already have a run of length 0 and you see two 1s with no more than N positions, you do not need to check any positions between them. Therefore, check the sequence of candidates of zeros from the end, not the beginning. In the worst case scenario, you just check all the elements, as in the naive method, but on average it will be less.

So the algo goes like this (pseudo code, untested)

maxrun = 0 curpos = 0 runstart = 0 runend = 0 while curpos + maxrun < array.length broken = false for i = curpos + maxrun, i >= curpos and not broken, --i if array[i] == 1 broken = true curpos = i + 1 if not broken runstart = curpos # found a longer run of 0s # now extend it to the end maxrun++ curpos += maxrun while curpos < array.length and array[curpos] == 0 maxrun++ # ok found the 1 at the right end of the run # go to the next position and start over runend = curpos curpos++ # the longest run of 0s is [runstart, runend) 
+2
source

To find the largest mileage of consecutive zeros in a binary string, I suggest the following contour:

 int maxConsecutiveZeros(String binaryString) { int maxcount = Integer.MIN_VALUE; int currcount = 0; for(int i=0; i < binaryString.length(); i++) { if(binaryString.charAt(i) == '0') { currcount++; } else { maxcount = Math.max(currcount, maxcount); currcount = 0; } } return maxcount; } 

You must separately handle the case when binaryString ends in zero. Add this part to the provided outline, and you're done.

The complexity of this approach is linear along the length of the binary string.

+1
source

It depends on what you mean by efficiency.

If you intend to minimize the runtime, you will basically have to iterate over the character of the string character by character, analyzing the runs of consecutive zeros and tracking the longest, something like:

 def longRunZeros(s): big = 0 curr = 0 for c in s: if c == '0': curr += 1 else: if curr > big: big = curr curr = 0 if curr > big: big = curr return big print longRunZeros('0010011010000') 

If you are talking about program effectiveness, just do:

 def longRunZeros(s): return max(len(i) for i in s.split('1')) 

instead.

It will not necessarily work faster, but it will free you so that you have more time, perhaps to analyze whether you even need the download speed for this operation. It will also almost certainly be less error prone to its length.

As for speed, think about it. With line 25M, the character-by-character method takes 2.826 seconds of processor time for a million iterations. The split method takes 3.186 seconds for the same workload 1 .

So, if your line is much longer than 25M, or you need to do it much more than a million times, it will not make much difference, and I would prefer to choose a method that is easier for me as a developer.


Addendum: after I said that differential differential performance is not relevant here, I am a bit hypocritical to mention that the other method shown by John La Roy in the comment seems to be a little faster than mine.

But, for completeness, I will suffer from slings and arrows to indicate this:

 def longRunZeros(s): return len(max(s.split('1'))) 

It appears that the average is around 1.092 , which doubles the speed of the above character symbol.


1 These numbers are the average of five runs in my environment, and I cannot guarantee that they will be held anywhere else.

If you have ever participated in optimization efforts, you should know that it should be measured in your real environment, and not depending on which one is a random (but very beautiful) person on the Internet :-)

+1
source

A compiled regex can be quite fast, but I have not tested it. Nonetheless:

 >>> binstr = '0010011010000' >>> import re >>> zeros = re.compile(r'0+') >>> max(len(m) for m in zeros.findall(binstr)) 4 
+1
source

Well, as someone said, this is a String type, then I think you cannot escape O (| N |), which is I / O time. I just want to say that this is an integer, then you can do a little faster, for example:

 #include<bits/stdc++.h> using namespace std; int n; void binary(int x){ if(x){ binary(x>>1); if(x&1) putchar('1'); else putchar('0'); } } int main() { scanf("%d", &n); while(n){ binary(n); puts(""); int x = log2(n&-n); printf("Zero range: %d\n", x); n >>= (x+1); } return 0; } 

Ignore the print part, I think it is O (lg N)? ( Caution: since this handles an integer, zero space is not taken into account, this should not be complicated)

0
source

It's a little messy, and I know that if I thought a little more, I could improve the ending

 def solution(N): y = [int(x) for x in bin(N)[2:]] lst,zero = [],0 for r in y: if r == 0: zero +=1 else: if zero > 0: lst.append(zero) zero = 0 try: return max(lst) except Exception as E: return 0 

You don't need the last part and just return max (lst)

0
source

All Articles