Find the first character that differs between two strings

Given two strings of equal length, is there an elegant way to get the offset of the first differing character?

An obvious solution would be the following:

    for ($offset = 0; $offset < $length; ++$offset) {
        if ($str1[$offset] !== $str2[$offset]) {
            return $offset;
        }
    }

But that doesn't feel quite right for such a simple task.

+68
string php
Sep 19 '11 at 18:17
4 answers

You can use a nice property of bitwise XOR ( ^ ) to achieve this: when you XOR two strings together, the characters that are the same become null bytes ( "\0" ). So, if we XOR the two strings, we just need to find the position of the first non-null byte using strspn:

 $position = strspn($string1 ^ $string2, "\0"); 

That's all there is to it. So let's look at an example:

    $string1 = 'foobarbaz';
    $string2 = 'foobarbiz';
    $pos = strspn($string1 ^ $string2, "\0");

    printf(
        'First difference at position %d: "%s" vs "%s"',
        $pos, $string1[$pos], $string2[$pos]
    );

This will output:

First difference at position 7: "a" vs "i"

So that should do it. It is very efficient, since it only uses C functions and requires only a single copy of the string in memory.
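To see why this works, it can help to look at the XOR result itself. The following is only a minimal sketch (the helper name xorHex is made up for illustration and is not part of the answer): it prints the XOR of the two example strings as hex, where every matching position shows up as 00 and the first non-zero byte marks the difference.

```php
<?php
// Hypothetical helper, for illustration only: hex dump of the bytewise XOR.
function xorHex(string $a, string $b): string
{
    // PHP's ^ operator on two strings XORs them byte by byte;
    // the result is as long as the shorter operand.
    return bin2hex($a ^ $b);
}

$string1 = 'foobarbaz';
$string2 = 'foobarbiz';

// 'a' (0x61) ^ 'i' (0x69) = 0x08; all other bytes cancel out to 0x00.
echo xorHex($string1, $string2), "\n"; // 000000000000000800

// Seven leading null bytes, so strspn() stops at offset 7.
echo strspn($string1 ^ $string2, "\0"), "\n"; // 7
```

Note that when the two strings are identical, the XOR result consists only of null bytes, so strspn returns the full string length rather than a valid offset.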

Edit: a multibyte solution along the same lines:

    function getCharacterOffsetOfDifference($str1, $str2, $encoding = 'UTF-8') {
        return mb_strlen(
            mb_strcut(
                $str1,
                0, strspn($str1 ^ $str2, "\0"),
                $encoding
            ),
            $encoding
        );
    }

First, the difference at the byte level is found using the method above, and then the offset is mapped to the character level. This is done using the mb_strcut function, which is basically substr but respects multibyte character boundaries.

    var_dump(getCharacterOffsetOfDifference('foo', 'foa')); // 2
    var_dump(getCharacterOffsetOfDifference('©oo', 'foa')); // 0
    var_dump(getCharacterOffsetOfDifference('f©o', 'fªa')); // 1

It is not as elegant as the first solution, but it is still a one-liner (and a little simpler if you use the default encoding):

 return mb_strlen(mb_strcut($str1, 0, strspn($str1 ^ $str2, "\0"))); 
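
The two steps can also be looked at separately. The sketch below is an illustration only: the variable names are assumptions, it needs the mbstring extension, and it uses PHP 7+ `\u{...}` escapes so that the result does not depend on the encoding of the source file. It shows the byte offset found by strspn and the character offset left after mb_strcut moves the cut back to a character boundary.

```php
<?php
// Requires the mbstring extension and PHP 7.0+ (for the \u{...} escapes).
$str1 = "f\u{00A9}o"; // "f©o": © is the two bytes C2 A9 in UTF-8
$str2 = "f\u{00AA}a"; // "fªa": ª is the two bytes C2 AA in UTF-8

// Step 1: byte offset of the first difference.
// Both strings share "f" and the lead byte C2, so this is 2.
$byteOffset = strspn($str1 ^ $str2, "\0");

// Step 2: cut the common prefix. Byte 2 falls inside "©",
// so mb_strcut moves the cut back and returns just "f".
$prefix = mb_strcut($str1, 0, $byteOffset, 'UTF-8');

// Step 3: count the characters in that prefix.
$charOffset = mb_strlen($prefix, 'UTF-8');

printf("byte offset: %d, character offset: %d\n", $byteOffset, $charOffset);
// byte offset: 2, character offset: 1
```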
+168
Sep 19 '11 at 18:24

If you convert each string into an array of single-byte characters, you can use the array comparison functions to compare the strings.

You can achieve a result similar to the XOR method with the following.

    $string1 = 'foobarbaz';
    $string2 = 'foobarbiz';

    $array1 = str_split($string1);
    $array2 = str_split($string2);

    $result = array_diff_assoc($array1, $array2);

    $num_diff = count($result);
    $first_diff = key($result);

    echo "There are " . $num_diff . " differences between the two strings. <br />";
    echo "The first difference between the strings is at position " . $first_diff . ". (Zero Index) '$string1[$first_diff]' vs '$string2[$first_diff]'.";

Edit: multibyte solution

    $string1 = 'foobarbaz';
    $string2 = 'foobarbiz';

    // The third argument of preg_split is the limit (-1 = no limit); the flags come fourth.
    $array1 = preg_split('((.))u', $string1, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    $array2 = preg_split('((.))u', $string2, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    $result = array_diff_assoc($array1, $array2);

    $num_diff = count($result);
    $first_diff = key($result);

    echo "There are " . $num_diff . " differences between the two strings.\n";
    // Index the character arrays, not the strings: string offsets are bytes, not characters.
    echo "The first difference between the strings is at position " . $first_diff . ". (Zero Index) '$array1[$first_diff]' vs '$array2[$first_diff]'.\n";
+15
04 Oct 2018-11-11T00:

I wanted to add this as a comment on the top answer, but I don't have enough points.

    $string1 = 'foobarbaz';
    $string2 = 'foobarbiz';
    $pos = strspn($string1 ^ $string2, "\0");

    // The XOR result is only as long as the shorter string, so handle unequal lengths too.
    if ($pos < min(strlen($string1), strlen($string2))) {
        printf(
            'First difference at position %d: "%s" vs "%s"',
            $pos, $string1[$pos], $string2[$pos]
        );
    } else if ($pos < strlen($string1)) {
        print 'String1 continues with ' . substr($string1, $pos);
    } else if ($pos < strlen($string2)) {
        print 'String2 continues with ' . substr($string2, $pos);
    } else {
        print 'String1 and String2 are equal';
    }
+4
Jan 14 2018-12-12T00:
 string strpbrk ( string $haystack , string $char_list ) 

strpbrk() searches the haystack string for any character in char_list.

The return value is the substring of $haystack that begins at the first matching character. As a built-in function, it should be fast. Then loop through once, looking for offset zero of the returned string, to get your offset.

-5
06 Oct 2018-11-11T00: