Byte array output differs between PHP and JavaScript

Question


I use a function that converts strings into an array of bytes. I have this function in both PHP and JavaScript, but the two versions behave differently when I pass these characters: 㬁愃膘ƘჀ䚐⦀飠噋&ӡ๨㏃棱쌌ص䌠

How can I make the results the same?

My code is:

PHP:

    function bytesFromWords($string) {
        $bytes = array();
        $j = strlen($string);
        for($i = 0; $i < $j; $i++) {
            $char = ord(mb_substr($string, $i, 1));
            $bytes[] = $char >> 8;
            $bytes[] = $char & 0xFF;
        }
        return $bytes;
    }
    echo bytesFromWords('㬁愃膘ƘჀ䚐⦀飠噋&ӡ๨㏃棱쌌ص䌠');
    // result: 0,227,0,172,0,129,0,230,0,132,0,131,0,232,0,134,0,152,0,198,0,152,0,225,0,131,0,128,0,228,0,154,0,144,0,226,0,166,0,128,0,233,0,163,0,160,0,229,0,153,0,139,0,38,0,211,0,161,0,224,0,185,0,168,0,227,0,143,0,131,0,230,0,163,0,177,0,236,0,140,0,140,0,216,0,181,0,228,0,140,0,160

JavaScript:

    function bytesFromWords (string) {
        var bytes = [];
        for(var i = 0; i < string.length; i++) {
            var char = string.charCodeAt(i);
            bytes.push(char >>> 8);
            bytes.push(char & 0xFF);
        }
        return bytes;
    }
    console.log(bytesFromWords('㬁愃膘ƘჀ䚐⦀飠噋&ӡ๨㏃棱쌌ص䌠').toString());
    // result: 59,1,97,3,129,152,1,152,16,192,70,144,41,128,152,224,86,75,0,38,4,225,14,104,51,195,104,241,195,12,6,53,67,32


javascript arrays php

thebestclass Apr 21 '15 at 23:43


3 answers

JavaScript uses UCS-2 (strictly speaking UTF-16, which is identical to UCS-2 for characters in the Basic Multilingual Plane) for its strings, so to get the same ordinal representation in PHP you first need to convert your string, for example with mb_convert_encoding() or iconv().

A quick way to get the byte values of a string is unpack():

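    function bytesFromWords($string) {
        // Re-encode from UTF-8 to UCS-2 (big-endian): two bytes per character,
        // the same layout as charCodeAt() >>> 8 and & 0xFF in JavaScript
        $x = mb_convert_encoding($string, 'UCS-2', 'UTF-8');
        // 'C*' reads every byte as an unsigned char (array keys start at 1)
        $data = unpack('C*', $x);
        // Re-index from 0 so the result is a plain list
        return array_values($data);
    }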
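For comparison, here is a minimal JavaScript sketch of the same big-endian layout. It also shows where UCS-2 and UTF-16 part ways: a character outside the Basic Multilingual Plane (U+1F600 is used here as an assumed example, it is not in the question) becomes two code units, a surrogate pair, which a UCS-2 conversion cannot represent. The name `utf16BeBytes` is hypothetical.

```javascript
// Hypothetical helper: the high and low byte of every UTF-16 code unit,
// i.e. the layout mb_convert_encoding($s, 'UCS-2', 'UTF-8') gives in PHP
// for characters in the Basic Multilingual Plane.
function utf16BeBytes(str) {
  const bytes = [];
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i); // always 0..0xFFFF
    bytes.push(unit >>> 8, unit & 0xff);
  }
  return bytes;
}

console.log(utf16BeBytes('\u3B01\u6103').join(',')); // 59,1,97,3  ('㬁愃')
// U+1F600 is outside the BMP: stored as the surrogate pair D83D DE00
console.log(utf16BeBytes('\u{1F600}').join(',')); // 216,61,222,0
```

For the characters in the question, all of which are in the BMP, this gives the same numbers as the unpack() version above.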



Ja͢ck Apr 22 '15 at 1:46


You use mb_substr(), which can return a multibyte string (even though it is only one character).

But ord() doesn't handle that: it only looks at the first byte of the string it receives, not at the whole character.

To get what your PHP code is after, split the string into single bytes and take the value of each one:

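    // str_split() is not multibyte-aware, so this splits into single bytes
    $bytes = str_split($string);
    foreach ($bytes as &$chr) {
        $chr = ord($chr); // byte value 0..255
    }
    unset($chr); // break the reference left behind by foreach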

Note that this is not what your JavaScript code computes. In JavaScript, string.charCodeAt() gives you the UTF-16 code unit of each character, not a sequence of UTF-8 bytes.

To get the UTF-8 bytes in JavaScript, you can use this trick (copied from fooobar.com/questions/210013 / ... ~ Jonathan Lonowski):

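    // encodeURIComponent() percent-encodes the UTF-8 bytes; unescape() turns
    // each %XX back into one character whose code is that byte value
    var utf8 = unescape(encodeURIComponent(string));
    var arr = [];
    for (var i = 0; i < utf8.length; i++) {
        arr.push(utf8.charCodeAt(i));
    }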
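In current browsers and Node.js, the TextEncoder API does the same job without the deprecated unescape(). This is a sketch that assumes a runtime where TextEncoder is a global (browsers, Node.js 11 and later). The name `utf8Bytes` is hypothetical. Its output should match the PHP str_split()/ord() loop, since both produce the UTF-8 bytes of the string.

```javascript
// Hypothetical helper: UTF-8 bytes of a string as plain numbers.
function utf8Bytes(str) {
  // TextEncoder always encodes to UTF-8 and returns a Uint8Array.
  return Array.from(new TextEncoder().encode(str));
}

// '&' takes one byte, 'ص' (U+0635) two, '㬁' (U+3B01) three.
console.log(utf8Bytes('&\u0635\u3B01').join(',')); // 38,216,181,227,172,129
```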

But if you need the Unicode code point in PHP, a quick search will turn up answers (for example, "How to get the code point number for a given character in a utf-8 string?").


bwoebi Apr 22 '15 at 0:02


Guilherme nascimento · Accepted Answer · 2015-04-22T01:25:25+0000

Problems:

strlen counts bytes, not Unicode characters.
ord returns only the value of the first byte, not the Unicode code point.
chr only produces single-byte strings (values 0 to 255).

Problem with `strlen`

In JavaScript, '㬁愃膘ƘჀ䚐⦀飠噋&ӡ๨㏃棱쌌ص䌠'.length returns 17, but in PHP strlen('㬁愃膘ƘჀ䚐⦀飠噋&ӡ๨㏃棱쌌ص䌠') returns 46, because strlen counts UTF-8 bytes. To count characters instead, use:

 $j = preg_match_all('/.{1}/us', $string, $data);
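The two numbers differ because they count different units. This JavaScript sketch measures the question's string three ways: in UTF-16 code units, in code points, and in UTF-8 bytes, which is the unit PHP's strlen counts. It assumes TextEncoder is available as a global.

```javascript
const str = '㬁愃膘ƘჀ䚐⦀飠噋&ӡ๨㏃棱쌌ص䌠';

// UTF-16 code units: what String.prototype.length counts.
const units = str.length;
// Code points: spreading a string iterates by code point,
// which is how preg_match_all('/./us') splits it in PHP.
const codePoints = [...str].length;
// UTF-8 bytes: what PHP's strlen() counts.
const utf8Length = new TextEncoder().encode(str).length;

console.log(units, codePoints, utf8Length); // 17 17 46
```

Code units and code points only differ for characters outside the Basic Multilingual Plane, and there are none in this string.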

Problem with `ord`

'㬁'.charCodeAt(0) returns 15105, but ord('㬁') returns 227, which is only the first of the three UTF-8 bytes of 㬁 (227, 172, 129). To fix it, use:

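    function unicode_ord($char) {
        // UCS-4BE stores every character as 4 big-endian bytes, so
        // unpack('N') (unsigned 32-bit big-endian) yields the code point
        list(, $ord) = unpack('N', mb_convert_encoding($char, 'UCS-4BE', 'UTF-8'));
        return $ord;
    }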

Source: fooobar.com/questions/591213 / ...
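On the JavaScript side, the counterpart of unicode_ord() is String.prototype.codePointAt(). For characters in the Basic Multilingual Plane it agrees with charCodeAt(). The two differ only for characters outside it, where charCodeAt() returns just the first half of a surrogate pair. U+1F600 below is an assumed example that is not in the question:

```javascript
// charCodeAt() returns a UTF-16 code unit; codePointAt() returns a code point.
console.log('㬁'.charCodeAt(0));  // 15105 (same as unicode_ord('㬁') in PHP)
console.log('㬁'.codePointAt(0)); // 15105
// Outside the BMP they differ: U+1F600 is stored as a surrogate pair.
console.log('\u{1F600}'.charCodeAt(0));  // 55357 (0xD83D, high surrogate)
console.log('\u{1F600}'.codePointAt(0)); // 128512 (0x1F600)
```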

Problem with `chr`

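String.fromCharCode(15105) returns 㬁, but chr(15105) returns what looks like an empty string. chr() only handles values 0 to 255 and takes its argument modulo 256, so you get the control byte chr(1). To fix it, use: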

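    function unicode_chr($u) {
        // Build a numeric HTML entity (e.g. &#15105;) and decode it to UTF-8.
        // The HTML-ENTITIES encoding is deprecated as of PHP 8.2;
        // on PHP 7.2+ mb_chr($u, 'UTF-8') does the same job.
        return mb_convert_encoding('&#' . intval($u) . ';', 'UTF-8', 'HTML-ENTITIES');
    }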

Source: fooobar.com/questions/773629 / ...
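In JavaScript, String.fromCharCode() takes UTF-16 code units, while String.fromCodePoint() takes code points, which is what unicode_chr() does in PHP. A short sketch of the difference, again using U+1F600 as an assumed example outside the BMP:

```javascript
// fromCharCode() works on 16-bit code units, fromCodePoint() on code points.
console.log(String.fromCharCode(15105));  // 㬁
console.log(String.fromCodePoint(15105)); // 㬁
// Above 0xFFFF, fromCharCode() truncates the value to 16 bits,
// while fromCodePoint() builds the surrogate pair.
console.log(String.fromCodePoint(0x1F600).length);      // 2
console.log(String.fromCharCode(0x1F600) === '\uF600'); // true
```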

Full code:

    <?php
    function unicode_ord($char) {
        list(, $ord) = unpack('N', mb_convert_encoding($char, 'UCS-4BE', 'UTF-8'));
        return $ord;
    }

    function unicode_chr($u) {
        return mb_convert_encoding('&#' . intval($u) . ';', 'UTF-8', 'HTML-ENTITIES');
    }

    // Rebuild a string from (high byte, low byte) pairs
    function bytesToWords($bytes) {
        $str = '';
        $j = count($bytes);
        for ($i = 0; $i < $j; $i += 2) {
            $char = $bytes[$i] << 8;
            // isset() avoids an "undefined offset" notice for an odd-length array
            if (isset($bytes[$i + 1])) {
                $char |= $bytes[$i + 1];
            }
            $str .= unicode_chr($char);
        }
        return $str;
    }

    // Split into characters (not bytes) and emit two bytes per code point
    function bytesFromWords($string) {
        $bytes = array();
        preg_match_all('/.{1}/us', $string, $data);
        foreach ($data[0] as $char) {
            $char = unicode_ord($char);
            $bytes[] = $char >> 8;
            $bytes[] = $char & 0xFF;
        }
        return $bytes;
    }

    $data = bytesFromWords('㬁愃膘ƘჀ䚐⦀飠噋&ӡ๨㏃棱쌌ص䌠');
    echo implode(', ', $data), '<br>';
    echo bytesToWords($data);
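For the results to line up, the JavaScript side has to use the same per-character scheme. This is a sketch of a JavaScript counterpart to the PHP code above. The function names mirror the PHP ones. Like the PHP version, it writes each code point as exactly two bytes, so it only round-trips characters in the Basic Multilingual Plane.

```javascript
// Two bytes (high, low) per code point, mirroring the PHP bytesFromWords().
// Assumes every character is in the BMP (code point <= 0xFFFF).
function bytesFromWords(string) {
  const bytes = [];
  for (const ch of string) { // for...of iterates by code point
    const code = ch.codePointAt(0);
    bytes.push(code >> 8, code & 0xff);
  }
  return bytes;
}

// Rebuild the string from (high, low) byte pairs, mirroring bytesToWords().
function bytesToWords(bytes) {
  let str = '';
  for (let i = 0; i < bytes.length; i += 2) {
    let code = bytes[i] << 8;
    if (i + 1 < bytes.length) {
      code |= bytes[i + 1];
    }
    str += String.fromCodePoint(code);
  }
  return str;
}

const data = bytesFromWords('㬁愃膘ƘჀ䚐⦀飠噋&ӡ๨㏃棱쌌ص䌠');
console.log(data.join(', ')); // 59, 1, 97, 3, 129, 152, ...
console.log(bytesToWords(data)); // the original string again
```

Iterating with for...of rather than an index loop makes JavaScript split the string by code point, as preg_match_all('/./us') does in PHP. For BMP-only input, the numbers match both the PHP output above and the question's original JavaScript output.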
