unpack will be more efficient than split and ord because it does not need to create a bunch of temporary 1-character strings:

    use utf8;

    my $str = '中國c';
    my @codepoints = unpack 'U*', $str;    # one code point per character

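As a sanity check, here is a minimal sketch showing that both approaches yield the same code points for a character string. The helper names `codepoints_unpack` and `codepoints_split` are made up for this illustration and do not come from any module.

```perl
use strict;
use warnings;
use utf8;    # the source file contains UTF-8 string literals

# Hypothetical helper: all code points from a single unpack call.
sub codepoints_unpack { return unpack 'U*', $_[0] }

# Hypothetical helper: split into 1-character strings, then ord each one.
sub codepoints_split { return map { ord } split //, $_[0] }

my $str = '中國c';
my @from_unpack = codepoints_unpack($str);
my @from_split  = codepoints_split($str);

# Both lists should be 20013, 22283, 99 (U+4E2D, U+570B, U+0063).
print join(',', @from_unpack), "\n";
print join(',', @from_split),  "\n";
```

The only difference between the two helpers is how they get there: the split version builds one temporary string per character, which is the overhead unpack avoids.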
A quick benchmark shows it is about 3 times faster than split+ord:

    use utf8;
    use Benchmark 'cmpthese';

    my $str = '中國中國中國中國中國中國中國中國中國中國中國中國中國中國c';

    cmpthese(0, {
      'unpack'     => sub { my @codepoints = unpack 'U*', $str; },
      'split-map'  => sub { my @codepoints = map { ord } split //, $str },
      'split-for'  => sub { my @cp; for my $c (split(//, $str)) { push @cp, ord($c) } },
      'split-for2' => sub { my $cp; for my $c (split(//, $str)) { $cp = ord($c) } },
    });

Results:

                   Rate split-map split-for split-for2 unpack
    split-map   85423/s        --       -7%       -32%   -67%
    split-for   91950/s        8%        --       -27%   -64%
    split-for2 125550/s       47%       37%         --   -51%
    unpack     256941/s      201%      179%       105%     --

The difference is less pronounced with a shorter string, but unpack is still more than twice as fast. (split-for2 is a bit faster than the other split variants because it does not build a list of code points.)
cjm