Sending binary data over a network in Perl

I am implementing a network client that sends messages to the server. Messages are streams of bytes, and the protocol requires me to send the length of each stream in advance.

If the message I'm given (by code using my module) is a byte string, then the length is easily obtained with length $string. But if it is a character string, I will need to massage it to get the raw bytes. What I'm doing now is basically this:

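use Encode ();

my $msg = shift;   # some message from calling code
my $bytes;
if ( utf8::is_utf8( $msg ) ) {
    # flag is set: treat the string as characters and encode to UTF-8
    $bytes = Encode::encode( 'utf-8', $msg );
} else {
    # flag is not set: assume the string already holds raw bytes
    $bytes = $msg;
}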

my $length = length $bytes;

Is this the right way to handle this? It seems to be working so far, but I haven't done any serious testing yet. What potential traps exist with this approach?

Thanks.

+5

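No, not reliably. Your module has to decide whether it accepts byte strings or character strings, and if it accepts Unicode text, callers need to know that (the string itself cannot tell you which of the two it is, so this belongs in the documentation).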

If you are sending bytes, check that the string contains no characters above \xFF.
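One way to make that check is sketched below. The helper name is_byte_string is hypothetical, not part of any module; it assumes that "byte string" means every character has a code point of 0xFF or lower.

```perl
use strict;
use warnings;

# Hypothetical helper: true if every character fits in a single byte.
sub is_byte_string {
    my ($str) = @_;
    return $str !~ /[^\x00-\xFF]/;
}

my $ok  = "caf\xE9";            # all code points <= 0xFF
my $bad = "snowman \x{2603}";   # U+2603 cannot be sent as one raw byte

print is_byte_string($ok)  ? "bytes\n" : "wide characters\n";   # prints: bytes
print is_byte_string($bad) ? "bytes\n" : "wide characters\n";   # prints: wide characters
```

The regex compares code points, so the result does not depend on how Perl stores the string internally.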

If you are sending Unicode text, encode it with Encode::encode_utf8() (Encode is a core module, so it ships with Perl).

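Do not use utf8::is_utf8() for this: it only reports an internal flag (how Perl happens to store the string), not whether the string is text or bytes. (In particular, a string whose non-ASCII characters all lie between \x80 and \xFF can be stored either way), so the same message may or may not get encoded, depending on how the caller built it.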
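The trap can be reproduced with a short sketch. bytes_by_flag is a hypothetical wrapper around the approach from the question, and utf8::upgrade is used only to force the other internal representation of the same string.

```perl
use strict;
use warnings;
use Encode qw( encode encode_utf8 );

# The question's approach, wrapped in a hypothetical helper for comparison.
sub bytes_by_flag {
    my ($msg) = @_;
    return utf8::is_utf8($msg) ? encode('UTF-8', $msg) : $msg;
}

my $down = "caf\xE9";   # flag off: stored internally as Latin-1
my $up   = $down;
utf8::upgrade($up);     # same characters, flag on: stored internally as UTF-8

print "strings are equal\n" if $down eq $up;

printf "flag-based:    %d vs %d bytes\n",
    length(bytes_by_flag($down)), length(bytes_by_flag($up));
# prints: flag-based:    4 vs 5 bytes

printf "always encode: %d vs %d bytes\n",
    length(encode_utf8($down)), length(encode_utf8($up));
# prints: always encode: 5 vs 5 bytes
```

Two strings that compare equal go out as different byte sequences under the flag-based approach, while encoding unconditionally gives the same bytes for both.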

P.S. See perldoc Encode for how Perl handles character strings and encodings.

+4

To send:

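use Encode qw( encode_utf8 );

# Encode the text as UTF-8 and prefix it with its length in bytes.
sub pack_text {
   my ($text) = @_;
   my $bytes = encode_utf8($text);
   # 'N' is an unsigned 32-bit big-endian integer, so the length must fit in it.
   die "Text too long" if length($bytes) > 4294967295;
   return pack('N/a*', $bytes);
}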
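
To see what the 'N/a*' template produces, here is a small sketch that dumps one frame as hex. The sample text is an arbitrary choice for illustration.

```perl
use strict;
use warnings;
use Encode qw( encode_utf8 );

my $text  = "caf\x{E9}";            # 4 characters
my $bytes = encode_utf8($text);     # 5 bytes: "caf\xC3\xA9"
my $frame = pack('N/a*', $bytes);   # 4-byte big-endian length, then the payload

print join(' ', map { sprintf '%02X', ord } split //, $frame), "\n";
# prints: 00 00 00 05 63 61 66 C3 A9
```

The length prefix counts encoded bytes (5), not characters (4), which is what the protocol in the question asks for.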

To receive:

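use Encode qw( decode_utf8 );

# Read exactly $to_read bytes from $fh, looping over short reads.
sub read_bytes {
   my ($fh, $to_read) = @_;
   my $buf = '';
   while ($to_read > 0) {
      # Append to $buf at its current end.
      my $bytes_read = read($fh, $buf, $to_read, length($buf));
      die $! if !defined($bytes_read);           # read error
      die "Premature EOF" if !$bytes_read;       # stream ended mid-message
      $to_read -= $bytes_read;
   }
   return $buf;
}

# Read the 4-byte big-endian length prefix.
sub read_uint32 {
   my ($fh) = @_;
   return unpack('N', read_bytes($fh, 4));
}

# Read one length-prefixed message and decode it from UTF-8.
sub read_text {
   my ($fh) = @_;
   return decode_utf8(read_bytes($fh, read_uint32($fh)));
}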
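
A round trip can be tried without a network by reading from an in-memory filehandle. The sketch below repeats compact versions of the helpers above so that it runs on its own; the sample messages and the $wire variable are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw( encode_utf8 decode_utf8 );

# Compact copies of the helpers, so this sketch is self-contained.
sub pack_text { return pack('N/a*', encode_utf8($_[0])) }

sub read_bytes {
    my ($fh, $to_read) = @_;
    my $buf = '';
    while ($to_read > 0) {
        my $n = read($fh, $buf, $to_read, length($buf));
        die $! if !defined $n;
        die "Premature EOF" if !$n;
        $to_read -= $n;
    }
    return $buf;
}

sub read_text {
    my ($fh) = @_;
    my $len = unpack('N', read_bytes($fh, 4));
    return decode_utf8(read_bytes($fh, $len));
}

# Two frames back to back, as they might appear on the wire.
my $wire = pack_text("caf\x{E9}") . pack_text("\x{2603}");

open(my $fh, '<', \$wire) or die "Cannot open in-memory handle: $!";
binmode($fh);   # read raw bytes, with no encoding layer
my @got = (read_text($fh), read_text($fh));
close $fh;

print "round trip ok\n" if $got[0] eq "caf\x{E9}" && $got[1] eq "\x{2603}";
# prints: round trip ok
```

With a real socket, it is probably worth calling binmode on the handle in the same way, so that no I/O layer alters the bytes.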

+1

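Looking at perldoc -f length for v5.8:

... if the EXPR is in Unicode, you will get the number of characters, not the number of bytes. To get the length in bytes, use "do { use bytes; length(EXPR) }", see bytes.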

The newer documentation for length no longer mentions bytes:

length() normally deals in logical characters, not physical bytes. For how many bytes a string encoded as UTF-8 would take up, use "length(Encode::encode_utf8(EXPR))" (you'll have to "use Encode" first). See Encode and perlunicode.

but I don't think that rules out the do { use bytes; ... } solution.
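
One caveat may be worth checking before relying on it: under use bytes, length reports the size of Perl's internal storage, which matches the UTF-8 encoded size only when the string happens to be stored as UTF-8. A minimal sketch, using utf8::upgrade just to produce both representations of the same string:

```perl
use strict;
use warnings;
use Encode qw( encode_utf8 );

my $down = "caf\xE9";   # flag off: stored as 4 bytes
my $up   = $down;
utf8::upgrade($up);     # flag on: stored as 5 bytes

for my $s ($down, $up) {
    my $internal = do { use bytes; length($s) };   # size of internal storage
    my $encoded  = length(encode_utf8($s));        # size once encoded as UTF-8
    print "internal=$internal encoded=$encoded\n";
}
# prints: internal=4 encoded=5
#         internal=5 encoded=5
```

So the two approaches agree only for the upgraded string; length(encode_utf8(...)) gives the same answer for both.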

0