How to open a Unicode file using Perl?

I use osql to run several SQL scripts against a database, and then I need to check the result files for errors. The problem is that Perl does not seem to cope with the result files being Unicode.

I wrote a small script to test this, and it fails on these files:

    $file = shift;

    open OUTPUT, $file or die "Can't open $file: $!\n";
    while (<OUTPUT>) {
        print $_;
        if (/Invalid|invalid|Cannot|cannot/) {
            push(@invalids, $file);
            print "invalid file - $inputfile - schedule for retry\n";
            last;
        }
    }

Any ideas? I tried decoding with decode_utf8, but that makes no difference. I also tried setting the encoding when opening the file.

I think the problem may be that osql writes the result file in UTF-16, but I'm not sure. When I open the file in a text editor, it just reports the encoding as Unicode.

Edit: Using perl v5.8.8

Edit: Hex dump:

    file name: Admin_CI.User.sql.results
    mime type:

    0000-0010: ff fe 31 00-3e 00 20 00-32 00 3e 00-20 00 4d 00  ..1.>... 2.>...M.
    0000-0020: 73 00 67 00-20 00 31 00-35 00 30 00-30 00 37 00  s.g...1. 5.0.0.7.
    0000-0030: 2c 00 20 00-4c 00 65 00-76 00 65 00-6c 00 20 00  ,...L.e. v.e.l...
    0000-0032: 31 00                                            1.
+7
4 answers

The file is presumably encoded as UCS-2LE (or UTF-16).

    C:\Temp> notepad test.txt

    C:\Temp> xxd test.txt
    0000000: fffe 5400 6800 6900 7300 2000 6900 7300  ..T.h.i.s. .i.s.
    0000010: 2000 6100 2000 6600 6900 6c00 6500 2e00   .a. .f.i.l.e...

When opening such a file for reading, you need to specify the encoding:

    #!/usr/bin/perl

    use strict;
    use warnings;

    my ($infile) = @ARGV;

    open my $in, '<:encoding(UCS-2le)', $infile
        or die "Cannot open '$infile': $!";

Note that the fffe at the beginning is the BOM (byte order mark).
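How the leading ff fe bytes (the byte order mark) are treated depends on the encoding name. The sketch below uses Encode's `decode` on a made-up in-memory byte string instead of a real file: a name with an explicit byte order, such as UTF-16LE, keeps the mark in the text as the character U+FEFF, while plain UTF-16 uses the mark to detect the byte order and removes it.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Illustrative bytes: a BOM (ff fe) followed by "Hi" in little-endian UTF-16.
my $bytes = "\xff\xfe" . "H\x00i\x00";

# "UTF-16" reads the BOM to choose the byte order and drops it from the text.
my $auto_order = decode('UTF-16', my $copy1 = $bytes);

# "UTF-16LE" fixes the byte order up front, so the BOM is decoded as an
# ordinary character (U+FEFF) at the start of the string.
my $fixed_order = decode('UTF-16LE', my $copy2 = $bytes);

# If the fixed-order name is used, the mark can be stripped by hand.
(my $stripped = $fixed_order) =~ s/\A\x{FEFF}//;

printf "UTF-16:   %d characters\n", length $auto_order;
printf "UTF-16LE: %d characters\n", length $fixed_order;
printf "stripped: %d characters\n", length $stripped;
```

The same distinction applies to the layer name passed to `open`, so with an explicit little-endian layer the first line read from the file starts with U+FEFF unless it is removed.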

+15

The answer is in the documentation for open, which also points to perluniintro. :)

 open my $fh, '<:encoding(UTF-16LE)', $file or die ...; 

You can get a list of the encoding names that perl supports:

 % perl -MEncode -le "print for Encode->encodings(':all')" 

After that, it is up to you to find out what the file's encoding is. This is the same way you would open any file whose encoding differs from the default, whether or not that encoding is one defined by Unicode.
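One cheap way to find out is to look for a byte order mark at the start of the file. The sketch below is only a heuristic, and `guess_encoding_from_bom` is a hypothetical helper name, not something from Encode: it reads the first few raw bytes and maps a recognised mark to an encoding name, returning nothing when there is no mark (in which case the encoding has to be determined some other way).

```perl
use strict;
use warnings;

# Hypothetical helper: guess an encoding name from a leading byte order mark.
# Returns an empty list / undef when the file does not start with a known mark.
sub guess_encoding_from_bom {
    my ($path) = @_;

    # Read raw bytes so that no layer decodes or translates anything.
    open my $raw, '<:raw', $path or die "Cannot open '$path': $!";
    my $got = read($raw, my $head, 4);
    close $raw;
    return unless defined $got && $got >= 2;

    # Test the 4-byte marks first: the UTF-32LE mark also begins with ff fe.
    return 'UTF-32' if $head =~ /\A(?:\xff\xfe\x00\x00|\x00\x00\xfe\xff)/;
    return 'UTF-8'  if $head =~ /\A\xef\xbb\xbf/;
    return 'UTF-16' if $head =~ /\A(?:\xff\xfe|\xfe\xff)/;
    return;
}
```

The returned name can be interpolated into the layer, for example `open my $in, "<:encoding($enc)", $file`. For a file that begins with ff fe, like the one in the question, the helper returns UTF-16. Note that the UTF-8 layer does not remove its mark, so in that case the first line still starts with U+FEFF.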

We have a chapter in Effective Perl Programming that goes through the details.

+7

Try opening the file with an I/O layer specified, for example:

 open OUTPUT, "<:encoding(UTF-8)", $file or die "Can't open $file: $!\n"; 

See perldoc open for more details.
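Applied to the loop from the question, the idea might look like the following sketch. It assumes the result file is UTF-16 with a byte order mark, as the hex dump in the question suggests, so the layer name differs from the UTF-8 example above. `find_error_line` is a hypothetical helper name introduced for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first line of a UTF-16 file that contains
# one of the error keywords from the question, or nothing if no line does.
sub find_error_line {
    my ($file) = @_;

    # Assumption: the file starts with a BOM. :encoding(UTF-16) uses it to
    # pick the byte order and removes it from the decoded text.
    open my $in, '<:encoding(UTF-16)', $file
        or die "Can't open $file: $!\n";

    while (my $line = <$in>) {
        # Same keywords as the original test script.
        if ($line =~ /Invalid|invalid|Cannot|cannot/) {
            close $in;
            return $line;
        }
    }

    close $in;
    return;
}
```

A caller could then push the file name onto a retry list whenever `find_error_line($file)` returns a defined value. On Windows the lines may still end in a carriage return after decoding, which does not affect the keyword match.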

+4
    use Carp qw(cluck);   # cluck is provided by Carp

    #
    # -----------------------------------------------------------------------------
    # Reads a file and returns a string; if the second param is 'utf8',
    # returns a utf8 string
    # usage:
    #   ( $ret , $msg , $str_file )
    #      = $objFileHandler->doReadFileReturnString ( $file , 'utf8' ) ;
    # or
    #   ( $ret , $msg , $str_file )
    #      = $objFileHandler->doReadFileReturnString ( $file ) ;
    # -----------------------------------------------------------------------------
    sub doReadFileReturnString {

       my $self = shift;
       my $file = shift;
       my $mode = shift ;

       my $msg = q{} ;
       my $ret = 1 ;      # 1 means failure, 0 means success
       my $s   = q{} ;

       $msg = " the file : $file does not exist !!!" ;
       cluck ( $msg ) unless -e $file ;

       $msg = " the file : $file is not actually a file !!!" ;
       cluck ( $msg ) unless -f $file ;

       $msg = " the file : $file is not readable !!!" ;
       cluck ( $msg ) unless -r $file ;

       $msg .= "can not read the file $file !!!";

       return ( $ret , "$msg ::: $! !!!" , undef )
          unless ((-e $file) && (-f $file) && (-r $file));

       $msg = '' ;

       $s = eval {
          my $string ;
          # slurp the file
          {
             local $/ = undef;

             if ( defined ( $mode ) && $mode eq 'utf8' ) {
                # three-argument open takes the file name literally,
                # so it must not carry a trailing space
                open FILE, "<:utf8", $file
                   or cluck("failed to open \$file $file : $!");
                $string = <FILE> ;
                die "did not find utf8 string in file: $file"
                   unless utf8::valid ( $string ) ;
             }
             else {
                open FILE, '<', $file
                   or cluck "failed to open \$file $file : $!" ;
                $string = <FILE> ;
             }
             close FILE;
          }
          $string ;
       };

       if ( $@ ) {
          $msg = $! . " " . $@ ;
          $ret = 1 ;
          $s   = undef ;
       }
       else {
          $ret = 0 ;
          $msg = "ok for read file: $file" ;
       }
       return ( $ret , $msg , $s ) ;
    }
    #eof sub doReadFileReturnString
0
