Removing CRLF (0D 0A) from a string in Perl

I have a Perl script that uses an XML file on Linux, and sometimes some node values contain a CRLF (hex 0D0A, DOS newlines).

The system that creates the XML file writes everything as a single line, and it looks like it sometimes decides that the line is too long and writes a CRLF into one of the data elements. Unfortunately, I cannot change the system that produces the file.

I just need to remove them from the string before processing it.

I tried all kinds of regular expression replacements, using Perl character classes and hexadecimal values, and nothing seems to work.

I even ran the input file through dos2unix before processing, and I still cannot get rid of the erroneous characters.

Does anyone have any ideas?

Many thanks,

+6
regex perl
3 answers

Typical: after battling this for about two hours, I solved it within five minutes of posting the question.

$output =~ s/[\x0A\x0D]//g; 

Finally got it.
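For illustration, here is a minimal sketch of that substitution applied to a value with a CRLF in the middle. The helper name `strip_crlf` and the sample XML fragment are made up for this example; they are not from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: remove every CR (0x0D) and LF (0x0A) from a string.
sub strip_crlf {
    my ($value) = @_;
    $value =~ s/[\x0A\x0D]//g;
    return $value;
}

# Made-up node value with a CRLF inserted mid-string.
my $output = strip_crlf("<name>John\x0D\x0A Smith</name>");
print "$output\n";    # prints: <name>John Smith</name>
```

Because the character class matches either byte on its own, this also removes a lone LF or a lone CR, not only the CRLF pair.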

+13
 $output =~ tr/\x{d}\x{a}//d; 

These are both whitespace characters, so if the terminators are always at the end of the string, you can strip them with

 $output =~ s/\s+\z//; 
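The two approaches behave differently when the CRLF sits in the middle of the value, which is the case described in the question. A small sketch with a made-up sample string (the variable names are only for this example):

```perl
use strict;
use warnings;

# Made-up value with one CRLF in the middle and one at the end.
my $embedded = "first half\r\nsecond half\r\n";

# tr///d deletes every CR and LF and returns how many characters it removed.
my $all_removed = $embedded;
my $count = ($all_removed =~ tr/\x{d}\x{a}//d);

# s/\s+\z// only strips whitespace at the very end of the string,
# so the embedded CRLF survives.
(my $tail_removed = $embedded) =~ s/\s+\z//;

print "tr removed $count characters\n";    # prints: tr removed 4 characters
print "embedded CRLF still present after s/\\s+\\z//\n"
    if $tail_removed =~ /\r\n/;
```

So `tr` is the one to reach for when the line breaks can appear anywhere, and the anchored substitution only when they are known to be trailing.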
+6

Several variants:
1. Replace all occurrences of CR/LF with LF: $output =~ s/\r\n/\n/g; # instead of \r\n you might want to use \015\012
2. Remove all trailing whitespace: $output =~ s/\s+$//;
3. Slurp and split:

    #!/usr/bin/perl
    use strict;
    use warnings;

    sub main {
        createfile();
        outputfile();
    }

    main();

    # Write a sample file that mixes Unix (\n) and DOS (\r\n) line endings.
    sub createfile {
        (my $file = $0) =~ s/\.pl$/.txt/;
        open my $fh, ">", $file or die "Cannot write $file: $!";
        print $fh "1\n2\r\n3\n4\r\n5";
        close $fh;
    }

    sub outputfile {
        (my $filei = $0) =~ s/\.pl$/.txt/;
        (my $fileo = $0) =~ s/\.pl$/out.txt/;
        open my $fin, "<", $filei or die "Cannot read $filei: $!";
        local $/;                           # slurp the file
        my $text = <$fin>;                  # store the text
        close $fin;
        my @text = split /\r?\n/, $text;    # split on DOS or Unix newlines
        local $" = ", ";                    # separator used when an array is interpolated
        open my $fout, ">", $fileo or die "Cannot write $fileo: $!";
        print $fout "@text";                # should output the numbers separated by comma-space
        close $fout;
    }
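Note that variant 1 only normalises the line ending: the CR goes away but an LF stays inside the value. That is also what dos2unix does (it converts CRLF to LF), which may be why running it first did not help in the question. A short sketch of the difference, using a made-up value:

```perl
use strict;
use warnings;

# Made-up node value with a CRLF inserted in the middle.
my $value = "abc\r\ndef";

# Variant 1: normalise CRLF to LF. A line feed is still embedded.
(my $normalised = $value) =~ s/\r\n/\n/g;

# Remove the line break entirely, whether it is CRLF or a bare LF.
(my $removed = $value) =~ s/\r?\n//g;

printf "normalised: %d chars, removed: %d chars\n",
    length($normalised), length($removed);
# prints: normalised: 7 chars, removed: 6 chars
```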
+1