This is a really weird problem. It took me almost all day to boil it down to a small runnable script that fully demonstrates it.
Summary of the problem: I use XML::Twig to pull a piece of data out of an XML file, then insert that piece into the middle of another piece of data, which I'll call the parent data. The parent data starts with a weird non-printable character. The data comes from a vendor, so I can't control that. My problem is that after I insert the piece into the middle of the parent data, the result has a new non-printable character at the very beginning, in addition to the one it started with. Neither the parent data nor the inserted piece contained this new character. I don't know where it comes from or how it gets into my data.
I doubt this is an XML::Twig bug, because the corruption happens while reading lines from a filehandle in a while loop. However, I could not reproduce the problem once I removed XML::Twig from the script, so I had to leave it in.
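For reference, here is a minimal sketch of the kind of XML::Twig-free check that does not reproduce the problem. The variable names and sample text are made up for illustration; it only assumes a plain string with the same leading "\xBF" character, read back line by line through an in-memory filehandle.

```perl
use strict;
use warnings;

# Hypothetical sample: a plain string with the same leading character.
my $text = "\xBF" . "(* header *)\nTASK MainTask ()\nEND_TASK\n";

# Read the string back line by line through an in-memory filehandle,
# the same way the full script does.
open( my $fh, '<', \$text )
    or die("Can't open string as a filehandle: $!");
my $copy = '';
while ( defined( my $line = <$fh> ) ) {
    $copy .= $line;
}
close($fh);

# Compare the round-tripped copy with the original string.
my $round_trip = $copy eq $text ? 'unchanged' : 'changed';
print "Round trip: $round_trip\n";
```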
This is my first experience with non-printable characters in strings I'm trying to process. Do I need to handle them in some special way rather than treating them like regular strings?
I am using ActiveState Perl 5.10.1, XML::Twig 3.32 (the latest version), and the Eclipse 3.5.1 IDE on Windows XP.
Here is a script that demonstrates the problem:
use strict;
use warnings;
use XML::Twig;

my $FALSE = 0;
my $TRUE  = 1;

my $name = 'KurtsProgram';
my $task = 'MainTask';

# The vendor data starts with this non-printable character.
my $hidden_char = "\xBF";

# The TASK and END_TASK lines are indented because the per-line regexes in
# insertProgram() require leading whitespace.
my $data = $hidden_char .
'(*********************************************
Data-File-Header-Junk
**********************************************)
   PROGRAM MainProgram ()
   END_PROGRAM
   TASK SecondaryTask ()
   END_TASK
   TASK MainTask ()
      MainProgram;
   END_TASK
';

my $new_data = insertProgram( $name, $task, $data );

# The result should still start with the original hidden character.
if ( $new_data =~ m/^\Q$hidden_char\E/ ) {
    print "SUCCESS\n";
}
else {
    print STDERR "ERROR: What happened?\n";
    print STDERR "ORIGINAL: \n$data\n";
    print STDERR "MODIFIED: \n$new_data\n";
}

sub insertProgram {
    my ( $local_name, $local_task, $local_data ) = @_;

    # Build the PROGRAM block from the XML template.
    my $twig = XML::Twig->new;
    $twig->parse( '<?xml version="1.0"?>
<TemplateSet>
<PROGRAM>PROGRAM <Name>ProgramNameGoesHere</Name> ()
END_PROGRAM</PROGRAM>
<TASK>TASK <Name>TaskNameGoesHere</Name> ()
END_TASK</TASK>
</TemplateSet>
' );
    my $program = $twig->root->first_child('PROGRAM');
    $program->first_child('Name')->set_text($local_name);
    my $insert = $program->text();

    # Insert the new PROGRAM block in front of the first existing one.
    return
        unless $local_data =~ s/(\s+PROGRAM\s+[^\s]+\s+\()/\n\n $insert $1/;

    my $added_program_to_task = $FALSE;
    my $found_start           = $FALSE;
    my $found_end             = $FALSE;
    my $new_data              = "";

    # Read the data line by line through an in-memory filehandle.
    open( my $filehandle, '<', \$local_data )
        or die("Can't open string as a filehandle: $!");
    while ( defined( my $line = <$filehandle> ) ) {

        # Look for the start of the requested task.
        if ( !$found_start
            && $line =~ m/\s+TASK\s+\Q$local_task\E\s+\(/ )
        {
            $found_start = $TRUE;
        }

        # At the end of that task, add a call to the new program.
        if ( $found_start && !$found_end && $line =~ m/\s+END_TASK/ ) {
            $found_end             = $TRUE;
            $line                  = " " . $local_name . ";\n" . $line;
            $added_program_to_task = $TRUE;
        }
        $new_data = $new_data . $line;
    }
    close($filehandle);

    # Return nothing if the task was never found.
    return unless $added_program_to_task;
    return $new_data;
}
When I run this script, I get the following output:
ERROR: What happened?
ORIGINAL:
¿(*********************************************
Data-File-Header-Junk
**********************************************)
PROGRAM MainProgram ()
END_PROGRAM
TASK SecondaryTask ()
END_TASK
TASK MainTask ()
MainProgram;
END_TASK
MODIFIED:
¿(*********************************************
Data-File-Header-Junk
**********************************************)
PROGRAM KurtsProgram ()
END_PROGRAM
PROGRAM MainProgram ()
END_PROGRAM
TASK SecondaryTask ()
END_TASK
TASK MainTask ()
MainProgram;
KurtsProgram;
END_TASK
You can see the extra character that has been added to the front of the data, directly below the M in MODIFIED.
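Since the characters involved do not print reliably, one way to compare the two strings is to dump the ordinals of their first few characters. This is only a sketch: leading_ordinals is a hypothetical helper that is not part of the script above, and the sample string merely stands in for the real data.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): render the first
# $count characters of a string as two-digit hex ordinals, so that
# non-printable characters become visible.
sub leading_ordinals {
    my ( $string, $count ) = @_;
    my $prefix = substr( $string, 0, $count );
    return join ' ', map { sprintf( '%02X', ord($_) ) } split //, $prefix;
}

# Sample stand-in for the start of the parent data.
my $sample = "\xBF" . '(***';
print leading_ordinals( $sample, 3 ), "\n";    # prints "BF 28 2A"
```

Calling it on both $data and $new_data in the error branch would show exactly which characters sit at the front of each string.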