How can I split a string on spaces in Perl, except when the spaces are inside double quotes?

I have the following line:

    StartProgram 1 ""C:\Program Files\ABC\ABC XYZ"" CleanProgramTimeout 1 30

I need a Perl regex that splits this line on spaces but ignores the spaces inside double quotes.

The following is what I tried, but it does not work.

    (".*?"|\S+)
4 answers
I once tried to reinvent the wheel and solve this myself.

Now I just use Text::ParseWords and let it do the work for me.
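A minimal sketch of what that can look like, using `quotewords` from the core module Text::ParseWords. It assumes the path is wrapped in one ordinary pair of double quotes, which is not the doubled `""...""` form from the question (another answer reports that the module does not handle that form). The sample line and the variable names are illustrative only.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# Assumption: a single pair of double quotes around the path,
# unlike the doubled quotes in the question's line.
my $line = q{StartProgram 1 "C:\Program Files\ABC\ABC XYZ" CleanProgramTimeout 1 30};

# Split on whitespace. A true second argument ($keep) leaves quotes and
# backslashes in the tokens, which matters for Windows paths.
my @words = quotewords('\s+', 1, $line);

print "<$_>\n" for @words;
```

With `$keep` set to a false value, `quotewords` strips the quotes but also treats backslashes as escape characters, so it is worth checking the result against real paths before relying on it.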


Update: It looks like the fields are actually separated by tabs, not spaces. If that is guaranteed, just split on \t.
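As a rough illustration of the tab case, the sketch below builds a hypothetical tab-separated version of the line (the real file may differ) and splits it on `\t`:

```perl
use strict;
use warnings;

# Hypothetical tab-separated version of the line from the question.
my $line = join "\t",
    'StartProgram', '1', q{""C:\Program Files\ABC\ABC XYZ""},
    'CleanProgramTimeout', '1', '30';

# Only tabs separate fields, so the spaces inside the path are left alone.
my @fields = split /\t/, $line;

print "<$_>\n" for @fields;
```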

First, let's see why (".*?"|\S+) doesn't work. Look at ".*?" in particular: it matches a double quote, then zero or more characters, then another double quote. The field that gives you trouble is ""C:\Program Files\ABC\ABC XYZ"". The "" at the start of this field already satisfies ".*?", because "" is zero characters surrounded by double quotes. The quoted match therefore ends before the path begins, and \S+ then breaks the rest of the field apart at its spaces.
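A quick way to see this is to run the original pattern in list context and print what it captures. This is only a diagnostic sketch; the variable names are illustrative.

```perl
use strict;
use warnings;

my $s = q{StartProgram 1 ""C:\Program Files\ABC\ABC XYZ"" CleanProgramTimeout 1 30};

# The pattern from the question, applied globally.
my @tokens = $s =~ /(".*?"|\S+)/g;

# The leading "" comes out as a token of its own, and the path
# is split at every space.
print "<$_>\n" for @tokens;
```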

It is better to match as specifically as possible than to split. So if you have a configuration file with directives in a fixed format, write a regular expression that follows that format as closely as possible.

Move the quotation marks outside the capturing parentheses if you do not want them in the result.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $s = q{StartProgram 1 ""C:\Program Files\ABC\ABC XYZ"" CleanProgramTimeout 1 30};

    my @parts = $s =~ m{\A(\w+) ([0-9]) (""[^"]+"") (\w+) ([0-9]) ([0-9]{2})};

    use Data::Dumper;
    print Dumper \@parts;

Output:

    $VAR1 = [
              'StartProgram',
              '1',
              '""C:\\Program Files\\ABC\\ABC XYZ""',
              'CleanProgramTimeout',
              '1',
              '30'
            ];
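If the quotes are not wanted in the captured path, a variant along these lines moves them outside the capturing parentheses. It is a sketch that reuses the pattern above with only the third group changed.

```perl
use strict;
use warnings;

my $s = q{StartProgram 1 ""C:\Program Files\ABC\ABC XYZ"" CleanProgramTimeout 1 30};

# Same pattern as above, but the doubled quotes sit outside the third
# capture group, so only the bare path is captured.
my @parts = $s =~ m{\A(\w+) ([0-9]) ""([^"]+)"" (\w+) ([0-9]) ([0-9]{2})};

print "<$_>\n" for @parts;
```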

In that vein, here is a more involved script:

    #!/usr/bin/perl
    use strict;
    use warnings;

    use Data::Dumper;

    my @strings = split /\n/, <<'EO_TEXT';
    StartProgram 1 ""C:\Program Files\ABC\ABC XYZ"" CleanProgramTimeout 1 30
    StartProgram 1 c:\opt\perl CleanProgramTimeout 1 30
    EO_TEXT

    my $re = qr{
        (?<directive>StartProgram)\s+
        (?<instance>[0-9][0-9]?)\s+
        (?<path>"".+?""|\S+)\s+
        (?<timeout_directive>CleanProgramTimeout)\s+
        (?<timeout_instance>[0-9][0-9]?)\s+
        (?<timeout_seconds>[0-9]{2})
    }x;

    for (@strings) {
        if ( $_ =~ $re ) {
            print Dumper \%+;
        }
    }

Output:

    $VAR1 = {
              'timeout_directive' => 'CleanProgramTimeout',
              'timeout_seconds' => '30',
              'path' => '""C:\\Program Files\\ABC\\ABC XYZ""',
              'directive' => 'StartProgram',
              'timeout_instance' => '1',
              'instance' => '1'
            };
    $VAR1 = {
              'timeout_directive' => 'CleanProgramTimeout',
              'timeout_seconds' => '30',
              'path' => 'c:\\opt\\perl',
              'directive' => 'StartProgram',
              'timeout_instance' => '1',
              'instance' => '1'
            };

Update: I cannot get Text::Balanced or Text::ParseWords to parse this correctly. I suspect the problem is the doubled quotes that delimit the substring which should not be split. The following code is my best (not very good) attempt to solve the general problem by splitting and then selectively reassembling parts of the string.

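    #!/usr/bin/perl
    use strict;
    use warnings;

    use Data::Dumper;

    my $s = q{StartProgram 1 ""C:\Program Files\ABC\ABC XYZ"" CleanProgramTimeout 1 30};
    my $t = q{StartProgram 1 c:\opt\perl CleanProgramTimeout 1 30};

    print Dumper parse_line($s);
    print Dumper parse_line($t);

    sub parse_line {
        my ($line) = @_;

        # The capturing group keeps the whitespace separators in @parts.
        my @parts = split /(\s+)/, $line;
        my @real_parts;

        for (my $i = 0; $i < @parts; $i += 1) {
            # Pieces that do not open a ""quoted"" field are kept as they are;
            # the whitespace separators are dropped.
            unless ( $parts[$i] =~ /^""/ ) {
                push @real_parts, $parts[$i] if $parts[$i] =~ /\S/;
                next;
            }

            # Glue pieces back together until the closing "" is reached.
            # The bounds check stops the loop if the quotes are never closed.
            my $part;
            do {
                $part .= $parts[$i++];
            } until ($part =~ /""$/ or $i >= @parts);
            push @real_parts, $part;
        }
        return \@real_parts;
    }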
Matching the tokens directly also works:

    my $x = 'StartProgram 1 ""C:\Program Files\ABC\ABC XYZ"" CleanProgramTimeout 1 30';
    my @parts = $x =~ /("".*?""|\S+?(?=\s|$))/g;
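A greedy match between the outermost quotes works for this line:

    my $str = 'StartProgram 1 ""C:\Program Files\ABC\ABC XYZ"" CleanProgramTimeout 1 30';
    print "str:$str\n";

    # ".+" is greedy, so it runs from the first quote to the last quote on the line.
    my @A = $str =~ /(".+"|\S+)/g;
    foreach my $l (@A) {
        print "<$l>\n";
    }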

This gives me:

    $ ./test.pl
    str:StartProgram 1 ""C:\Program Files\ABC\ABC XYZ"" CleanProgramTimeout 1 30
    <StartProgram>
    <1>
    <""C:\Program Files\ABC\ABC XYZ"">
    <CleanProgramTimeout>
    <1>
    <30>