Replacing Quote-and-Multiple-Comma Separators in Perl

I have a very large file that I need to parse with Perl. The file format (which I cannot change) was originally designed so that the file could be opened in Excel as CSV. For this problem I need to replace the separator in each line with a pipe (|). This is not usually hard, but I have a few complications (below), and although I have a working solution, I wonder whether there is a more efficient way to do it.

  • The data itself contains comments with commas, so I cannot simply find and replace every comma.
  • Each cell value is enclosed in quotation marks, but if the cell is empty, the quotation marks are omitted.

Example line in a file:

"Foo Bar","More Foo","More Bar",,,,,"Yet More","Comma,Separated,Statement"

My current solution looks something like the code below. It works, but it seems inelegant and makes several passes over each line, which I want to avoid because the file is very large.

# Change the delimiter
$line =~ s/",,,,,"/|||||/g;
$line =~ s/",,,,"/||||/g;
$line =~ s/",,,"/|||/g;
$line =~ s/",,"/||/g;
$line =~ s/","/|/g;

$line =~ s/^"//;     # Remove leading quotation mark
$line =~ s/"$//;     # Remove trailing quotation mark

Can someone help me find a faster and more elegant solution?

1 answer

Use Text::CSV_XS. Read each line, extract the field values, and join them with a pipe. Let the module handle all the quoting and formatting details.

For code, see friedo's answer to "Replace commas with pipes, but not commas enclosed in double quotes".
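
As a rough illustration of the approach described above (this is a minimal sketch, not friedo's code), the example line from the question could be parsed and re-joined as follows. The in-memory input string and the names `$input`, `$in`, and `@converted` are assumptions made for the demo. For the real file you would open it by name and print each converted line as it is read, rather than collecting the lines in an array.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# binary => 1 permits arbitrary bytes inside fields;
# auto_diag => 1 makes the module report parse errors itself.
my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1 });

# Hypothetical input: the example line from the question, read through
# an in-memory filehandle so the sketch is self-contained.
my $input = qq{"Foo Bar","More Foo","More Bar",,,,,"Yet More","Comma,Separated,Statement"\n};
open my $in, '<', \$input or die "Cannot open input: $!";

my @converted;
while (my $row = $csv->getline($in)) {
    # $row is an array reference of field values with the quotes removed;
    # with the default options, empty unquoted fields are empty strings.
    push @converted, join '|', @$row;
}
close $in;

print "$_\n" for @converted;
```

Each line is parsed once, and commas inside quoted fields are left alone. Note that a plain `join` assumes no field contains a pipe. If that can happen, one option is to write the rows with a second Text::CSV_XS object created with `sep_char => '|'`, so that such fields are quoted on output.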
