Replace thousands separators in CSV with regex

I'm having trouble stripping thousands separators from some currency values in a set of files. The problematic values contain commas and are enclosed in double quotes. Other values that are < $1000 do not present a problem.

An example of an existing file:

"12,345.67",12.34,"123,456.78",1.00,"123,456,789.12" 

Example of the desired file (thousands separators removed):

 "12345.67",12.34,"123456.78",1.00,"123456789.12" 

I found a regular expression that matches the numbers with separators, and it works fine, but I'm having trouble with the -replace operator. I'm confused by the replacement value. I read about $& and wonder whether I should use it here. I tried $_, but that strips ALL my commas. Should I use $matches in some way?

Here is my code:

    $Files = Get-ChildItem *input.csv
    foreach ($file in $Files)
    {
        $file |
        Get-Content | #assume that I can't use -raw
        % {$_ -replace '"[\d]{1,3}(,[\d]{3})*(\.[\d]+)?"', ("$&" -replace ',','')} | #this is my problem
        out-file output.csv -append -encoding ascii
    }
3 answers

You can try with this regex:

 ,(?=(\d{3},?)+(?:\.\d{1,3})?") 

See the live demo, or in PowerShell:

 % {$_ -replace ',(?=(\d{3},?)+(?:\.\d{1,3})?")','' } 
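To make the effect of the lookahead concrete, here is a minimal sketch that applies it to the sample line from the question. The variable names are only illustrative. A comma is removed only when it is followed by one or more three-digit groups, an optional decimal part, and then a closing double quote, so the commas that separate fields are left alone.

```powershell
# Sample line taken from the question
$line = '"12,345.67",12.34,"123,456.78",1.00,"123,456,789.12"'

# Remove a comma only if groups of three digits (and an optional
# decimal part) lead up to a closing double quote
$pattern = ',(?=(\d{3},?)+(?:\.\d{1,3})?")'

$result = $line -replace $pattern, ''
$result
# → "12345.67",12.34,"123456.78",1.00,"123456789.12"
```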

But this mostly shows the kind of challenge a regular expression brings here. For a proper solution, use @briantist's answer, which is the clean way to do this.
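As for why the attempt in the question changes nothing: as far as I can tell, PowerShell evaluates the parenthesized expression `("$&" -replace ',','')` before the outer -replace runs. The string `$&` contains no commas, so the replacement is simply `$&`, which puts the whole match back unchanged. If you want to keep your own pattern and transform each match, one option is to pass a script block to `[regex]::Replace`, where it acts as a match evaluator. A minimal sketch, with illustrative variable names:

```powershell
# Sample line taken from the question
$line = '"12,345.67",12.34,"123,456.78",1.00,"123,456,789.12"'

# The asker's pattern, with the redundant [\d] brackets dropped
$pattern = '"\d{1,3}(,\d{3})*(\.\d+)?"'

# The script block is called once per match; $m is the Match object
$result = [regex]::Replace($line, $pattern, {
    param($m)
    $m.Value -replace ',', ''
})
$result
# → "12345.67",12.34,"123456.78",1.00,"123456789.12"
```

In PowerShell 6.1 and later, the -replace operator itself also accepts a script block as the replacement, where `$_` is the current match.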


Tony Hinkle's comment is the answer: do not use a regular expression for this (at least not directly in the CSV file).

Your CSV is valid, so you should parse it as such, work with the resulting objects (changing the text if you want), and then write a new CSV.

    Import-Csv -Path .\my.csv | ForEach-Object {
        $_ | ForEach-Object {
            $_ -replace ',',''
        }
    } | Export-Csv -Path .\my_new.csv

(This code needs work, especially in the middle, since each row exposes its columns as properties rather than as an array, but a more complete sample of your CSV would make it easier to demonstrate.)
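One way the middle part could be filled in is sketched below. It rests on assumptions that are not in the question: the file has no header line, every row has five fields, and the column names c1 to c5 and the sample file written at the top are purely hypothetical. Each row is an object with one property per column, so the loop walks the properties and strips commas from each value.

```powershell
# Hypothetical sample file, so the sketch can be run as-is
Set-Content -Path .\my.csv -Value '"12,345.67",12.34,"123,456.78",1.00,"123,456,789.12"'

# Hypothetical column names; the sample row has no header line
$header = 'c1', 'c2', 'c3', 'c4', 'c5'

Import-Csv -Path .\my.csv -Header $header | ForEach-Object {
    # Strip commas from every column value of the current row
    foreach ($property in $_.PSObject.Properties) {
        $property.Value = $property.Value -replace ',', ''
    }
    $_   # emit the modified row
} |
    ConvertTo-Csv -NoTypeInformation |
    Select-Object -Skip 1 |     # drop the synthetic header line
    Set-Content -Path .\my_new.csv -Encoding Ascii

Get-Content -Path .\my_new.csv
```

Note that with the default settings every field is written quoted, including values such as 12.34 that were unquoted in the input. If that matters, PowerShell 7 adds a -UseQuotes parameter to ConvertTo-Csv and Export-Csv.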


I would use a simpler regex and use capture groups instead of the whole match. I tested the regex against your input and found no problems.

% {$_ -replace '([\d]),([\d])','$1$2' }

That is, find every comma that has a digit both before and after it (so odd mixed groupings don't matter) and remove just the comma.

This would cause problems if your input did not have this odd mix of quoted and unquoted values, because a comma between two adjacent unquoted numbers would also be removed.
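A short sketch of both cases follows. The first line is the sample from the question; the second row is a made-up example with two adjacent unquoted numbers, included only to show the failure mode.

```powershell
# Sample line taken from the question: fields alternate quoted/unquoted
$line = '"12,345.67",12.34,"123,456.78",1.00,"123,456,789.12"'
$clean = $line -replace '(\d),(\d)', '$1$2'
$clean
# → "12345.67",12.34,"123456.78",1.00,"123456789.12"

# Hypothetical row: two unquoted numbers next to each other
$risky = '12.34,1.00,"1,234.56"'
$merged = $risky -replace '(\d),(\d)', '$1$2'
$merged
# → 12.341.00,"1234.56"   (the field separator was removed too)
```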
