PowerShell Multi-Line Substitution Efficiency

I am trying to replace 600 different strings in a very large (30 MB+) text file. I am currently building a script that does this, following an earlier question:

Scenario:

    $string = gc $filePath
    $string | % {
        $_ -replace 'something0','somethingelse0' `
           -replace 'something1','somethingelse1' `
           -replace 'something2','somethingelse2' `
           -replace 'something3','somethingelse3' `
           -replace 'something4','somethingelse4' `
           -replace 'something5','somethingelse5' `
           ...
           (600 More Lines...)
           ...
    }
    $string | ac "C:\log.txt"

But since each line will be checked 600 times, and there are more than 150,000 lines in the text file, this means a lot of processing time.

Is there a better alternative to make this more efficient?

Tags: regex, replace, perl, powershell, text-files
4 answers

So, you're saying that you want to replace any of 600 strings in each of 150,000 lines, and that you want to run one replace operation per line?

Yes, there is a way to do this, but not in PowerShell, at least I can't think of one. This can be done in Perl.


Method:

  • Build a hash where the keys are the somethings and the values are the somethingelses.
  • Join the hash keys with the | symbol and use the result as a match group in the regex.
  • In the replacement, interpolate an expression that retrieves the value from the hash, using the match variable for the capture group as the key.

Problem:

Disappointingly, PowerShell does not expose the match variables outside of the regex replace call. It does not work with the -replace operator, and it does not work with [regex]::Replace.

In Perl, you can do this, for example:

 $string =~ s/(1|2|3)/@{[$1 + 5]}/g; 

This adds 5 to the digits 1, 2, and 3 throughout the string, so if the string is "1224526123 [2] [6]", it turns into "6774576678 [7] [6]".

However, in PowerShell, both of these fail:

    $string -replace '(1|2|3)',"$($1 + 5)"
    [regex]::replace($string,'(1|2|3)',"$($1 + 5)")

In both cases, $1 evaluates to null, and the expression evaluates to plain old 5. Match variables in replacements are meaningful only in the resulting string, that is, in a single-quoted string or in whatever a double-quoted string evaluates to. They are basically just backreferences that look like match variables. Sure, you can escape the $ before the number in a double-quoted string so that it reaches the regex engine as a reference to the corresponding match group, but that defeats the purpose: it cannot participate in an expression.
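
For comparison, here is a minimal sketch of why the interpolated form collapses to 5, next to the .NET overload of Regex.Replace that takes a MatchEvaluator delegate (the approach a later answer on this page relies on). The sample string is the one from the Perl example above. The sketch assumes strict mode is off, so that the unset variable $1 is simply null.

```powershell
$string = '1224526123 [2] [6]'

# The subexpression is evaluated by PowerShell before the regex ever runs.
# With strict mode off, $1 is an unset PowerShell variable (null), so the
# replacement text is just the string '5'.
$replacement = "$($1 + 5)"

# A script block cast to MatchEvaluator is invoked once per match, so the
# matched text can take part in an expression.
$evaluator = [System.Text.RegularExpressions.MatchEvaluator]{
    param($m)
    [int]$m.Groups[1].Value + 5
}
$result = [regex]::Replace($string, '(1|2|3)', $evaluator)
$result   # 6774576678 [7] [6]
```

The explicit cast is there only to make the chosen overload unambiguous.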


Solution:

[This answer has been modified from the original. It has been reworked to accommodate match strings that contain regex metacharacters. And your TV screen, of course.]

If another language suits you, the following Perl script works like a charm:

    $filePath = $ARGV[0]; # Or hard-code it or whatever
    open INPUT, "< $filePath";
    open OUTPUT, '> C:\log.txt';
    %replacements = (
      'something0' => 'somethingelse0',
      'something1' => 'somethingelse1',
      'something2' => 'somethingelse2',
      'something3' => 'somethingelse3',
      'something4' => 'somethingelse4',
      'something5' => 'somethingelse5',
      'X:\Group_14\DACU' => '\\DACU$',
      '.*[^xyz]' => 'oO{xyz}',
      'moresomethings' => 'moresomethingelses'
    );
    foreach (keys %replacements) {
      push @strings, qr/\Q$_\E/;
      $replacements{$_} =~ s/\\/\\\\/g;
    }
    $pattern = join '|', @strings;
    while (<INPUT>) {
      s/($pattern)/$replacements{$1}/g;
      print OUTPUT;
    }
    close INPUT;
    close OUTPUT;

It searches for hash keys (to the left of => ) and replaces them with the corresponding values. Here's what happens:

  • The foreach loop goes through all the keys of the %replacements hash and builds an array named @strings from them, with metacharacters quoted using \Q and \E and the result quoted for use as a regex pattern (qr = quote regex). In the same pass, it escapes all backslashes in the replacement strings by doubling them.
  • Next, the elements of the array are joined with | to form the search pattern. You could include the grouping parentheses in $pattern if you want, but I think this way makes it clearer what is happening.
  • The while loop reads each line from the input file, replaces any of the strings in the search pattern with the corresponding replacement string from the hash, and writes the line to the output file.

By the way, you might notice several other modifications from the original script. My Perl has been gathering dust during my recent PowerShell kick, and on a second look I noticed a few things that could be done better.

  • while (<INPUT>) reads the file one line at a time. That is much more sensible than reading all 150,000 lines into an array, especially when your goal is efficiency.
  • I simplified @{[$replacements{$1}]} to $replacements{$1}. Perl does not have a built-in way to interpolate expressions like PowerShell's $(), so @{[]} is used as a workaround: it creates a literal one-element array containing the expression. But I realized that it is not necessary when the expression is just a single scalar variable (I had it in there as a holdover from my initial testing, where I was applying calculations to the $1 match variable).
  • The close statements are not strictly necessary, but it is considered good practice to close your filehandles explicitly.
  • I changed the for abbreviation to foreach, to make it clearer and more familiar to PowerShell programmers.
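
The point about reading one line at a time carries over to PowerShell. The following is only a sketch under stated assumptions: the temp-file paths and the three sample lines are hypothetical, and the single -replace stands in for whatever per-line substitution is actually used. It streams the file through .NET's StreamReader and StreamWriter instead of loading every line into a variable first.

```powershell
# Hypothetical input and output files in the temp directory.
$inPath  = Join-Path ([IO.Path]::GetTempPath()) 'stream-demo-in.txt'
$outPath = Join-Path ([IO.Path]::GetTempPath()) 'stream-demo-out.txt'
Set-Content -Path $inPath -Value 'something0', 'untouched', 'something1'

$reader = New-Object System.IO.StreamReader -ArgumentList $inPath
$writer = New-Object System.IO.StreamWriter -ArgumentList $outPath
try {
    # ReadLine returns $null at end of file, so only one line is held
    # in memory at any time.
    while ($null -ne ($line = $reader.ReadLine())) {
        # Placeholder substitution; $1 inside single quotes is a regex
        # backreference, not a PowerShell variable.
        $writer.WriteLine(($line -replace 'something(\d)', 'somethingelse$1'))
    }
}
finally {
    $reader.Dispose()
    $writer.Dispose()
}
```

A pipeline such as Get-Content piped into ForEach-Object also streams line by line. Assigning the result of gc to a variable first, as in the question, is what pulls the whole file into memory.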

Combining the hash technique from Adi Inbar's answer with the match evaluator from Keith Hill's answer to another recent question, here is how you can perform the replacement in PowerShell:

    # Build hashtable of search and replace values.
    $replacements = @{
      'something0' = 'somethingelse0'
      'something1' = 'somethingelse1'
      'something2' = 'somethingelse2'
      'something3' = 'somethingelse3'
      'something4' = 'somethingelse4'
      'something5' = 'somethingelse5'
      'X:\Group_14\DACU' = '\\DACU$'
      '.*[^xyz]' = 'oO{xyz}'
      'moresomethings' = 'moresomethingelses'
    }

    # Join all (escaped) keys from the hashtable into one regular expression.
    [regex]$r = @($replacements.Keys | foreach { [regex]::Escape( $_ ) }) -join '|'

    [scriptblock]$matchEval = { param( [Text.RegularExpressions.Match]$matchInfo )
      # Return replacement value for each matched value.
      $matchedValue = $matchInfo.Groups[0].Value
      $replacements[$matchedValue]
    }

    # Perform replace over every line in the file and append to log.
    Get-Content $filePath |
      foreach { $r.Replace( $_, $matchEval ) } |
      Add-Content 'C:\log.txt'
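
One detail that may matter with hundreds of keys: regex alternation tries the alternatives from left to right, and a hashtable enumerates its keys in no particular order. If one key is a prefix of another (for instance something1 and something10), the shorter key can win. The sketch below uses two hypothetical keys and replacement values that are not from the original answer. It sorts the keys longest-first before joining them, which is one possible way to avoid that.

```powershell
# Hypothetical sample data: 'something1' is a prefix of 'something10'.
$replacements = @{
    'something1'  = 'A'
    'something10' = 'B'
}

# Sort keys longest-first so a longer key is tried before any of its prefixes.
$sortedKeys = $replacements.Keys | Sort-Object -Property Length -Descending
[regex]$r = ($sortedKeys | ForEach-Object { [regex]::Escape($_) }) -join '|'

# Look up the replacement for whatever text was matched.
$matchEval = { param($m) $replacements[$m.Value] }

$result = $r.Replace('something10 and something1', $matchEval)
$result   # B and A
```

If the keys never overlap, the extra sort is harmless and can be dropped.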

I also don't know how to solve this in PowerShell, but I do know how to solve it in Bash, and that is with the sed tool. Fortunately, there is also a sed for Windows. If all you want to do is replace "something#" with "somethingelse#" everywhere, then this command will do the trick for you:

 sed -i "s/something([0-9]+)/somethingelse\1/g" c:\log.txt 

In Bash, you actually need to escape a couple of those characters with backslashes, but I'm not sure whether you need to on Windows. If the first command complains, you can try:

 sed -i "s/something\([0-9]\+\)/somethingelse\1/g" c:\log.txt 

I would use the PowerShell switch statement:

    $string = gc $filePath
    $string | % {
        switch -regex ($_) {
            'something0' { 'somethingelse0' }
            'something1' { 'somethingelse1' }
            'something2' { 'somethingelse2' }
            'something3' { 'somethingelse3' }
            'something4' { 'somethingelse4' }
            'something5' { 'somethingelse5' }
            'pattern(?<a>\d+)' { $matches['a'] } # sample of more complex logic
            ...
            (600 More Lines...)
            ...
            default { $_ }
        }
    } | ac "C:\log.txt"
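
One thing worth checking before adopting this: a switch clause emits whatever its action block outputs, and without break every matching clause runs. The sketch below uses a hypothetical sample line to show both effects. It then shows one possible variant, an assumption rather than part of the answer, that edits the line in place instead.

```powershell
# Hypothetical sample line containing two of the search strings.
$line = 'keep something0 and something1'

# As written in the answer: each matching clause emits only its own value,
# so the rest of the line is not carried over.
$asWritten = @(switch -regex ($line) {
    'something0' { 'somethingelse0' }
    'something1' { 'somethingelse1' }
    default      { $_ }
})
# $asWritten now holds two strings: 'somethingelse0' and 'somethingelse1'.

# Variant (an assumption): substitute inside the line and emit it once.
$edited = $line
switch -regex ($line) {
    'something0' { $edited = $edited -replace 'something0', 'somethingelse0' }
    'something1' { $edited = $edited -replace 'something1', 'somethingelse1' }
}
$edited   # keep somethingelse0 and somethingelse1
```

If each input line is always exactly one of the search strings, the form in the answer is enough.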