PowerShell multi-line Regex

I have a PowerShell script whose main purpose is to go through the HTML files in a folder, search for specific HTML markup, and replace it with whatever I specify.

I was able to get 3 of my 4 find-and-replace operations working perfectly, but I am having problems with the remaining regex.

This is the markup that I am trying to find and replace with my regex:

 <a href="programsactivities_skating.html"><br />
  </a>

Here is the regex that I have so far, along with the function that I use it in:

 automate -school "C:\Users\$env:username\Desktop\schools\$question" -query '(?mis)(?!exclude1|exclude2|exclude3)(<a[^>]*?>(\s|&nbsp;|<br\s?/?>)*</a>)' -replace '' 

And here is the automation function:

 function automate($school, $query, $replace) {
     $processFiles = Get-ChildItem -Exclude *.bak -Include "*.html", "*.HTML", "*.htm", "*.HTM" -Recurse -Path $school
     foreach ($file in $processFiles) {
         $text = Get-Content $file
         $text = $text -replace $query, $replace
         $text | Out-File $file -Force -Encoding utf8
     }
 }

I have been trying to find a solution for about 2 days and just can't get it to work. I have determined that the problem is that I need to make my regular expression account for multiple lines, and that is what I'm having trouble with.

Any help anyone can provide is greatly appreciated.

Thanks in advance.

3 answers

Get-Content returns an array of strings, where each element contains one line from your input file, so you cannot match text fragments that span more than one line. You need to join the array into a single string if you want to be able to match across multiple lines:

 $text = Get-Content $file | Out-String 

or

 [String]$text = Get-Content $file 

or

 $text = [IO.File]::ReadAllText($file) 

Note that the 1st and 2nd methods do not preserve the line breaks from the input file. Method 2 simply mangles all line breaks (each one becomes a single space when the array is cast to a string), as Keith pointed out in the comments, and method 1 puts <CR><LF> at the end of each line when joining the array. The latter can be a problem when working with Linux/Unix or Mac files.
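To make the difference concrete, here is a small sketch that runs the question's pattern (without the exclude lookahead) against a two-line sample. The temp file name and its contents are made up for the demonstration, and the `-Raw` switch assumes PowerShell 3.0 or later.

```powershell
# Hypothetical sample file, created only for this demonstration.
$file = Join-Path ([IO.Path]::GetTempPath()) 'multiline-demo.html'
$html = '<a href="programsactivities_skating.html"><br />' + "`n" + ' </a>' + "`n"
[IO.File]::WriteAllText($file, $html)

$pattern = '<a[^>]*?>(\s|&nbsp;|<br\s?/?>)*</a>'

$lines  = Get-Content $file               # array of strings, one per line
$joined = Get-Content $file | Out-String  # single string, newline appended after each line
$cast   = [String](Get-Content $file)     # single string, lines joined with a space
$read   = [IO.File]::ReadAllText($file)   # single string, original line breaks kept
$raw    = Get-Content $file -Raw          # same idea; assumes PowerShell 3.0 or later

@($lines -match $pattern).Count   # 0: no single line contains the whole element
$joined -match $pattern           # True
$cast   -match $pattern           # True
$read   -match $pattern           # True
$raw    -match $pattern           # True

($read -replace $pattern, '').Trim().Length   # 0: the empty link is gone

Remove-Item $file
```

When `-match` or `-replace` is given an array, it is applied to each element separately, which is why the line-by-line version never sees the opening and closing tags together.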


I'm not sure what you are trying to accomplish with those exclude elements, but I find that a multi-line regular expression is usually easier to build in a here-string:

 $text = @'
 <a href="programsactivities_skating.html"><br />
  </a>
 '@

 $regex = @'
 (?mis)<a href="programsactivities_skating.html"><br />
 \s+?</a>
 '@

 $text -match $regex
 True
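As a side note on the `(?mis)` prefix, the short sketch below (with made-up sample strings) shows what the `s` and `m` options change in the .NET regex engine that PowerShell uses. It also shows that `\s` already matches a line break without any option, so the options alone do not help when the input is still an array of lines.

```powershell
# Made-up sample: an empty link whose closing tag sits on the next line.
$text = '<a href="x.html"><br />' + "`n" + ' </a>'

$text -match '<br />\s+</a>'   # True: \s matches the line break with no options set

"a`nb" -match 'a.b'            # False: by default . does not match a newline
"a`nb" -match '(?s)a.b'        # True:  (?s) lets . match a newline

"a`nb" -match '^b'             # False: by default ^ only matches at the start of the string
"a`nb" -match '(?m)^b'         # True:  (?m) lets ^ match at the start of every line
```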

Get-Content returns an array of strings; you want to concatenate those strings into a single string:

 function automate($school, $query, $replace) {
     $processFiles = Get-ChildItem -Exclude *.bak -Include "*.html", "*.HTML", "*.htm", "*.HTM" -Recurse -Path $school
     foreach ($file in $processFiles) {
         $text = ""
         # Append each line plus CRLF. The pipeline itself outputs nothing,
         # so its result must not be assigned back to $text.
         Get-Content $file | % { $text += $_ + "`r`n" }
         $text = $text -replace $query, $replace
         $text | Out-File $file -Force -Encoding utf8
     }
 }
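For comparison, here is a sketch of the same function that reads each file as a single string with `[IO.File]::ReadAllText` and so leaves the existing line breaks untouched. The demo folder, file name, and sample markup are hypothetical, and the query drops the exclude lookahead from the question for brevity.

```powershell
function automate($school, $query, $replace) {
    $processFiles = Get-ChildItem -Exclude *.bak -Include "*.html", "*.HTML", "*.htm", "*.HTM" -Recurse -Path $school
    foreach ($file in $processFiles) {
        # Read the whole file as one string so the pattern can span line breaks.
        $text = [IO.File]::ReadAllText($file.FullName)
        $text = $text -replace $query, $replace
        # Write back as UTF-8, keeping whatever line breaks the file already had.
        [IO.File]::WriteAllText($file.FullName, $text, [Text.Encoding]::UTF8)
    }
}

# Usage against a throwaway folder (hypothetical path and file name).
$dir = Join-Path ([IO.Path]::GetTempPath()) 'automate-demo'
New-Item -ItemType Directory -Path $dir -Force | Out-Null
$sample = Join-Path $dir 'page.html'
[IO.File]::WriteAllText($sample, '<p>keep</p><a href="x.html"><br />' + "`n" + ' </a>')

automate -school $dir -query '(?is)<a[^>]*?>(\s|&nbsp;|<br\s?/?>)*</a>' -replace ''

$result = [IO.File]::ReadAllText($sample)
$result   # <p>keep</p>

Remove-Item $dir -Recurse
```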