Find string between HTML tags in PowerShell

I am trying to write a PowerShell script that will pull the text between two HTML tags in an HTML file. I don't know what the value will be, but I know which tags to look for. In addition, I know that the tags do not always appear at the beginning of a line (i.e., they can be in the middle of a line of text). Finally, I also know that the tags and the string between them will never be split across lines.

I have the path to the file stored in a variable:

$filePath = "C:\Path\file.html" 

I am trying to find any value between <h6> and </h6> and store these values in an array.

1 answer

Try

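 $myarray = @(Get-Content $filePath | ForEach-Object { [regex]::Matches($_, '<h6>\s*(.*?)\s*</h6>') } | ForEach-Object { $_.Groups[1].Value })

The capture group holds the text between the tags, with leading and trailing whitespace removed, if any. If you also need that whitespace, remove both \s* from the regex pattern. The @( ) wrapper keeps the result an array even when only one value is found.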
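
To try the idea without touching a real file, the same kind of extraction can be run on a few in-memory lines. This is only a minimal sketch: the sample lines and the variable names $lines, $pattern and $values are made up for illustration, and it uses a capture group with Select-String -AllMatches, which is one way to do it among several.

```powershell
# Hypothetical sample lines standing in for the contents of the HTML file.
$lines = @(
    'intro <h6>  First heading  </h6> and <h6>Second</h6> on one line',
    '<p>no match here</p>',
    '<h6>Third</h6>'
)

# Group 1 is the text between the tags; \s* on both sides keeps
# surrounding whitespace out of the group.
$pattern = '<h6>\s*(.*?)\s*</h6>'

# -AllMatches returns every match on a line, not only the first one.
# @( ) forces an array even if there is a single result.
$values = @(
    $lines |
        Select-String -Pattern $pattern -AllMatches |
        ForEach-Object { $_.Matches } |
        ForEach-Object { $_.Groups[1].Value }
)

$values
```

With a real file, replacing $lines with Get-Content $filePath should feed the same pipeline. Note that Select-String matches case-insensitively unless -CaseSensitive is given, so this sketch would also pick up <H6> tags.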
