Extract regular expression capture group matches from file

I want to extract regular expression matches from a file on the Linux command line (a small bash script would also do). The command I tried:

sed 's/href="([^"])"/$1/g' page.html > list.lst 

but obviously it failed.

To be precise, here is my input:

 <link rel="stylesheet" type="text/css" href="style/css/colors.css" />
 <link rel="stylesheet" type="text/css" href="style/css/global.css" />
 <link rel="stylesheet" type="text/css" href="style/css/icons.css" />

The output I want is a comma-separated or space-separated list of all matches in the input file:

 style/css/colors.css,style/css/global.css,style/css/icons.css 

I think I have the correct expression: href="([^"]*)"

But I don't know how to apply it. sed does a search and replace, which is not exactly what I want: I need the opposite, to keep only the matches and discard the rest.

1 answer
 grep href page.html | sed 's/^.*href="\([^"]*\)".*$/\1/' | xargs | sed 's/ /,/g' 

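This keeps only the lines containing href and extracts one href value from each of them (the last one on the line, because the leading .* is greedy). xargs then joins the results onto a single space-separated line, and the final sed turns the spaces into commas. Also see this post on parsing HTML with regular expressions.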
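
If a line can contain more than one href attribute, a variant built on grep -o may be easier to reason about, since it prints every match on its own line. The following is only a sketch: it assumes a grep that supports -o (GNU and BSD grep do, but the option is not in POSIX), and it writes the sample input to a temporary file that stands in for page.html.

```shell
#!/bin/sh
# Hypothetical sample input standing in for page.html.
# Note that the first line carries two href attributes.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<link rel="stylesheet" type="text/css" href="style/css/colors.css" /> <link rel="stylesheet" type="text/css" href="style/css/global.css" />
<link rel="stylesheet" type="text/css" href="style/css/icons.css" />
EOF

# grep -o prints each match on its own line, so no href is lost.
# sed strips the leading href=" and the trailing quote.
# paste -s joins all lines into one, using a comma as the delimiter.
result=$(grep -o 'href="[^"]*"' "$tmp" | sed 's/^href="//; s/"$//' | paste -s -d, -)

echo "$result"
rm -f "$tmp"
```

To get a space-separated list instead, replace -d, with -d' ' in the paste call.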
