Extract regular expression capture group matches from file

Question

I want to extract regular expression capture group matches from a file on the Linux command line (a bash script would also be fine). The command I tried:

sed 's/href="([^"])"/$1/g' page.html > list.lst

But it obviously failed.
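That command fails for three reasons. In sed's default basic regular expressions, plain ( and ) are literal characters, so the group must be written \( ... \). The backreference in the replacement is \1, not $1. And [^"] without * matches only a single character. A minimal sketch of a corrected command follows; the two-line sample file is hypothetical, with one link per line, and stands in for page.html. With -E (extended regex) the parentheses can stay as written, but the backreference is still \1. The -n option together with the p flag prints only lines where the substitution succeeded, so the rest of the file is discarded.

```shell
#!/bin/sh
# Hypothetical sample input: two of the question's links, one per line.
tmp=$(mktemp)
printf '%s\n' \
  '<link rel="stylesheet" type="text/css" href="style/css/colors.css" />' \
  '<link rel="stylesheet" type="text/css" href="style/css/global.css" />' > "$tmp"

# Basic regex: \( \) marks the group, [^"]* takes the whole attribute value,
# and \1 (not $1) refers back to it. -n plus the p flag prints only lines
# where the substitution succeeded, so non-matching lines are dropped.
bre=$(sed -n 's/.*href="\([^"]*\)".*/\1/p' "$tmp")

# Extended regex (-E): plain parentheses group, but the backreference is still \1.
ere=$(sed -E -n 's/.*href="([^"]*)".*/\1/p' "$tmp")

printf '%s\n' "$bre"
printf '%s\n' "$ere"
rm -f "$tmp"
```

Each variant prints style/css/colors.css and style/css/global.css on separate lines. Like any s/.../\1/ approach, it keeps only one href per input line. Because the leading .* is greedy, that is the last href on the line.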

To be precise, here is my input:

 <link rel="stylesheet" type="text/css" href="style/css/colors.css" /> <link rel="stylesheet" type="text/css" href="style/css/global.css" /> <link rel="stylesheet" type="text/css" href="style/css/icons.css" />

The output I want is a comma-separated or space-separated list of all matches in the input file:

 style/css/colors.css,style/css/global.css,style/css/icons.css

I think I have the correct expression: href="([^"]*)"

But I don't know how to apply it. sed does search/replace, which is not exactly what I want: I only need to keep the matches and discard everything else, not replace them.

command-line linux regex

BiAiB Jul 26 '11 at 2:36 p.m.

1 answer

rid · Accepted Answer · 2011-07-26T14:38:58+0000

 grep href page.html | sed 's/^.*href="\([^"]*\)".*$/\1/' | xargs | sed 's/ /,/g'

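This extracts all lines containing href and keeps only one href from each line. Because the leading .* is greedy, it is the last href on the line, so the single-line input in the question yields only style/css/icons.css. Also see this post on parsing HTML with regular expressions.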
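The question's input has all three links on one line, so a one-match-per-line approach is not enough there. The following is a sketch that compares the answer's pipeline with a grep -o variant on that input. A temporary file stands in for page.html, which is an assumption made for a self-contained example. grep -o prints every match on its own line, even when several matches appear on one input line. sed then strips the href=" prefix and the closing quote, and paste -s joins the lines with commas.

```shell
#!/bin/sh
# Sample input: the question's single line holding three <link> tags.
tmp=$(mktemp)
printf '%s\n' '<link rel="stylesheet" type="text/css" href="style/css/colors.css" /> <link rel="stylesheet" type="text/css" href="style/css/global.css" /> <link rel="stylesheet" type="text/css" href="style/css/icons.css" />' > "$tmp"

# The answer's pipeline: the greedy ^.* keeps only the last href on the line.
answer=$(grep href "$tmp" | sed 's/^.*href="\([^"]*\)".*$/\1/' | xargs | sed 's/ /,/g')

# grep -o prints each match separately, even several per input line;
# sed removes the href=" prefix and the trailing quote from each match;
# paste -s -d, joins all lines into one comma-separated line.
all=$(grep -o 'href="[^"]*"' "$tmp" | sed 's/^href="//; s/"$//' | paste -s -d, -)

echo "answer pipeline:  $answer"
echo "grep -o pipeline: $all"
rm -f "$tmp"
```

The first line shows only style/css/icons.css. The second shows style/css/colors.css,style/css/global.css,style/css/icons.css, which is the list the question asks for. With GNU grep built with PCRE support, grep -oP 'href="\K[^"]*' prints the attribute values directly, but -P is not available in every grep.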
