Why does a space in a sed pattern change what the captured group outputs?

I am trying to extract the value of the value attribute from this XML line in a terminal, so I am using sed.

    abcs='<param name="abc" value="bob3" no_but_why="4"/>'
    echo "$abcs" | sed -e 's/.*value="\(.*\)" .*/\1/'
    echo "$abcs" | sed -e 's/.*value="\(.*\)".*/\1/'

Output:

    bob3
    bob3" no_but_why="4

Why does the second command, the one without the space, print more than I wanted? Why is \1 affected by the space?

1 answer

The only difference between the two commands is whether a space must follow the closing " , and that changes where the greedy pattern .* inside the group is allowed to stop.

In the second regular expression, nothing constrains what follows the closing " . Because .* is greedy, it matches as much as it can, so the group runs up to the last " in the input (the one just before />) and captures bob3" no_but_why="4 .

In your first regular expression, "\(.*\)" captures only bob3 , because the closing " must be followed by a space. The " after bob3 is the only one after value=" that is followed by a space, so the regex engine has to backtrack to it, and .* cannot extend to the last double quote in the input.
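The same effect can be reproduced on a smaller string. This is only an illustrative sketch: the sample string and the variable names s, greedy and anchored are made up for the demonstration and are not part of the question.

```shell
# Made-up sample string containing three double quotes.
s='a"b"c"d'

# Nothing is required after the closing quote, so the greedy .* runs
# from the first quote to the LAST quote in the line.
greedy=$(printf '%s\n' "$s" | sed 's/"\(.*\)"/[\1]/')

# The closing quote must be followed by c, so .* has to backtrack
# to the only quote for which that is true.
anchored=$(printf '%s\n' "$s" | sed 's/"\(.*\)"c/[\1]/')

printf '%s\n' "$greedy" "$anchored"
# a[b"c]d
# a[b]"d
```

The literal c in the second pattern plays the same role as the space in the first command from the question: it limits which closing quote the match can end on.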

To avoid this, use a negated character class instead of the greedy .* pattern.

Consider these examples of sed commands:

    sed -e 's/.*value="\([^"]*\)" .*/\1/' <<< "$abcs"
    bob3
    sed -e 's/.*value="\([^"]*\)".*/\1/' <<< "$abcs"
    bob3

Now both commands produce the same output, bob3 , because the negated character class [^"]* stops at the next " rather than running to the very last " as .* does.
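If the same extraction is needed for several attributes, the negated character class can be wrapped in a small function. This is a minimal sketch, not a general XML parser: the name attr_value is hypothetical, and it assumes the attribute name contains no regex special characters, is preceded by whitespace, and has a double-quoted value on a single line.

```shell
# Hypothetical helper: print the value of attribute $1 found in the line $2.
# Prints nothing if the attribute is not present (sed -n with the p flag).
attr_value() {
    printf '%s\n' "$2" |
        sed -n 's/.*[[:space:]]'"$1"'="\([^"]*\)".*/\1/p'
}

abcs='<param name="abc" value="bob3" no_but_why="4"/>'
attr_value value "$abcs"        # bob3
attr_value no_but_why "$abcs"   # 4
```

For anything beyond one-line input like this, a real XML tool is likely a safer choice than a regular expression.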
