Delete duplicate words in string with sed

Question

Purely academic, but it bugs me.

I want to fix this text:

there there are are multiple lexical errors in this line line

using sed. This is as far as I've got:

 sed 's/\([a-z][a-z]*[ ,\n][ ,\n]*\)\1/\1/g' < file.text

It fixes everything except the doubled word at the end of the line!

 there are multiple lexical errors in this line line

Can a sed guru explain why this does not work for the word at the end?

benjwy May 15, '12 at 11:48

1 answer

codaddict · Answer 1 · 2012-05-15T11:58:12+0000

This is because in the last case (line line), capture group 1 contains "line " (the word line followed by a space), and the back-reference \1 then requires that same text to repeat. Since there is no space after the final line, no match is made.

To fix this, add a space after the final word line, so that it too ends in one of the separator characters.
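A minimal sketch of that fix, assuming GNU sed (which, outside POSIXLY_CORRECT mode, treats \n inside a bracket expression as a newline): pad each line with a trailing space, run the question's substitution unchanged, then strip the padding again. The function name dedupe_padded is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (assumes GNU sed).
# 1. Append a space so the last word also ends in [ ,\n].
# 2. Run the question's substitution unchanged.
# 3. Remove the padding space again.
dedupe_padded() {
  sed -e 's/$/ /' \
      -e 's/\([a-z][a-z]*[ ,\n][ ,\n]*\)\1/\1/g' \
      -e 's/ $//'
}

echo 'there there are are multiple lexical errors in this line line' | dedupe_padded
```

With the padding, the first "line " has a second "line " to match, so the last pair collapses just like the others.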

Alternatively, you can change the regex to:

 sed -e 's/\b\([a-z]\+\)[ ,\n]\1/\1/g'
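A sketch of the alternative regex in use, again assuming GNU sed, where \b and \+ are extensions. The closing \b after \1 is an addition that is not in the answer: without it, the back-reference can match just the start of a longer word, so "the theory" would collapse to "theory". The function name dedupe_bounded is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (assumes GNU sed for \b and \+).
# The leading \b anchors the captured word at a word start; the trailing \b
# (an addition to the answer's regex) makes the repeat end at a word end,
# so a word followed by a longer word with the same prefix is left alone.
dedupe_bounded() {
  sed -e 's/\b\([a-z]\+\)[ ,\n]\1\b/\1/g'
}

echo 'there there are are multiple lexical errors in this line line' | dedupe_bounded
echo 'read the theory' | dedupe_bounded
```

Because \b also matches at the end of the pattern space, no trailing padding is needed for the final "line line" here.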
