Delete duplicate words in a string with sed

This is purely academic, but it's bugging me.

I want to fix this text:

 there there are are multiple lexical errors in this line line

using sed. So far I have:

 sed 's/\([a-z][a-z]*[ ,\n][ ,\n]*\)\1/\1/g' < file.text

It fixes everything except the doubled word at the end!

 there are multiple lexical errors in this line line

Can a sed guru explain why this doesn't work for the words at the end?


This is because in the last case ("line"), capture group 1 holds "line " (the word followed by a space), and the regex then looks for that same text repeated. Since there is no space after the final "line", the repetition is not found and no match is made.
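To see this effect directly, here is a small sketch, assuming a POSIX shell and GNU sed (which accepts \n inside a bracket expression). It runs the question's regex, unchanged, on the sample line once without and once with a trailing space; the variable names are purely illustrative.

```shell
# The question's substitution, unchanged.
re='s/\([a-z][a-z]*[ ,\n][ ,\n]*\)\1/\1/g'

# No trailing space: group 1 captures "line " but the text after it is just "line",
# so the final duplicate survives.
no_space=$(printf '%s\n' 'there there are are multiple lexical errors in this line line' | sed "$re")

# Trailing space present: "line " is followed by "line ", so the duplicate is removed.
with_space=$(printf '%s\n' 'there there are are multiple lexical errors in this line line ' | sed "$re")

printf '%s\n' "$no_space" "$with_space"
```

The first result still ends in "line line"; the second has the duplicate removed (and keeps its trailing space).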

To fix this, add a space after the last word ("line") so that it, too, is followed by a separator.
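One way to do that in a single sed call is sketched below, again assuming GNU sed: append a space, run the question's substitution unchanged, then strip the padding. The function name dedupe_with_pad is hypothetical; for a real file you would feed it with < file.text instead of printf.

```shell
# Hypothetical helper: reads lines on stdin, writes the de-duplicated lines on stdout.
dedupe_with_pad() {
  # 1) s/$/ /    pad every line with a trailing space
  # 2) the question's substitution, unchanged
  # 3) s/ $//    remove one trailing space again
  sed 's/$/ /; s/\([a-z][a-z]*[ ,\n][ ,\n]*\)\1/\1/g; s/ $//'
}

printf '%s\n' 'there there are are multiple lexical errors in this line line' | dedupe_with_pad
```

Two caveats: the last step also removes a trailing space the line may already have had, and, like the original, the pattern is not anchored at the start of a word, so a line such as "this is" is reduced to "this".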

Alternatively, you can change the regex to:

 sed -e 's/\b\([a-z]\+\)[ ,\n]\1\b/\1/g'
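A quick check of this approach on the sample line, sketched for a POSIX shell with GNU sed (\b and \+ are GNU extensions, not POSIX basic regular expressions). Because the pattern does not require a separator after the second copy, the final "line line" is handled without any padding. The pattern below has \b on both sides of the repeated word; without the trailing \b, an input such as "the theory" would be collapsed to "theory".

```shell
# Word boundary before the captured word and after its repetition.
re='s/\b\([a-z]\+\)[ ,\n]\1\b/\1/g'

fixed=$(printf '%s\n' 'there there are are multiple lexical errors in this line line' | sed "$re")

# A word that merely starts with the previous word must be left alone.
prefix=$(printf '%s\n' 'the theory' | sed "$re")

printf '%s\n' "$fixed" "$prefix"
```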

