Sed: replace nth word with matching pattern?

Question

I have a text file with the following characteristics:

each line has at least three words separated by a space
"word" can be any character or string of characters

I added notes to some of the lines suggesting changes to the original words, and now I would like to use sed to apply these changes for me. To give a clearer picture, my file looks like this:

NO NO O
SIGNS NN O      #NNS
GIVEN VBD B-VP  #VBN
AT IN O
THIS NN O       
TIME NN O            ## B-NP
. PER O
...

A note with one # replaces the SECOND word on the line, and a note with two # replaces the THIRD word on the line. Can anyone suggest a way to do this with sed (or awk or something else)? To clarify, my goal is to take the pattern following # or ## and replace the nth word of the line with that pattern.

Thanks.

+5

bash regex awk perl sed

wayeast Feb 16 '12 at 1:23

3 answers

Using sed:

sed 's/\S*\(\s*\S*\s*#\s*\)\([^#]*\)$/\2\1/;s/ *##*.*/\t\t#/' file
NO NO O
SIGNS NNS O             #
GIVEN VBN B-VP          #
AT IN O
THIS NN O       
TIME NN B-NP            #
. PER O
...
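
The command above relies on \s, \S and \t, which are GNU sed extensions. A minimal sketch of the same two-step idea using POSIX character classes is shown below; it drops the note entirely instead of leaving a trailing #. The function name apply_notes_sed is hypothetical and not part of the original answer.

```shell
# apply_notes_sed: hypothetical wrapper name, used only for illustration.
# Step 1 swaps the note into the position of the word it replaces.
# Step 2 deletes the leftover marker and the whitespace in front of it.
apply_notes_sed() {
    sed -e 's/[^[:space:]]*\([[:space:]]*[^[:space:]]*[[:space:]]*#[[:space:]]*\)\([^#]*\)$/\2\1/' \
        -e 's/[[:space:]]*#.*//' "$@"
}

# Usage: read from stdin (or pass a file name as an argument).
printf '%s\n' 'SIGNS NN O      #NNS' 'TIME NN O            ## B-NP' | apply_notes_sed
# prints:
# SIGNS NNS O
# TIME NN B-NP
```

As in the original command, this assumes the note is the last thing on the line and contains no # itself.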

+1

potong Feb 16 '12 at 4:03

A Perl solution. It can be run as a one-liner or turned into a script.

One-liner:

perl -lnwe 's/#\K\s+//; my @a=/\S+/g; if (@a>3) { $c = $a[3] =~ tr/#//d; $a[$c] = $a[3]; } print join " ", @a[0..2]' file

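This prints to stdout. To edit the file in place, add the -i.bak option, i.e. perl -i.bak -lnwe '....' file, which keeps a backup of the original in file.bak.

With comments: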

$ perl -lnwe '       # -l: handle newlines, -n read file/stdin
    s/#\K\s+//;                    # strip optional spaces
    my @a = /\S+/g;                # extract the data
    if (@a > 3) {                  # when there are replacements..
        my $c = $a[3] =~ tr/#//d;  # count and remove #
        $a[$c] = $a[3];            # set element number $c to element 3
    } print join " ", @a[0..2]     # reassemble and print 3 first elements
' file

Output:

NO NO O
SIGNS NNS O
GIVEN VBN B-VP
AT IN O
THIS NN O
TIME NN B-NP
. PER O

0

TLP Feb 16 '12 at 4:30

SiegeX · Accepted Answer · 2012-02-16T02:17:30+0000

This will work for you:

awk '/#/{sub(/# +/,"#");n=gsub(/#/,"",$NF);$(n+1)=$NF;$NF="\t\t#"}1' file

Explanation

/#/{ ... }: find the lines containing # and run the following steps on them
sub(/# +/,"#"): remove any spaces between the # and the note, if there are any
n=gsub(/#/,"",$NF): delete every # from the last field $NF and store the number of # removed in the variable n
$(n+1)=$NF: set field n + 1, $(n+1), to the new last field $NF, which now has every # removed
$NF="\t\t#": set the last field $NF to two tabs followed by a single #
1: awk shorthand for printing the line

Example with your input file:

$ awk '/#/{sub(/# +/,"#");n=gsub(/#/,"",$NF);$(n+1)=$NF;$NF="\t\t#"}1' file
NO NO O
SIGNS NNS O             #
GIVEN VBN B-VP          #
AT IN O
THIS NN O
TIME NN B-NP            #
. PER O
...

Note: if there are never spaces between the # and the note, the sub(/# +/,"#"); call can be dropped.
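
If the trailing # marker is not wanted in the output, the same steps can be written out as a small shell function. This is only a sketch under the assumptions stated in the question (space-separated words, the note is the last field); the name apply_notes is hypothetical, and the loop that rebuilds the line without the note is an addition, not part of the answer above.

```shell
# apply_notes: hypothetical helper name, used only for illustration.
# One # puts the note into field 2, two # put it into field 3.
apply_notes() {
    awk '
    /#/ {
        sub(/# +/, "#")            # join marker and note: "## B-NP" becomes "##B-NP"
        n = gsub(/#/, "", $NF)     # strip the markers from the last field and count them
        $(n + 1) = $NF             # copy the note into field n + 1
        line = $1                  # rebuild the line from every field except the note
        for (i = 2; i < NF; i++) line = line " " $i
        print line
        next
    }
    { print }                      # lines without a note pass through unchanged
    ' "$@"
}

# Usage: read from stdin (or pass a file name as an argument).
printf '%s\n' 'SIGNS NN O      #NNS' 'TIME NN O            ## B-NP' | apply_notes
# prints:
# SIGNS NNS O
# TIME NN B-NP
```

Building the output line in a loop avoids assigning to NF, which not every awk implementation handles the same way.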
