How to compare and replace strings on different lines in unix

Question

I want to compare words that appear on different lines of a file and replace them where they match.

For example, I have a file with two words on each line:

<a> <b>
<d> <e>
<b> <c>
<c> <e>

If the second word of a line matches the first word of any other line, it should be replaced by the second word of that matching line. This should be repeated until the second word of the line no longer matches the first word of any other line.

I need a result like:

<a> <e>
<b> <e>
<c> <e>
<d> <e>

I am new to unix and don't understand how to implement this. Can someone give suggestions or explain how we can do this?

0

unix regex awk sed

shalini Aug 1 '14 at 14:47

3 answers

A Perl solution:

#!/usr/bin/perl
use warnings;
use strict;

my (@buff);
sub output {
    my $last = pop @buff;
    print map "$_ $last\n", @buff;
    @buff = ();
}

while (<>) {
    my @F = split;
    output() if @buff and $F[0] ne $buff[-1]; # End of a group.
    push @buff, $F[0] unless @buff;           # Start a new group.
    push @buff, $F[1];
}

output();                                     # Don't forget to print the last buffer.

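Note that this script only chains consecutive lines: a group continues while a line's first word equals the previous line's second word, so the lines of each chain must be adjacent in the input.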

+2

choroba Aug 1 '14 at 15:15

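awk '{i++;a[i]=$1;b[i]=$2;next}   # a[] holds the first words, b[] the second words
      END{
            for(i=1;i in a;i++)
            {
              f=1;                  # f==1 means the last pass made a replacement
              while (f==1)
              {
                f=0;
                for(j=i+1;j in a;j++)   # only lines after line i are checked
                {
                  if(b[i]==a[j])
                  {
                    b[i]=b[j];      # follow the chain one step
                    f=1;
                  }
                }
              }
            }
            for(i=1;i in a;i++)
            {
              print a[i],b[i];      # output keeps the input order
            }
          }' input.txt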

Input:

<a> <b>
<d> <e>
<b> <c>
<c> <e>

Output:

<a> <e>
<d> <e>
<b> <e>
<c> <e>

Input:

<a> <b>
<e> <z>
<b> <e>

Output:

<a> <z>
<e> <z>
<b> <e>

If you instead need this output for the second input:

<a> <z>
<e> <z>
<b> <z>

change this line:

if(b[i]==a[j])

to

if(j!=i&&b[i]==a[j])

and this one:

for(j=i+1;j in a;j++)

to

for(j=1;j in a;j++)

Also note that this code assumes the input contains no line whose second word is the same as its own first word when another line's second word refers to it, e.g.:

<a> <b>
<e> <z>
<b> <b>

In this case, the while loop never terminates.
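One way to keep the loop from running forever on such input is to cap the number of replacements made per line. The sketch below is a variation on the code above, not part of the original answer. It scans all lines (`j` starts at 1 and skips `j==i`) and stops following a chain after `n` hops, where `n` is the number of lines. It assumes each first word appears on only one line, so a valid chain never needs `n` hops. The names `resolve_capped` and `hops` are made up for this example.

```shell
# Hypothetical helper: reads "first second" pairs on stdin and prints each
# line with its second word resolved through the chain, in input order.
resolve_capped() {
  awk '
    { n++; a[n] = $1; b[n] = $2 }
    END {
      for (i = 1; i <= n; i++) {
        hops = 0
        do {
          found = 0
          for (j = 1; j <= n; j++) {
            if (j != i && b[i] == a[j]) {
              b[i] = b[j]          # follow the chain one step
              found = 1
              hops++
              break
            }
          }
        } while (found && hops < n)   # n hops or more can only mean a loop
        print a[i], b[i]
      }
    }'
}

# The self-referencing input from above now terminates:
printf '%s\n' '<a> <b>' '<e> <z>' '<b> <b>' | resolve_capped
# <a> <b>
# <e> <z>
# <b> <b>
```

The cap only bounds the work; it does not report that a loop was found, so lines caught in a loop are printed with whatever second word they had when the cap was reached.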

0

Ashkan Aug 1 '14 at 16:24

Ed Morton · Accepted Answer · 2014-08-01T17:01:16+0000

With awk, using a recursive function:

$ cat tst.awk
function descend(node) {return (map[node] in map ? descend(map[node]) : map[node])}
{ map[$1] = $2 }
END { for (key in map) print key, descend(key) }

$ awk -f tst.awk file
<a> <e>
<b> <e>
<c> <e>
<d> <e>
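To see what `descend` is doing for a single key, the chain it walks can be printed with a plain loop. This is only an illustrative sketch, not part of the original answer: `trace_chain` is a hypothetical helper name, and it assumes the input has no cycles (it would loop forever on one).

```shell
# Hypothetical helper: print the chain of words followed from a start word.
# Usage: trace_chain START < file
trace_chain() {
  awk -v start="$1" '
    { map[$1] = $2 }
    END {
      node = start
      path = node
      while (node in map) {        # keep going while the word starts a line
        node = map[node]
        path = path " -> " node
      }
      print path
    }'
}

printf '%s\n' '<a> <b>' '<d> <e>' '<b> <c>' '<c> <e>' | trace_chain '<a>'
# <a> -> <b> -> <c> -> <e>
```

The last word on the printed path is what `descend` returns for that key.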

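If the input can contain cycles, the version below detects them instead of recursing forever and returns the last node reached before the cycle repeats, marked as node "*":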

$ cat tst.awk
function descend(node,  child, descendant) {
    stack[node]
    child = map[node]
    if (child in map) {
        if (child in stack) {
            descendant = node "*"
        }
        else {
            descendant = descend(child)
        }
    }
    else {
        descendant = child
    }
    delete stack[node]
    return descendant
}
{ map[$1] = $2 }
END { for (key in map) print key, descend(key) }

For example, with an input that contains cycles:

$ cat file
<w> <w>
<x> <y>
<y> <z>
<z> <x>
<a> <b>
<d> <e>
<b> <c>
<c> <e>

$ awk -f tst.awk file
<w> <w>*
<x> <z>*
<y> <x>*
<z> <y>*
<a> <e>
<b> <e>
<c> <e>
<d> <e>

If you need the output in the same order as the input, replace the last 2 lines of the script with:

{ keys[++numKeys] = $1; map[$1] = $2 }
END {
    for (keyNr=1; keyNr<=numKeys; keyNr++) {
        key = keys[keyNr]
        print key, descend(key)
    }
}
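Putting the pieces together, the cycle-aware `descend` and the order-preserving main block can be run as one program. The sketch below just wraps them in a shell function so it can be fed from a pipe; `resolve_in_order` is a name chosen for this example, and the awk code is the same as above apart from comments and layout.

```shell
# Hypothetical wrapper: resolve chains on stdin, printing in input order.
resolve_in_order() {
  awk '
    # Follow node to the end of its chain. On a cycle, return the last
    # node reached before the cycle repeats, suffixed with "*".
    function descend(node,  child, descendant) {
      stack[node]                  # mark node as currently being visited
      child = map[node]
      if (child in map) {
        if (child in stack) {
          descendant = node "*"
        }
        else {
          descendant = descend(child)
        }
      }
      else {
        descendant = child
      }
      delete stack[node]
      return descendant
    }
    { keys[++numKeys] = $1; map[$1] = $2 }   # keys[] remembers input order
    END {
      for (keyNr = 1; keyNr <= numKeys; keyNr++) {
        key = keys[keyNr]
        print key, descend(key)
      }
    }'
}

printf '%s\n' '<a> <b>' '<d> <e>' '<b> <c>' '<c> <e>' | resolve_in_order
# <a> <e>
# <d> <e>
# <b> <e>
# <c> <e>
```

If the same first word appears on more than one line, it is printed once per occurrence, each time resolved through the last mapping seen for it.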
