How to compare and replace strings on different lines in unix

I want to compare strings that appear on different lines of a file and replace them, in unix.

For example, I have a file with two words on each line:

<a> <b>
<d> <e>
<b> <c>
<c> <e>

If the second word of a line matches the first word of another line, the second word of that line should be replaced by the second word of the matching line. This should be repeated until no line's second word matches the first word of any other line.

I need a result like this:

<a> <e>
<b> <e>
<c> <e>
<d> <e>

I am new to unix and don't understand how to implement this. Can someone give suggestions or explain how we can do this?

3 answers

A recursive awk solution:

$ cat tst.awk
function descend(node) {return (map[node] in map ? descend(map[node]) : map[node])}
{ map[$1] = $2 }
END { for (key in map) print key, descend(key) }

$ awk -f tst.awk file
<a> <e>
<b> <e>
<c> <e>
<d> <e>

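If the input can contain cycles, the version below tracks the nodes on the current path and, when it meets one of them again, returns node "*" instead of recursing forever: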

$ cat tst.awk
function descend(node,  child, descendant) {
    stack[node]
    child = map[node]
    if (child in map) {
        if (child in stack) {
            descendant = node "*"
        }
        else {
            descendant = descend(child)
        }
    }
    else {
        descendant = child
    }
    delete stack[node]
    return descendant
}
{ map[$1] = $2 }
END { for (key in map) print key, descend(key) }

For example, with an input that contains cycles:

$ cat file
<w> <w>
<x> <y>
<y> <z>
<z> <x>
<a> <b>
<d> <e>
<b> <c>
<c> <e>

$ awk -f tst.awk file
<w> <w>*
<x> <z>*
<y> <x>*
<z> <y>*
<a> <e>
<b> <e>
<c> <e>
<d> <e>

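To print the lines in input order rather than in the arbitrary order of `for (key in map)`, replace the last 2 lines of the script with: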

{ keys[++numKeys] = $1; map[$1] = $2 }
END {
    for (keyNr=1; keyNr<=numKeys; keyNr++) {
        key = keys[keyNr]
        print key, descend(key)
    }
}
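
For convenience, the cycle-aware `descend` function and the input-order variant can be combined into one script. The following is only a sketch under a few assumptions: the wrapper name `resolve_chains` is hypothetical, every line holds exactly two whitespace-separated words, and each first word occurs on only one line (a later line with the same first word would overwrite the earlier mapping).

```shell
# Hypothetical wrapper name; reads "<first> <second>" pairs from files or stdin.
resolve_chains() {
    awk '
    # Follow node -> map[node] until reaching a word that is not a first word.
    # "stack" holds the nodes on the current path so a cycle can be detected.
    function descend(node,  child, descendant) {
        stack[node]
        child = map[node]
        if (child in map) {
            if (child in stack) descendant = node "*"   # cycle: flag it
            else                descendant = descend(child)
        } else {
            descendant = child
        }
        delete stack[node]
        return descendant
    }
    { keys[++numKeys] = $1; map[$1] = $2 }   # remember input order
    END {
        for (keyNr = 1; keyNr <= numKeys; keyNr++) {
            key = keys[keyNr]
            print key, descend(key)
        }
    }' "$@"
}
```

Calling `resolve_chains file` on the four-line file from the question should print `<a> <e>`, `<d> <e>`, `<b> <e>`, `<c> <e>`, in that order.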

A Perl solution:

#!/usr/bin/perl
use warnings;
use strict;

my (@buff);
sub output {
    my $last = pop @buff;
    print map "$_ $last\n", @buff;
    @buff = ();
}

while (<>) {
    my @F = split;
    output() if @buff and $F[0] ne $buff[-1]; # End of a group.
    push @buff, $F[0] unless @buff;           # Start a new group.
    push @buff, $F[1];
}

output();                                     # Don't forget to print the last buffer.

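Note: this works only if the lines of each chain are adjacent in the input, with the first word of each line equal to the second word of the line before it. With the ordering shown in the question, the input would have to be rearranged first.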

awk '{i++;a[i]=$1;b[i]=$2;next}
      END{
            for(i=1;i in a;i++)
            {
              f=1;
              while (f==1)
              {
                f=0;
                for(j=i+1;j in a;j++)
                {
                  if(b[i]==a[j])
                  {
                    b[i]=b[j];
                    f=1;
                  }
                }
              }
            }
            for(i=1;i in a;i++)
            {
              print a[i],b[i];
            }
          }' input.txt

Input:

<a> <b>
<d> <e>
<b> <c>
<c> <e>

Output:

<a> <e>
<d> <e>
<b> <e>
<c> <e>

Input:

<a> <b>
<e> <z>
<b> <e>

Output:

<a> <z>
<e> <z>
<b> <e>


If you need to get

<a> <z>
<e> <z>
<b> <z>

as the output for the second input, change this line:

if(b[i]==a[j])

to

if(j!=i&&b[i]==a[j])

and this:

for(j=i+1;j in a;j++)

to

for(j=1;j in a;j++)

Also note that this code assumes no line's second word matches the first word of a line whose own two words are identical, i.e.:

<a> <b>
<e> <z>
<b> <b>

In this case, the `while` loop never terminates.
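
One way to guarantee termination while keeping an iterative approach is to cap the number of replacements per line. The sketch below is not the code above but a variant built on a lookup array: the names `resolve_bounded`, `first`, `second`, `map` and `steps` are all hypothetical, and it assumes each first word occurs on only one line. In an acyclic input a chain can take at most NR - 1 hops, so a cap of NR does not cut any valid chain short. For cyclic input the loop simply stops after NR hops, and the word printed there is not meaningful.

```shell
# Hypothetical wrapper name; reads "<first> <second>" pairs from files or stdin.
resolve_bounded() {
    awk '
    { first[NR] = $1; second[NR] = $2; map[$1] = $2 }
    END {
        for (i = 1; i <= NR; i++) {
            cur = second[i]
            steps = 0
            # Follow the chain, but never take more than NR hops,
            # so a self-reference or a cycle cannot loop forever.
            while ((cur in map) && steps < NR) {
                cur = map[cur]
                steps++
            }
            print first[i], cur
        }
    }' "$@"
}
```

With the three-line input above, `resolve_bounded` returns and prints the lines unchanged instead of hanging.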
