I want to extract information from a large file based on several conditions (from the same file), as well as to search for patterns from another small file. Below is the script I used:
awk 'BEGIN{FS=OFS="\t"}NR==FNR{a[$0]++;next}$1 in a {print $2,$4,$5}' file2.txt file1.txt >output.txt
Now I want to use the condition in the same awk script, which ONLY prints a line in which the element of the 4th column (any character from ATGC) corresponds to the element of the fifth column (any character among ATGC); both columns are in file 1.
Therefore, in a sense, I want to combine the following script with the above script:
awk '$4 " "==$5{print $2,$4,$5}' file1.txt
The following is a representation of file1.txt:
SNP Name Sample ID GC Score Allele1 - Forward Allele2 - Forward
ARS-BFGL-BAC-10172 834269752 0.9374 A G
ARS-BFGL-BAC-1020 834269752 0.9568 A A
ARS-BFGL-BAC-10245 834269752 0.7996 C C
ARS-BFGL-BAC-10345 834269752 0.9604 A C
ARS-BFGL-BAC-10365 834269752 0.5296 G G
ARS-BFGL-BAC-10591 834269752 0.4384 A A
ARS-BFGL-BAC-10793 834269752 0.9549 C C
ARS-BFGL-BAC-10867 834269752 0.9400 G G
ARS-BFGL-BAC-10951 834269752 0.5453 T T
enter code here
Below is a view of file2.txt
ARS-BFGL-BAC-10172
ARS-BFGL-BAC-1020
ARS-BFGL-BAC-10245
ARS-BFGL-BAC-10345
ARS-BFGL-BAC-10365
ARS-BFGL-BAC-10591
ARS-BFGL-BAC-10793
ARS-BFGL-BAC-10867
ARS-BFGL-BAC-10951
The conclusion should be:
834269752 A A
834269752 C C
834269752 G G
834269752 A A
834269752 C C
834269752 G G
834269752 T T