Remove What Follows the Nth Occurrence Using One-Liners

I would like to remove the fourth occurrence of the ":" character, and everything that follows it, in any field. Here is an example:

Input:

1 10975     A C    1/1:137,105:245:99:1007,102,0   0/1:219,27:248:20:222,0,20 
1 19938     T TA   ./.                             1/1:0,167:167:99:4432,422,0,12,12
12 20043112 C G    1/2:3,5,0:15:92                 2/2:3,15:20:8

Expected Result:

1 10975     A C    1/1:137,105:245:99   0/1:219,27:248:20 
1 19938     T TA   ./.                  1/1:0,167:167:99
12 20043112 C G    1/2:3,5,0:15:92      2/2:3,15:20:8

Basically, in any field that contains ":", everything from the fourth ":" onward should be deleted. Note that the third line does not change, because ":" appears only three times in each of its fields. I tried and found a solution (not a very good one) that works only for the first line and not for the second, because the second line has more commas ","

Incomplete solution:

sed 's/:[0-9]*,[0-9]*,[0-9]*//g'

Thank you in advance

+4
4 answers

For each field from the 5th onward, remove the fourth match of :[^:]+

< file.txt awk '{ for (i=5; i<=NF; i++) $i = gensub(/:[^:]+/, "", 4, $i) }1' | column -t

Or, for each field from the 5th onward, remove everything from the fourth ":" onward:

< file awk '{ for (i=5; i<=NF; i++) $i = gensub(/((:[^:]+){3}).*/, "\\1", 1, $i) }1' | column -t

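Explanation:

gensub() is a general substitution function. Unlike sub() and gsub(), it does not modify its target in place but returns the modified string, which is why the result is assigned back to $i. Its third argument selects which match to replace, so the 4 in the first command replaces only the fourth match of the pattern. In the second command, the pattern matches the first three ":..." groups plus the rest of the field, and "\\1" puts back only the captured groups. Note that gensub() is not standard; it is a GNU awk extension.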

Results:

1   10975     A  C   1/1:137,105:245:99  0/1:219,27:248:20
1   19938     T  TA  ./.                 1/1:0,167:167:99
12  20043112  C  G   1/2:3,5,0:15:92     2/2:3,15:20:8
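
Because gensub() is only available in GNU awk, a rough equivalent for other awk implementations may be useful. The following is a minimal sketch, not part of the answer above: it assumes whitespace-separated input with the colon-delimited data starting at field 5, and the function name trim_fields is made up for illustration. It uses split(), which is available in POSIX awk.

```shell
# Sketch: keep only the first four ':'-separated parts of fields 5..NF.
# trim_fields is a hypothetical helper name, not from the answers above.
trim_fields() {
  awk '{
    for (i = 5; i <= NF; i++) {
      n = split($i, part, ":")     # break the field on ":"
      if (n > 4) {                 # a fourth ":" exists, so trim the field
        $i = part[1] ":" part[2] ":" part[3] ":" part[4]
      }
    }
    print                          # modified lines are rejoined with single spaces
  }' "$@"
}

printf '%s\n' '1 19938 T TA ./. 1/1:0,167:167:99:4432,422,0,12,12' | trim_fields
```

As with the gensub() version, the output can be piped through column -t to realign the columns.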
+2

Sed:

sed -r 's/((:[^: \t]*){3}):[^ \t]*/\1/g' file | column -t

Perl:

perl -pe 's/((:\S*){3}):\S*/$1/g' file | column -t
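
The sed command above relies on the -r option and on \t being understood inside a bracket expression, which may not behave the same in every sed. As a hedged sketch of the same idea, the variant below uses -E and the POSIX class [:space:] instead. It assumes a sed that accepts -E (GNU and BSD sed do), and the function name trim_after_fourth_colon is hypothetical.

```shell
# Sketch: same substitution written with POSIX character classes.
# trim_after_fourth_colon is a hypothetical helper name.
trim_after_fourth_colon() {
  # Group 1 captures three ":..." runs; the fourth ":" and the rest
  # of the field (up to the next whitespace) are dropped.
  sed -E 's/((:[^:[:space:]]*){3}):[^[:space:]]*/\1/g' "$@"
}

printf '%s\n' '1 19938 T TA ./. 1/1:0,167:167:99:4432,422,0,12,12' | trim_after_fourth_colon
```

Since sed leaves the whitespace between fields untouched, piping through column -t is still needed if aligned output is wanted.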
+5

sed

sed -r 's/((:[^ ]*){3}):[^ ]*/\1/g' file

Output:

1 10975     A C    1/1:137,105:245:99   0/1:219,27:248:20 
1 19938     T TA   ./.                             1/1:0,167:167:99
12 20043112 C G    1/2:3,5,0:15:92                 2/2:3,15:20:8

perl

perl -pe 's/((:\S*){3}):\S*/$1/g' file
+3
perl -lane 's/(.*?:.*?:.*?:.*?):.*/$1/ for @F; print "@F"' your_file
0
