Grepping using the "|" alternation operator

The following is an excerpt from a large file named AT5G60410.gff:

 Chr5 TAIR10 gene 24294890 24301147 . + . ID=AT5G60410;Note=protein_coding_gene;Name=AT5G60410
 Chr5 TAIR10 mRNA 24294890 24301147 . + . ID=AT5G60410.1;Parent=AT5G60410;Name=AT5G60410.1;Index=1
 Chr5 TAIR10 protein 24295226 24300671 . + . ID=AT5G60410.1-Protein;Name=AT5G60410.1;Derives_from=AT5G60410.1
 Chr5 TAIR10 exon 24294890 24295035 . + . Parent=AT5G60410.1
 Chr5 TAIR10 five_prime_UTR 24294890 24295035 . + . Parent=AT5G60410.1
 Chr5 TAIR10 exon 24295134 24295249 . + . Parent=AT5G60410.1
 Chr5 TAIR10 five_prime_UTR 24295134 24295225 . + . Parent=AT5G60410.1
 Chr5 TAIR10 CDS 24295226 24295249 . + 0 Parent=AT5G60410.1,AT5G60410.1-Protein;
 Chr5 TAIR10 exon 24295518 24295598 . + . Parent=AT5G60410.1

I'm having trouble extracting specific lines from this file using grep. I want to extract all rows whose type, given in the third column, is "gene" or "exon". I was surprised that this did not work:

 grep 'gene|exon' AT5G60410.gff 

No results are returned. What am I doing wrong?

+50
linux regex grep
Jul 21 '11 at 12:18
5 answers

You need to escape the | character. The following should do the job:

 grep "gene\|exon" AT5G60410.gff 
+84
Jul 21 '11 at 12:21

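By default, grep uses basic regular expressions, in which special characters such as |, + and ? are treated as ordinary characters unless they are escaped. With GNU grep, the escaped form \| means alternation, so you can use the following: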

 grep 'gene\|exon' AT5G60410.gff 

Alternatively, you can switch grep to extended regular expressions, where an unescaped | means alternation, using either of the following forms:

 egrep 'gene|exon' AT5G60410.gff
 grep -E 'gene|exon' AT5G60410.gff
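To see the difference between the forms side by side, here is a minimal sketch that counts matching lines in a small made-up file. The three-row sample and the variable names are assumptions for illustration, and the escaped \| form relies on GNU grep behavior.

```shell
# Hypothetical sample with three feature types (tab-separated, as GFF normally is).
sample=$(mktemp)
printf 'Chr5\tTAIR10\tgene\t1\t10\nChr5\tTAIR10\tmRNA\t1\t10\nChr5\tTAIR10\texon\t1\t5\n' > "$sample"

# Basic regex, unescaped |: the pattern is the literal string "gene|exon".
# grep exits non-zero when nothing matches, hence the "|| true".
bre_literal=$(grep -c 'gene|exon' "$sample" || true)

# Basic regex with \| (GNU extension) and extended regex via -E: both alternate.
bre_escaped=$(grep -c 'gene\|exon' "$sample")
ere=$(grep -c -E 'gene|exon' "$sample")

echo "$bre_literal $bre_escaped $ere"
rm -f "$sample"
```

With GNU grep, the first count should be 0 and the other two should each be 2.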
+33
Jul 21 '11 at 12:22

Here is another way to grep for several alternatives:

 grep -e gene -e exon AT5G60410.gff 

Each -e switch specifies one pattern; a line is printed if it matches any of them.
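Note that the -e form, like the alternation forms, matches the words anywhere on a line, not only in the third column that the question asks about. A possible sketch for restricting the match to column 3 uses awk instead of grep. It assumes the file is tab-separated, and the sample rows (including the Note=exon_skipping attribute) are made up for illustration.

```shell
# Hypothetical sample; the mRNA row mentions "exon" only in its last column.
sample=$(mktemp)
printf 'Chr5\tTAIR10\tgene\t1\t10\t.\t+\t.\tID=g1\n' > "$sample"
printf 'Chr5\tTAIR10\tmRNA\t1\t10\t.\t+\t.\tNote=exon_skipping\n' >> "$sample"
printf 'Chr5\tTAIR10\texon\t1\t5\t.\t+\t.\tParent=g1.1\n' >> "$sample"

# grep matches "gene" or "exon" anywhere on the line.
any_column=$(grep -c -e gene -e exon "$sample")

# awk compares field 3 exactly, so only true gene/exon rows are counted.
third_column=$(awk -F'\t' '$3 == "gene" || $3 == "exon" { n++ } END { print n + 0 }' "$sample")

echo "$any_column $third_column"
rm -f "$sample"
```

On this sample grep counts all three rows, while the awk filter counts only the gene and exon rows. To print the rows rather than count them, the awk program can be reduced to the bare condition '$3 == "gene" || $3 == "exon"'.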

+17
Jul 21 '11 at 12:23

This will work:

 grep "gene\|exon" AT5G60410.gff 
0
Jul 21 '11 at 12:23

I found this question while searching for a specific problem in which I piped the output of a command into a grep command that used the alternation operator in a regular expression, so I thought I would contribute my more specialized answer.

The error I encountered turned out to involve the preceding pipe operator (i.e. | ), and not the alternation operator (i.e. | , the same character as the pipe operator) in the grep expression. The answer for me was to properly escape or quote special shell characters, such as & , where necessary, before assuming that the problem lay in the alternation operator of my grep regular expression.

For example, the command that I ran on my local machine was:

 get http://localhost/foobar-& | grep "fizz\|buzz" 

This command resulted in the following error:

 -bash: syntax error near unexpected token `|' 

This error was fixed by changing my command to:

 get "http://localhost/foobar-&" | grep "fizz\|buzz" 

By enclosing the & character in double quotes, I was able to solve the problem. The answer had nothing to do with the alternation operator.
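The effect can be reproduced without a web server. The sketch below is an illustration under assumptions: it substitutes printf for the get command, uses a made-up URL containing "fizz", and runs each pipeline through bash -c so that the parse error can be captured instead of aborting the script.

```shell
# Unquoted: bash parses & as "run in background", so the | that follows
# has no command before it and the whole line is rejected.
unquoted_err=$(LC_ALL=C bash -c 'printf "%s\n" http://localhost/fizz-& | grep "fizz\|buzz"' 2>&1 || true)

# Quoted: & is an ordinary character inside the URL and the pipeline parses.
quoted_out=$(LC_ALL=C bash -c 'printf "%s\n" "http://localhost/fizz-&" | grep "fizz\|buzz"')

echo "unquoted: $unquoted_err"
echo "quoted:   $quoted_out"
```

The first line of output should contain bash's syntax error message, and the second should show the URL, matched by the same grep pattern in both cases.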

0
Feb 08 '17 at 0:33


