Just a little disclaimer: I'm not very good at programming, so please excuse me if I use any terms incorrectly or in a confusing way.
I want to extract certain information from a web page, and I'm trying to do this by piping the output of curl to grep. Oh, and this is in Cygwin, if that matters.
When I run just
$ curl www.ncbi.nlm.nih.gov/gene/823951
the terminal prints the entire web page as what I assume is HTML. From there, I thought I could simply pipe this output to grep with a search query:
$ curl www.ncbi.nlm.nih.gov/gene/823951 | grep "Gene Symbol"
But instead of printing the matching lines of the web page, the terminal gives me:
% Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 142k 0 142k 0 0 41857 0 --:--:-- 0:00:03 --:--:-- 42083
Can someone explain why it is doing this, and how I can search for specific lines of text on a web page? Ultimately, I want to compile information such as gene names, types, and descriptions into a database, so I was hoping to then redirect the grep results to a text file.
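To check that I at least understand the pipe-and-redirect part correctly, here is a toy version of what I'm aiming for. It uses printf as a stand-in for curl's output (the HTML snippet and the file names are just made up for illustration), greps for a phrase, and redirects the matches to a text file:

```shell
# Stand-in for the curl output: a few fake HTML lines saved to a file.
printf '<dt>Gene symbol</dt>\n<dd>AT3G12580</dd>\n<p>other line</p>\n' > page.html

# Keep only the lines containing the phrase, and redirect them to a file.
grep "Gene symbol" page.html > results.txt

cat results.txt
```

If this is the right idea, then in the real version page.html would be replaced by the curl pipeline, and results.txt would be the text file I import into the database later.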
Any help is greatly appreciated, thanks in advance!
bash grep search curl cygwin
David xie