I have a file that looks like this:
10gs+VWW+A+210 10gs-ASN-A-206 0.616667 0.094872
10gs+VWW+A+210 10gs-GLU-A-31- 0.363077 0.151282
10gs+VWW+A+210 10gs-GLY-A-207 0.602564 0.060256
10gs+VWW+A+210 10gs-LEU-A-132 0.378151 0.288462
10gs+VWW+A+210 10gs-LEU-A-60- 0.376812 0.133333
10gs+VWW+A+210 11ba-GLU-A-2-z 0.333333 0.065385
10gs+VWW+A+210 11ba-SER-A-15- 0.400000 0.053846
10gs+VWW+A+210 11ba-GLU-A-2-z 0.333333 0.065385
10gs+VWW+A+210 11ba-SER-A-15- 0.400000 0.053846
17gs+VWW+A+210 11ba-SER-A-77- 0.415789 0.101282
15gs+VWW+A+210 11ba-VAL-A-47- 0.413793 0.215385
I want to extract the lines that start with a given pattern, where the pattern itself contains a space. Let's say the pattern is: '10gs+VWW+A+210 11ba-'
When I pass a single pattern like this as the grep argument, I get the correct lines. However, the problem arises when I want to match several such patterns read from a file, for example pattern.txt, which has one pattern per line.
pattern.txt looks like this:
10gs+VWW+A+210 11ba-
10gs+VWW+A+210 10gs-
When I use the following shell loop:
for i in `cat pattern.txt`; do grep -e "^$i" bigfile.txt ; done
the command takes 10gs+VWW+A+210 and 11ba- as two separate patterns. I want to match the whole line (including the space), i.e. 10gs+VWW+A+210 11ba-, as a single pattern, not as two separate ones.
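A small sketch of what the loop actually iterates over may help pin down the cause. It assumes a throwaway copy of pattern.txt in a temporary directory (the `tmp` directory and the counter names are hypothetical, only for the demo). The unquoted backtick substitution is split on whitespace, while `read` consumes one whole line per iteration:

```shell
#!/bin/sh
# Hypothetical copy of pattern.txt with the two patterns from the question.
tmp=$(mktemp -d)
printf '%s\n' '10gs+VWW+A+210 11ba-' '10gs+VWW+A+210 10gs-' > "$tmp/pattern.txt"

# Unquoted command substitution is split on IFS (space, tab, newline),
# so this loop sees four words, not two lines.
for_count=0
for i in `cat "$tmp/pattern.txt"`; do
    printf 'for:   [%s]\n' "$i"
    for_count=$((for_count + 1))
done

# read assigns one whole line per iteration; IFS= stops it from trimming
# leading/trailing blanks and -r keeps backslashes literal.
read_count=0
while IFS= read -r i; do
    printf 'while: [%s]\n' "$i"
    read_count=$((read_count + 1))
done < "$tmp/pattern.txt"

rm -r "$tmp"
```

The `for` loop prints `[10gs+VWW+A+210]`, `[11ba-]`, `[10gs+VWW+A+210]`, `[10gs-]`, whereas the `while` loop prints the two original lines with their spaces intact.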
How do I modify the existing shell script so that the space is kept as part of the search pattern instead of acting as a separator?
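One possible minimal change, sketched under the assumption that each line of pattern.txt is one complete pattern: replace the `for ... in \`cat ...\`` loop with a `while IFS= read -r` loop so that every grep call receives a whole line. The sample records are copied from the question; the temporary directory and the `result` variable are hypothetical, only for the demo.

```shell
#!/bin/sh
# Sample input: three records from the question plus the two patterns.
tmp=$(mktemp -d)
printf '%s\n' \
    '10gs+VWW+A+210 10gs-ASN-A-206 0.616667 0.094872' \
    '10gs+VWW+A+210 11ba-GLU-A-2-z 0.333333 0.065385' \
    '17gs+VWW+A+210 11ba-SER-A-77- 0.415789 0.101282' > "$tmp/bigfile.txt"
printf '%s\n' '10gs+VWW+A+210 11ba-' '10gs+VWW+A+210 10gs-' > "$tmp/pattern.txt"

# One iteration per line of pattern.txt; "$i" is quoted, so the embedded
# space stays inside a single grep argument.
# Caveats: grep rescans bigfile.txt once per pattern, and "$i" is still
# interpreted as a basic regular expression ('+' is literal there).
result=$(
    while IFS= read -r i; do
        grep -e "^$i" "$tmp/bigfile.txt"
    done < "$tmp/pattern.txt"
)
printf '%s\n' "$result"

rm -r "$tmp"
```

Output is grouped by pattern: first the `11ba-` match, then the `10gs-` match. With a ~50 GB input, the repeated scans are the main cost of this approach.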
Also, the file I am matching these patterns against is large (~50 GB), so a memory-efficient solution is welcome. Thanks.
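For the size concern, a single-pass sketch in awk is one option. It assumes pattern.txt is non-empty and that every pattern should be compared as a literal prefix of the line (no regex semantics). Only the patterns are held in memory; bigfile.txt is streamed line by line. The `prefix` array, temporary directory, and `result` variable are hypothetical names for the demo.

```shell
#!/bin/sh
# Same sample data as before: records from the question plus two patterns.
tmp=$(mktemp -d)
printf '%s\n' \
    '10gs+VWW+A+210 10gs-ASN-A-206 0.616667 0.094872' \
    '10gs+VWW+A+210 11ba-GLU-A-2-z 0.333333 0.065385' \
    '17gs+VWW+A+210 11ba-SER-A-77- 0.415789 0.101282' > "$tmp/bigfile.txt"
printf '%s\n' '10gs+VWW+A+210 11ba-' '10gs+VWW+A+210 10gs-' > "$tmp/pattern.txt"

# NR == FNR holds only while reading the first file (pattern.txt), so each
# non-empty pattern line is stored verbatim, spaces included.
# For bigfile.txt, a line is printed once as soon as any stored prefix
# equals its leading characters; nothing from bigfile.txt is kept in memory.
result=$(awk '
    NR == FNR { if ($0 != "") prefix[++n] = $0; next }
    {
        for (k = 1; k <= n; k++)
            if (substr($0, 1, length(prefix[k])) == prefix[k]) { print; break }
    }
' "$tmp/pattern.txt" "$tmp/bigfile.txt")
printf '%s\n' "$result"

rm -r "$tmp"
```

Unlike the per-pattern loop, this reads the big file once, keeps the original line order, and prints a line only once even if several patterns match it. The per-line work grows with the number of patterns, so it suits a modest pattern list. If the patterns contain no regex metacharacters (as here, since `+` is literal in basic regexes), `sed 's/^/^/' pattern.txt > anchored.txt && grep -f anchored.txt bigfile.txt` is another single-pass variant.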