Using awk to pull specific lines from a file

I have two files: one is my data, and the other is a list of line numbers that I want to extract from the data file. Can I use awk to read the line numbers from the second file and then extract the matching lines from the data file?

Example data file:

    This is the first line of my data
    This is the second line of my data
    This is the third line of my data
    This is the fourth line of my data
    This is the fifth line of my data

File with line numbers:

    1
    4
    5

Output:

    This is the first line of my data
    This is the fourth line of my data
    This is the fifth line of my data

I have only ever used awk and sed on the command line for very simple things. This is well beyond me, and I have not found an answer after an hour of trying.

+8
6 answers

One way, using sed:

 sed 's/$/p/' linesfile | sed -n -f - datafile 

You can use the same trick with awk:

 sed 's/^/NR==/' linesfile | awk -f - datafile 
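To see why this works, it can help to look at the intermediate scripts that the first sed command generates. The sketch below recreates the sample files from the question in a temporary directory and saves the generated scripts to files instead of piping them in with -f -. The temporary directory and the names script.sed and script.awk are illustrative assumptions, not part of the answer.

```shell
#!/bin/sh
# Recreate the question's sample files in a scratch directory (illustrative paths).
tmp=$(mktemp -d)
printf 'This is the %s line of my data\n' first second third fourth fifth > "$tmp/datafile"
printf '%s\n' 1 4 5 > "$tmp/linesfile"

# Append "p" to every line number: 1 becomes 1p, which in sed means "print line 1".
sed 's/$/p/' "$tmp/linesfile" > "$tmp/script.sed"
cat "$tmp/script.sed"

# Prefix every line number with "NR==": in awk, a bare condition prints matching records.
sed 's/^/NR==/' "$tmp/linesfile" > "$tmp/script.awk"
cat "$tmp/script.awk"

# Run the saved scripts against the data file.
sed_out=$(sed -n -f "$tmp/script.sed" "$tmp/datafile")
awk_out=$(awk -f "$tmp/script.awk" "$tmp/datafile")
printf '%s\n' "$sed_out"
```

Both generated scripts contain one command per requested line, so they grow with the size of linesfile.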

Edit - alternative for huge files

With a huge number of lines, it is impractical to keep entire files in memory. A solution in that case is to sort the file of line numbers and read it one line at a time. The following was tested with GNU awk:

extract.awk

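    BEGIN {
      # Read the first wanted line number
      getline n < linesfile
      if (length(ERRNO)) {
        print "Unable to open linesfile '" linesfile "': " ERRNO > "/dev/stderr"
        exit
      }
    }

    # When the current record is the wanted one, print it and fetch the next number
    NR == n {
      print
      if (!(getline n < linesfile)) {
        if (length(ERRNO))
          print "Unable to open linesfile '" linesfile "': " ERRNO > "/dev/stderr"
        exit   # no more line numbers, so stop reading the data file
      }
    }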

Run it as follows:

    awk -v linesfile="$linesfile" -f extract.awk infile

Testing:

    echo "2
    4
    7
    8
    10
    13" | awk -v linesfile=/dev/stdin -f extract.awk <(paste <(seq 50e3) <(seq 50e3 | tac))

Output:

    2       49999
    4       49997
    7       49994
    8       49993
    10      49991
    13      49988
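The streaming approach only works when the line numbers arrive in ascending order, because each number is compared against NR exactly once. As a rough sketch, an unsorted list can be sorted numerically first (dropping duplicates) and then fed to a compact inline version of the same idea. The file names and the generated sample data here are assumptions for illustration only.

```shell
#!/bin/sh
tmp=$(mktemp -d)
# Sample data: 20 rows, generated with awk so no extra tools are needed.
awk 'BEGIN { for (i = 1; i <= 20; i++) print "row " i }' > "$tmp/datafile"
# Unsorted list with a duplicate entry.
printf '%s\n' 13 2 7 2 > "$tmp/linesfile"

# Sort numerically and drop duplicates so the numbers are strictly ascending.
sort -n -u "$tmp/linesfile" > "$tmp/sorted"

# Keep only one pending line number in memory at a time.
stream_out=$(awk -v linesfile="$tmp/sorted" '
  BEGIN   { if ((getline n < linesfile) <= 0) exit }          # nothing requested
  NR == n { print; if ((getline n < linesfile) <= 0) exit }   # stop after the last number
' "$tmp/datafile")
printf '%s\n' "$stream_out"
```

This prints rows 2, 7 and 13. Because the program exits once the list is exhausted, the rest of the data file is never read.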
+7
 awk 'NR == FNR {nums[$1]; next} FNR in nums' numberfile datafile 

Simply referencing an array index creates an entry. While awk reads the first file, NR (the overall record number) equals FNR (the per-file record number), so the first block stores every line number as an array key and the next statement skips the rest of the program. After that, whenever the FNR of the second file is present in the array (the condition is true), the line is printed, because printing is the default action for a true condition.
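As a small illustration of that behaviour, the sketch below runs the one-liner on the question's sample data, with the line numbers deliberately given out of order. The temporary directory is an assumption for the demo.

```shell
#!/bin/sh
tmp=$(mktemp -d)
printf 'This is the %s line of my data\n' first second third fourth fifth > "$tmp/datafile"
# Deliberately unsorted, to show that the order of the numbers does not matter.
printf '%s\n' 5 1 4 > "$tmp/numberfile"

# First file: NR == FNR, so each number becomes an array key and "next" skips the rest.
# Second file: FNR restarts at 1 while NR keeps counting, so only "FNR in nums" is tested.
result=$(awk 'NR == FNR {nums[$1]; next} FNR in nums' "$tmp/numberfile" "$tmp/datafile")
printf '%s\n' "$result"
```

The output is the first, fourth and fifth lines, in data-file order. One caveat: if the number file is empty, NR == FNR stays true while the data file is read, so nothing is printed.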

+10

Here is an awk example. The file of line numbers is loaded up front, then the matching lines of the data file are printed.

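    awk \
      -v RS="[\r]*[\n]" \
      -v FILE="inputfile" \
      'BEGIN {
         # Build a comma-delimited list of wanted line numbers, e.g. ",1,4,5,"
         LINES = ","
         # "> 0" stops the loop on end of file and also if the file cannot be read
         while ((getline Line < FILE) > 0) {
           LINES = LINES Line ","
         }
       }
       # Print the record when ",NR," occurs in the list
       LINES ~ ("," NR ",") {
         print
       }
      ' datafile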
+1

I had the same problem. This is the solution already posted by Thor:

    cat datafile \
    | awk 'BEGIN{getline n<"numbers"} n==NR{print; getline n<"numbers"}'

If the numbers are not in a file but arrive on stdin instead, and you do not want to create a temporary numbers file, this is an alternative solution:

    cat numbers \
    | awk '{while((getline line<"datafile")>0) {n++; if(n==$0) {print line;next}}}'
+1

    while read -r line; do sed -n "${line}p" Datafile.txt; done < numbersfile.txt

0

This solution ...

awk 'NR == FNR {nums[$1]; next} FNR in nums' numberfile datafile

... prints each requested line only once, even if its number is repeated in the number file. What if the number file contains duplicate entries and each should produce a line of output? Then sed is a better (but much slower) alternative:

sed -nf <(sed 's/.*/&p/' numberfile) datafile
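The difference is easy to see on a tiny input. The sketch below compares the two commands on a number file that lists line 2 twice, and adds a counting awk variant that is not from the answers above, shown only as one possible way to keep duplicates without sed. The file names are illustrative, and the sed script is written to a temporary file rather than passed via process substitution so that the example also runs in a plain POSIX shell.

```shell
#!/bin/sh
tmp=$(mktemp -d)
printf 'line %s\n' 1 2 3 > "$tmp/datafile"
printf '%s\n' 2 2 3 > "$tmp/numberfile"

# Array-membership version: a repeated number is still one key, so line 2 prints once.
unique_out=$(awk 'NR == FNR {nums[$1]; next} FNR in nums' "$tmp/numberfile" "$tmp/datafile")

# sed version: the generated script contains "2p" twice, so line 2 prints twice.
sed 's/.*/&p/' "$tmp/numberfile" > "$tmp/script.sed"
sed_out=$(sed -n -f "$tmp/script.sed" "$tmp/datafile")

# Hypothetical awk variant: count occurrences, then print each line that many times.
count_out=$(awk 'NR == FNR {count[$1]++; next}
                 FNR in count {for (i = 0; i < count[FNR]; i++) print}' \
            "$tmp/numberfile" "$tmp/datafile")

printf 'unique:\n%s\nsed:\n%s\ncounted:\n%s\n' "$unique_out" "$sed_out" "$count_out"
```

Note that in both the sed and the counting variant the repeated copies appear next to each other in data-file order, not in the order the numbers were listed.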

0
