Using awk to pull specific lines from a file

Question

I have two files: one is my data, and the other is a list of line numbers that I want to extract from the data file. Can I use awk to read the lines of my data file and extract those whose line numbers match?

Example: Data File:

 This is the first line of my data
 This is the second line of my data
 This is the third line of my data
 This is the fourth line of my data
 This is the fifth line of my data

File with line numbers

 1
 4
 5

Output:

 This is the first line of my data
 This is the fourth line of my data
 This is the fifth line of my data

I have only ever used awk and sed on the command line for very simple things. This is well beyond me, and I have not been able to find an answer in an hour.

+8

awk line

Davy kavanagh Aug 29 '12 at 16:55

6 answers

 awk 'NR == FNR {nums[$1]; next} FNR in nums' numberfile datafile

Merely referencing an array index creates an entry. While awk reads the first file, NR (the overall record number) equals FNR (the record number within the current file), so each line number is stored as an array key and the next statement skips to the following record. After that, while awk reads the second file, the line is printed whenever its FNR is present in the array (printing is the default action for a true pattern).
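To make the two-pass behaviour concrete, here is a small sketch that recreates the sample files from the question and runs the one-liner on them. The scratch directory and the variable names tmpdir and result are only illustrative assumptions, not part of the answer.

```shell
# Recreate the sample files from the question in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' \
    'This is the first line of my data' \
    'This is the second line of my data' \
    'This is the third line of my data' \
    'This is the fourth line of my data' \
    'This is the fifth line of my data' > "$tmpdir/datafile"
printf '%s\n' 1 4 5 > "$tmpdir/numberfile"

# First file (NR == FNR): remember each wanted line number as an array key.
# Second file: print the record when its per-file number FNR is a key.
result=$(awk 'NR == FNR { nums[$1]; next } FNR in nums' \
    "$tmpdir/numberfile" "$tmpdir/datafile")
printf '%s\n' "$result"

rm -r "$tmpdir"
```

The number file has to come first on the command line, because NR == FNR only identifies the first file that awk reads.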

+10

Dennis williamson Aug 29 '12 at 18:59

Here is an awk example. The file of line numbers is loaded up front, and then the matching lines of the data file are printed.

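 awk \
   -v RS="[\r]*[\n]" \
   -v FILE="inputfile" \
   'BEGIN {
      # Build a comma-delimited list of wanted line numbers, e.g. ",1,4,5,"
      LINES = ","
      # Compare with > 0 so that an unreadable file (-1) cannot loop forever
      while ((getline Line < FILE) > 0) {
        LINES = LINES Line ","
      }
    }
    # Print the record when ",NR," occurs in the list
    LINES ~ "," NR "," {
      print
    }
   ' datafile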

+1

kbulgrien Aug 29 '12 at 17:20

I had the same problem. This solution has already been published by Thor:

 cat datafile \
 | awk 'BEGIN{getline n<"numbers"} n==NR{print; getline n<"numbers"}'

If the numbers are not in a file but arrive on stdin, and you do not want to create a temporary file of numbers, this is an alternative solution:

 cat numbers \
 | awk '{while((getline line<"datafile")>0) {n++; if(n==$0) {print line;next}}}'

+1

tommy.carstensen Jul 05 '14 at 17:31

while read -r line; do sed -n "${line}p" Datafile.txt; done < numbersfile.txt

0

Testbud Jun 12 '14 at 12:00

This solution ...

awk 'NR == FNR {nums[$1]; next} FNR in nums' numberfile datafile

... prints each matching line only once, even if its number appears several times in the number file. What if the number file contains duplicate entries? Then sed is a better (but much slower) alternative:

sed -nf <(sed 's/.*/&p/' numberfile) datafile
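If you would rather stay with awk, one possible variation is to count how often each number is requested and print the line that many times. This is only a sketch of that idea, not something taken from the answers here; the sample files and the names count, tmpdir and result are made up for illustration.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' alpha beta gamma > "$tmpdir/datafile"
printf '%s\n' 1 3 3 > "$tmpdir/numberfile"   # line 3 is requested twice

# Count how often each line number is requested, then print the matching
# data line that many times.
result=$(awk 'NR == FNR { count[$1]++; next }
              FNR in count { for (i = 0; i < count[FNR]; i++) print }' \
    "$tmpdir/numberfile" "$tmpdir/datafile")
printf '%s\n' "$result"

rm -r "$tmpdir"
```

Like the sed version, this prints in data-file order, so repeated lines come out next to each other rather than in the order the numbers were listed.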

0

dce Apr 3 '19 at 16:00

Thor · Accepted Answer · 2012-08-29T17:06:44+0000

One way, with sed:

 sed 's/$/p/' linesfile | sed -n -f - datafile

You can use the same trick with awk:

 sed 's/^/NR==/' linesfile | awk -f - datafile
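Both commands work by turning the list of numbers into a tiny program. The sketch below prints the generated scripts so you can see what sed and awk are actually given. To stay portable it saves them to files rather than passing them on stdin; the names pick.sed and pick.awk and the sample data are assumptions made for this illustration.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' one two three four five > "$tmpdir/datafile"
printf '%s\n' 1 4 5 > "$tmpdir/linesfile"

# Generated sed script: one "Np" print command per wanted line (1p, 4p, 5p).
sed_script=$(sed 's/$/p/' "$tmpdir/linesfile")
# Generated awk program: one "NR==N" pattern per wanted line; a pattern with
# no action prints the record.
awk_script=$(sed 's/^/NR==/' "$tmpdir/linesfile")
printf '%s\n' "$sed_script" "$awk_script"

# Save the generated scripts and run them against the data file.
printf '%s\n' "$sed_script" > "$tmpdir/pick.sed"
printf '%s\n' "$awk_script" > "$tmpdir/pick.awk"
sed_out=$(sed -n -f "$tmpdir/pick.sed" "$tmpdir/datafile")
awk_out=$(awk -f "$tmpdir/pick.awk" "$tmpdir/datafile")
printf '%s\n' "$sed_out" "$awk_out"

rm -r "$tmpdir"
```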

Edit - Alternative for huge files

With a huge number of lines, it is impractical to keep whole files in memory. A solution in that case is to sort the numbers file and read it one line at a time. The following has been tested with GNU awk:

extract.awk

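 BEGIN {
   # Read the first wanted line number before any input is processed
   getline n < linesfile
   if (length(ERRNO)) {
     print "Unable to open linesfile '" linesfile "': " ERRNO > "/dev/stderr"
     exit
   }
 }

 NR == n {
   print
   # Fetch the next wanted line number; stop at end of file or on error
   if ((getline n < linesfile) <= 0) {
     if (length(ERRNO))
       print "Unable to open linesfile '" linesfile "': " ERRNO > "/dev/stderr"
     exit
   }
 }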

Run it as follows:

 awk -v linesfile="$linesfile" -f extract.awk infile

Testing:

 printf '%s\n' 2 4 7 8 10 13 | awk -v linesfile=/dev/stdin -f extract.awk <(paste <(seq 50e3) <(seq 50e3 | tac))

Output:

 2	49999
 4	49997
 7	49994
 8	49993
 10	49991
 13	49988
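The one-number-at-a-time approach relies on the numbers being in ascending order. If your list is not sorted yet, something like the following sketch could prepare it first. The sample files, the intermediate file called sorted, and the inline program (a condensed stand-in for extract.awk without the error messages) are all assumptions made for illustration.

```shell
tmpdir=$(mktemp -d)
awk 'BEGIN { for (i = 1; i <= 20; i++) print "row " i }' > "$tmpdir/infile"
printf '%s\n' 13 2 7 2 > "$tmpdir/linesfile"   # unsorted, with a duplicate

# sort -n -u puts the numbers in ascending order and drops repeats, which is
# what the step-by-step comparison against NR relies on.
sort -n -u "$tmpdir/linesfile" > "$tmpdir/sorted"

result=$(awk -v linesfile="$tmpdir/sorted" '
    BEGIN   { if ((getline n < linesfile) <= 0) exit }   # nothing to extract
    NR == n { print; if ((getline n < linesfile) <= 0) exit }
' "$tmpdir/infile")
printf '%s\n' "$result"

rm -r "$tmpdir"
```

Only one pending line number is held in memory at any time, and awk stops reading the data file as soon as the list runs out.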
