I have file1, which has a few tens of lines, and a much longer file2 (~500,000 lines). The lines in the two files are not identical, but a subset of the fields is. I want to take fields 3-5 from each line of file1 and search file2 for the same pattern (only these three fields, in the same order; in file2 they are fields 2-4). If a match is found, I want to delete the corresponding line from file1.
For example, file1:
2016-01-06T05:38:31 2016-01-06T05:23:33 2016006 120E A TM Current
2016-01-06T07:34:01 2016-01-06T07:01:51 2016006 090E B TM Current
2016-01-06T07:40:44 2016-01-06T07:40:41 2016006 080E A TM Alt
2016-01-06T07:53:50 2016-01-06T07:52:14 2016006 090E A TM Current
2016-01-06T08:14:45 2016-01-06T08:06:33 2016006 080E C TM Current
file2:
2016-01-06T07:35:06.87 2016003 100E C NN Current 0
2016-01-06T07:35:09.97 2016003 100E B TM Current 6303
2016-01-06T07:36:23.12 2016004 030N C TM Current 0
2016-01-06T07:37:57.36 2016006 090E A TM Current 399
2016-01-06T07:40:29.61 2016006 010N C TM Current 0
... (and so on, for about 500,000 lines)
So, in this case, I want to delete the fourth line of file1 (in place).
The command below finds the matching lines (as they appear in file2):
grep "$(awk '{print $3,$4,$5}' file1)" file2
So one solution might be to pipe this output to sed, but I don't understand how to give sed a matching pattern from a nested command. An internet search suggests that awk might be able to do all of this (or maybe sed or something else), so I am wondering what a clean solution would look like.
In addition, speed is somewhat important because other processes may try to modify the files while this runs (I know that this can cause more complications ...). Matches will usually be found near the end of file2 rather than the beginning (in case there is a way to search file2 from bottom to top).
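For illustration, here is a minimal sketch of how a single awk invocation might do this, assuming whitespace-separated fields and a non-empty file1. The scratch directory and the temporary name file1.tmp are hypothetical and exist only for the demo. It reads file1 into memory, scans file2 once while noting which keys occur, and then writes the surviving file1 lines to a temporary file that replaces file1:

```shell
# Demo in a scratch directory (hypothetical); file names follow the question.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > file1 <<'EOF'
2016-01-06T05:38:31 2016-01-06T05:23:33 2016006 120E A TM Current
2016-01-06T07:34:01 2016-01-06T07:01:51 2016006 090E B TM Current
2016-01-06T07:40:44 2016-01-06T07:40:41 2016006 080E A TM Alt
2016-01-06T07:53:50 2016-01-06T07:52:14 2016006 090E A TM Current
2016-01-06T08:14:45 2016-01-06T08:06:33 2016006 080E C TM Current
EOF

cat > file2 <<'EOF'
2016-01-06T07:35:06.87 2016003 100E C NN Current 0
2016-01-06T07:35:09.97 2016003 100E B TM Current 6303
2016-01-06T07:36:23.12 2016004 030N C TM Current 0
2016-01-06T07:37:57.36 2016006 090E A TM Current 399
2016-01-06T07:40:29.61 2016006 010N C TM Current 0
EOF

awk '
    # First file (file1): remember every line and its key (fields 3-5).
    NR == FNR {
        line[FNR] = $0
        key[FNR]  = $3 FS $4 FS $5
        want[key[FNR]] = 1
        n = FNR
        next
    }
    # Second file (file2): note which wanted keys occur in fields 2-4.
    ($2 FS $3 FS $4) in want { hit[$2 FS $3 FS $4] = 1 }
    # Finally, print the file1 lines whose key was never seen in file2.
    END {
        for (i = 1; i <= n; i++)
            if (!(key[i] in hit))
                print line[i]
    }
' file1 file2 > file1.tmp && mv file1.tmp file1

cat file1
```

With the sample data, this should leave four lines in file1, with the 2016-01-06T07:53:50 line removed. Because file2 is read exactly once from start to finish, where the matches sit in file2 does not change the amount of work. Note that the replacement is not protected against other processes writing to file1 in the meantime; anything they write between the awk read and the mv would be lost.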