Algorithm for matching strings between two large files

I have a question about search algorithms. I have two text files, each with at least 10 million lines. Treating each line as a string, I want to find every line in the first file that also appears in the second file. Is there a good way to do this efficiently? Suggestions for algorithms or special language features are appreciated.

1 answer

If you don't know anything about the structure of the files (for example, whether or not they are sorted), there are many different approaches you could take to solve the problem. Which one is right for you depends on your constraints on memory and disk space.

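If you have plenty of free RAM, one option is to build a hash table in memory. Load all of the strings from the first file into the hash table. Then read the strings from the second file one at a time and, for each string, check whether it is in the hash table. If it is, report a match. This approach uses O(m) memory (where m is the number of strings in the first file) and takes at least Ω(m + n) time, possibly more, depending on how the hash function behaves. It is also (almost certainly) the simplest and most direct way to solve the problem.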
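A minimal Python sketch of the hash-table approach, assuming the distinct lines of the first input fit in memory. The function name `common_lines` is made up for illustration, and Python's built-in `set` stands in for the hash table. Removing a line from the set after reporting it is a choice of this sketch, so that each common line is reported only once.

```python
def common_lines(first_lines, second_lines):
    """Yield each line that occurs in both inputs, once, in second-input order."""
    # Hash table holding every distinct line of the first input: O(m) memory.
    table = {line.rstrip("\n") for line in first_lines}
    for line in second_lines:
        line = line.rstrip("\n")
        if line in table:
            table.discard(line)  # report each common line only once
            yield line


if __name__ == "__main__":
    first = ["apple\n", "banana\n", "cherry\n"]
    second = ["banana\n", "durian\n", "apple\n", "banana\n"]
    print(list(common_lines(first, second)))  # ['banana', 'apple']
```

Because the function accepts any iterable of lines, two open file objects can be passed in directly, and the second file is then streamed rather than loaded.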

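If you don't have much RAM but time is not a major constraint, you can use a modified version of the first algorithm. Choose some number of lines to load from the first file, and load just those lines into a hash table. Then scan the entire second file looking for matches. After that, evict everything from the hash table, load the next block of lines from the first file, and repeat. This has runtime Ω(mn/b), where b is the block size (there are O(m/b) iterations, each a full linear scan of all n lines of the second file). Alternatively, if you know that one file is much smaller than the other, consider loading that whole file into memory and then scanning the other file.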
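The block-at-a-time variant could be sketched as follows. The name `common_lines_blockwise`, the `block_size` default, and the UTF-8 encoding are assumptions for illustration. The sketch also assumes the set of matches is small enough to keep in memory; otherwise the matches could be written to an output file as they are found.

```python
from itertools import islice


def common_lines_blockwise(first_path, second_path, block_size=1_000_000):
    """Return the lines present in both files, keeping at most
    block_size lines of the first file in the hash table at a time."""
    matches = set()
    with open(first_path, encoding="utf-8") as first:
        while True:
            # Load the next block of lines from the first file.
            block = {line.rstrip("\n") for line in islice(first, block_size)}
            if not block:
                break  # first file exhausted
            # Full linear scan of the second file for this block.
            with open(second_path, encoding="utf-8") as second:
                for line in second:
                    line = line.rstrip("\n")
                    if line in block:
                        matches.add(line)
    return matches
```

A smaller `block_size` lowers peak memory but increases the number of passes over the second file, which is the Ω(mn/b) trade-off.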

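If you don't have much RAM but can afford to use more disk space, one option is to use an external sorting algorithm to sort each of the two files (or, at least, to build an index listing the lines of each file in sorted order). Once the files are in sorted order, you can scan them in parallel and find all the matches. This uses the more general algorithm for finding all common elements of two sorted ranges, which works like this:

  • Keep track of two indices: one into the first list and one into the second list, both starting at zero.
  • While both lists have elements remaining:
    • If the elements at the two indices match, report a match.
    • Otherwise, if the element in the first list is smaller than the element in the second list, advance the index into the first list.
    • Otherwise, advance the index into the second list.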
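The two-index scan over sorted inputs could be sketched like this in Python. In-memory lists stand in for the two externally sorted files, and the name `sorted_intersection` is made up for illustration. Advancing both indices after a match is an assumption of this sketch; it is what keeps the loop moving forward.

```python
def sorted_intersection(first, second):
    """Return the items present in both sequences.

    Both sequences must already be sorted in the order used by `<`.
    """
    i = j = 0
    matches = []
    while i < len(first) and j < len(second):
        if first[i] == second[j]:
            matches.append(first[i])  # report a match
            i += 1
            j += 1
        elif first[i] < second[j]:
            i += 1  # first list is behind, so advance it
        else:
            j += 1  # second list is behind, so advance it
    return matches


if __name__ == "__main__":
    a = ["apple", "banana", "cherry"]
    b = ["banana", "cherry", "durian"]
    print(sorted_intersection(a, b))  # ['banana', 'cherry']
```

For real files the same loop works with two line iterators in place of the index lookups, as long as the files were sorted with the same ordering that the comparison uses.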

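This algorithm takes roughly O(n log n) time to sort the two files and then O(n) comparisons to find the common elements of the sorted lists. However, string comparisons do not necessarily run in O(1) time (in fact, they often take much longer), so the actual runtime may be much higher. If we assume that each file consists of n strings of length m, then sorting takes O(mn log n) time, since each comparison takes O(m) time. Similarly, the scanning step may take O(mn) time, since each string comparison may also take O(m) time. As a possible optimization, you could compute a small hash code for each string (say, 16 or 32 bits). Assuming the hash function distributes values uniformly, this can greatly reduce the time spent comparing strings, because most strings that are not equal will have different hash codes and can be told apart in O(1) time.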

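Finally, if each line of the file is reasonably long (say, at least 8 characters), one option is to compute a hash value of 64 bits or more for each line of the files. You can then use any of the techniques above (holding everything in a hash table, external sorting, and so on) to check whether any hash codes appear in both files. Assuming the hash code has enough bits, the number of collisions should be low, and you should be able to find the matches quickly and with much less memory.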
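A rough Python sketch of matching on 64-bit hashes instead of full lines. BLAKE2b truncated to 8 bytes is an arbitrary choice of hash for illustration, and the names `line_hash64` and `probable_common_lines` are made up. The results are only probable matches: two different lines can share a hash, so a candidate should be verified against the first file if exact answers are required.

```python
import hashlib


def line_hash64(line):
    """Return a 64-bit integer hash of the line (BLAKE2b, 8-byte digest)."""
    digest = hashlib.blake2b(line.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def probable_common_lines(first_lines, second_lines):
    """Yield lines of the second input whose hash also occurs in the first."""
    # Store only the 64-bit hashes of the first input, not the lines themselves.
    hashes = {line_hash64(line.rstrip("\n")) for line in first_lines}
    for line in second_lines:
        line = line.rstrip("\n")
        if line_hash64(line) in hashes:
            yield line  # probable match; a hash collision is possible
```

In Python a set of integers still carries per-object overhead, so the memory saving is most visible when the hashes are kept in a compact form, such as a sorted array of 8-byte values.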

Hope this helps!

Woohoo! 1000th answer! :-)
