Algorithm for matching strings between two large files

Question


I have a question about a search algorithm. I have two text files, each with at least 10 million lines. Each line is a string, and I want to find every line in the first file that also appears in the second file. Is there a good way to do this efficiently? Suggestions for any algorithm or special language feature are appreciated.


string algorithm matching search

Shang wang Sep 01 '11 at 21:44


1 answer

templatetypedef · Accepted Answer · 2011-09-01T22:16:44+0000

If you don't know anything about the structure of the files (for example, whether or not they are sorted), there are many approaches you could take, and depending on your constraints on memory and running time, one of the following might be what you're looking for.

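If you have a lot of free RAM available, one option is to create a hash table in memory to hold strings. Load all of the strings from the first file into the hash table. Then read the strings from the second file one at a time, and for each one check whether it is in the hash table; if so, report a match. This approach uses O(m) memory (where m is the number of strings in the first file) and takes at least Ω(m + n) time, where n is the number of strings in the second file, and possibly more, depending on how well the hash function spreads the strings out. It is also (almost certainly) the simplest way to solve the problem directly.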
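A minimal Python sketch of the hash-table approach, assuming each line without its trailing newline is the string to match. Python's built-in `set` plays the role of the hash table, and the function name `common_lines_hash` is hypothetical. It reports each line of the second input that also occurs in the first, so a matching line repeated in the second file is reported once per occurrence.

```python
from typing import Iterable, Iterator


def common_lines_hash(first: Iterable[str], second: Iterable[str]) -> Iterator[str]:
    """Yield each line of `second` that also occurs in `first`.

    Every line of `first` is stored in a set (O(m) memory); `second` is
    streamed one line at a time with an expected O(1) lookup per line.
    """
    seen = {line.rstrip("\n") for line in first}
    for line in second:
        key = line.rstrip("\n")
        if key in seen:
            yield key
```

With real files you would pass open file objects, for example `open("first.txt")` and `open("second.txt")` (placeholder names). Keep in mind that Python strings carry per-object overhead, so the set needs noticeably more memory than the raw text of the first file.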

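If you don't have much RAM available but time isn't a major constraint, you can use a modified version of this first algorithm. Split the first file into blocks, each small enough to fit in memory. For each block, load it into a hash table, then scan the entire second file for matches. This needs only enough memory for one block. The runtime is Ω(mn/b), where b is the block size (there are O(m/b) blocks, and each one requires a full pass over the n strings of the second file). This is a good option if memory is tight but you can afford the extra time.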
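The block-by-block variant, sketched in Python under the same assumptions. `common_lines_blocked` and its `block_size` parameter are hypothetical names. `open_second` is any zero-argument callable that returns a fresh iterable over the second file, so the file can be rescanned once per block; a generator function that opens the file in a `with` block and yields its lines would work.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator


def common_lines_blocked(
    first: Iterable[str],
    open_second: Callable[[], Iterable[str]],
    block_size: int,
) -> Iterator[str]:
    """Yield lines of the second input that occur in the first input.

    Only `block_size` lines of the first input are held in the set at a
    time; the second input is rescanned from the start for every block.
    """
    it = iter(first)
    while True:
        # Load the next block of the first input into a hash set.
        block = {line.rstrip("\n") for line in islice(it, block_size)}
        if not block:  # first input exhausted
            break
        # One full pass over the second input per block.
        for line in open_second():
            key = line.rstrip("\n")
            if key in block:
                yield key
```

Matches come out grouped by block rather than in file order, and a line that appears in more than one block of the first file is reported once per such block.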

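If memory is limited but you can afford to sort the files (for example, with an external sort, which handles data much larger than RAM by sorting runs on disk and merging them), you can instead sort both files and then compare them in a single parallel pass, much like the merge step of merge sort. The algorithm works like this: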

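Step 1: Sort both files, using an external sort if they do not fit in memory.
Step 2: Walk through both sorted files in parallel, keeping one current line from each:
- If the two current lines are equal, report a match and advance in both files.
- If the line from the first file is smaller, advance in the first file; otherwise, advance in the second file.
- Stop as soon as either file runs out of lines.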
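A Python sketch of Step 2, the parallel pass over two already-sorted inputs. It assumes both inputs are sorted in the same order Python uses to compare `str` values (code-point order; for UTF-8 text, sorting with `LC_ALL=C sort` gives a compatible byte order). The name `common_lines_sorted` is hypothetical.

```python
from typing import Iterable, Iterator


def common_lines_sorted(first: Iterable[str], second: Iterable[str]) -> Iterator[str]:
    """Merge-style scan of two sorted line streams, yielding lines present in both.

    Only the current line from each input is held in memory.
    """
    a = (line.rstrip("\n") for line in first)
    b = (line.rstrip("\n") for line in second)
    done = object()  # sentinel marking an exhausted input
    x, y = next(a, done), next(b, done)
    while x is not done and y is not done:
        if x == y:
            yield x  # match: advance in both inputs
            x, y = next(a, done), next(b, done)
        elif x < y:
            x = next(a, done)  # first input is behind
        else:
            y = next(b, done)  # second input is behind
```

On Unix-like systems, running `sort` on each file and then `comm -12` on the sorted outputs performs the same sort-then-scan pipeline without any custom code.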

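For a file of n lines, the sort performs O(n log n) comparisons and the parallel pass O(n) comparisons, and apart from the sort's buffers the pass only needs to hold the two current lines. That is the whole cost if each comparison takes O(1) time (that is, if the strings have bounded length). More generally, if there are n strings of length at most k, sorting takes O(kn log n) time and the parallel pass takes O(kn) time and O(k) memory for the current lines. To make comparisons cheap, you could replace each line by a fixed-size hash (for example, 16 or 32 bytes), so that every comparison takes O(1) time; a hash match is then only a candidate, because two different lines could in principle share the same hash.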
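One way to make every comparison a constant-size operation, sketched in Python: store a fixed-size digest of each line instead of the line itself. This sketch assumes BLAKE2b from Python's `hashlib` with a 16-byte digest (`digest_size=16`; 32 bytes is another reasonable choice), and the helper names `digest` and `common_lines_digest` are hypothetical. The same digests could also be written out and sorted for the sort-and-scan approach.

```python
import hashlib
from typing import Iterable, Iterator

DIGEST_SIZE = 16  # bytes per key; 32 would make collisions even less likely


def digest(line: str) -> bytes:
    """Map a line (ignoring its trailing newline) to a fixed-size key."""
    data = line.rstrip("\n").encode("utf-8")
    return hashlib.blake2b(data, digest_size=DIGEST_SIZE).digest()


def common_lines_digest(first: Iterable[str], second: Iterable[str]) -> Iterator[str]:
    """Like the plain hash-set approach, but keeps only digests of `first`.

    Each yielded line is a candidate match: two different lines could, with
    extremely small probability, share a digest.
    """
    seen = {digest(line) for line in first}
    for line in second:
        if digest(line) in seen:
            yield line.rstrip("\n")
```

If exact results are required, candidates can be re-checked against the original lines in a second, much smaller pass.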

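If the strings are very short (for example, at most 8 bytes), you can treat each one as a 64-bit integer. That lets you replace the comparison-based sort with a linear-time non-comparison sort such as radix sort, which can be considerably faster in practice.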
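A Python sketch of the 8-byte case, assuming every line encodes to at most 8 bytes of UTF-8 and contains no NUL characters (NUL is used as right padding, so `"a"` and `"a\0"` would otherwise collide). `pack`, `unpack`, and `radix_sort_u64` are hypothetical helpers; the sort is a least-significant-digit radix sort that processes one byte per pass.

```python
from typing import List


def pack(s: str) -> int:
    """Pack a string of at most 8 UTF-8 bytes into an unsigned 64-bit int.

    Big-endian packing with zero padding on the right preserves
    lexicographic byte order, provided the string contains no NUL bytes.
    """
    b = s.encode("utf-8")
    if len(b) > 8:
        raise ValueError("string longer than 8 bytes")
    return int.from_bytes(b.ljust(8, b"\0"), "big")


def unpack(k: int) -> str:
    """Inverse of pack(): strip the zero padding and decode."""
    return k.to_bytes(8, "big").rstrip(b"\0").decode("utf-8")


def radix_sort_u64(keys: List[int]) -> List[int]:
    """LSD radix sort of unsigned 64-bit ints, 8 passes of 256 buckets.

    Each pass is stable (keys are appended in order), which is what makes
    sorting from the least significant byte upward correct.
    """
    for shift in range(0, 64, 8):
        buckets: List[List[int]] = [[] for _ in range(256)]
        for k in keys:
            buckets[(k >> shift) & 0xFF].append(k)
        keys = [k for bucket in buckets for k in bucket]
    return keys
```

Packing both files, radix-sorting each list of integers, and running the same parallel scan on the integers finds the common lines. Python lists of ints are used here only for clarity; a real implementation would more likely use a compact array type.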

Hope this helps!

Woohoo! 1000th answer! :-)

