Comparing two lines in Java and identifying duplicate words

I am trying to compare two lines and identify duplicate words. For instance:

String1 = "Hello, my name is John."
String2 = "Can you tell me your name please?"

Comparing String1 and String2 should return the word "name".

I know that you can break the two lines into arrays of words and then compare every word of one line against every word of the other with a nested loop. However, that is computationally expensive at O(n^2), and I was wondering if there is a faster way to do this?

Thanks.

EDIT: modified the example for clarity.

2 answers

After splitting the strings into word arrays:

You can add all the elements of the first array to a hash map (or hash set), and then scan the second array, checking whether each of its elements exists in the hash map. Since a hash map lookup takes O(1) expected time, the overall time complexity is O(n + m).
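A minimal sketch of the hash-based approach, assuming words are split on non-word characters and compared case-insensitively. The class and method names (`CommonWords`, `words`, `commonWords`) are made up for illustration and are not from the question.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

public class CommonWords {

    // Hypothetical helper: lower-case the line and split on runs of
    // non-word characters, dropping any empty tokens.
    static String[] words(String line) {
        return Arrays.stream(line.toLowerCase().split("\\W+"))
                .filter(w -> !w.isEmpty())
                .toArray(String[]::new);
    }

    // Put the words of the first line into a hash set, then scan the second
    // line once. Expected time is O(n + m), with O(n) extra space.
    static Set<String> commonWords(String first, String second) {
        Set<String> seen = new HashSet<>(Arrays.asList(words(first)));
        Set<String> common = new LinkedHashSet<>(); // keeps order of appearance
        for (String w : words(second)) {
            if (seen.contains(w)) {
                common.add(w);
            }
        }
        return common;
    }

    public static void main(String[] args) {
        System.out.println(commonWords(
                "Hello, my name is John.",
                "Can you tell me your name please?")); // prints [name]
    }
}
```

A set is enough here because only membership matters; a map would be needed only if you also wanted counts or positions.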

If you don't want to use extra space, you can sort both arrays in O(n log n) and then compare the elements in a single merged pass in O(n + m), which gives O(n log n) in total.
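The sort-and-compare variant could look roughly like this. It is a sketch under the same assumptions (split on non-word characters, lower-cased), and the names `SortedCommonWords`, `words`, and `commonWords` are illustrative. Note that in Java, `Arrays.sort` on an object array may itself allocate temporary storage, so the space saving is only relative to building a hash set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SortedCommonWords {

    // Hypothetical helper: lower-case the line and split on runs of
    // non-word characters, dropping any empty tokens.
    static String[] words(String line) {
        return Arrays.stream(line.toLowerCase().split("\\W+"))
                .filter(w -> !w.isEmpty())
                .toArray(String[]::new);
    }

    // Sort both word arrays, then walk them with two indices.
    // Sorting dominates the cost: O(n log n + m log m).
    static List<String> commonWords(String first, String second) {
        String[] a = words(first);
        String[] b = words(second);
        Arrays.sort(a);
        Arrays.sort(b);

        List<String> common = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.length && j < b.length) {
            int cmp = a[i].compareTo(b[j]);
            if (cmp < 0) {
                i++;            // a[i] is smaller, advance in a
            } else if (cmp > 0) {
                j++;            // b[j] is smaller, advance in b
            } else {
                // Match: record it once, even if the word repeats.
                if (common.isEmpty() || !common.get(common.size() - 1).equals(a[i])) {
                    common.add(a[i]);
                }
                i++;
                j++;
            }
        }
        return common;
    }

    public static void main(String[] args) {
        System.out.println(commonWords(
                "Hello, my name is John.",
                "Can you tell me your name please?")); // prints [name]
    }
}
```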


One simple solution is to use the Sets.intersection method from Guava's Sets class. This is pretty easy:

    import com.google.common.base.Splitter;
    import com.google.common.collect.Sets;
    import java.util.Set;

    String s1 = "Hello, my name is John.";
    String s2 = "Can you tell me your name?";
    // Split on non-word characters, dropping empty tokens
    Splitter splitter = Splitter.onPattern("\\W").trimResults().omitEmptyStrings();
    Set<String> intersection = Sets.intersection(
            Sets.newHashSet(splitter.split(s1)),
            Sets.newHashSet(splitter.split(s2)));
    System.out.println(intersection);

Output:

 [name] 

You can also find additional information on algorithms for computing set intersections in this topic.
