Java: counting the words that appear in BOTH paragraphs?

I am trying to figure out whether there is an easy way to count the number of words that appear in both a small paragraph (#1) and another small paragraph (#2).

Essentially, I am trying to determine how much these paragraphs overlap, word for word. So if #1 contains the word "happy" and #2 also contains the word "happy", that counts as +1.

I know that I could call String.contains() on #2 for every word in #1, but I was wondering whether there is something more efficient I could use.
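For comparison, here is a minimal sketch of the String.contains() approach described in the question, assuming words are separated by whitespace (the class and method names are made up for illustration). Besides scanning all of #2 once per word of #1, it has a correctness problem: contains() is a substring test, so "happy" is also found inside "unhappy".

```java
public class NaiveOverlap {
    // Counts the words of `first` that String.contains() finds somewhere in `second`.
    // Caveat: contains() is a substring test, so "happy" also matches "unhappy",
    // and a word repeated in `first` is counted once per occurrence.
    static int countWithContains(String first, String second) {
        int count = 0;
        for (String word : first.split("\\s+")) {
            if (!word.isEmpty() && second.contains(word)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // "happy" is counted even though #2 only contains "unhappy"
        System.out.println(countWithContains("happy dog", "an unhappy cat")); // prints 1
    }
}
```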

1 answer

You can create two sets, s1 and s2, containing all the words from the first and second paragraphs respectively, and intersect them with s1.retainAll(s2). Afterwards s1.size() is the number of distinct words the two paragraphs share. That should be easy enough.

Update: this works for me:

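    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.Set;

    Set<String> s1 = new HashSet<String>(Arrays.asList("abc xyz 123".split("\\s")));
    Set<String> s2 = new HashSet<String>(Arrays.asList("xyz 000 111".split("\\s")));
    s1.retainAll(s2); // s1 now keeps only the words that are also in s2
    System.out.println(s1.size()); // prints 1 ("xyz")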

Remember to remove the empty string from both sets: split() produces one when the text starts with whitespace or contains consecutive separators.
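A slightly more complete sketch of the same retainAll() idea, under two assumptions that are not part of the answer: a "word" is a run of letters or digits, and case is ignored. The helper names are hypothetical. It also removes the empty string, and it calls retainAll() only on a freshly built set, because retainAll() modifies the set it is called on.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class CommonWords {
    // Hypothetical helper: lower-cases the text, splits on runs of characters that
    // are not letters or digits, and drops the empty string split() can produce.
    static Set<String> words(String text) {
        Set<String> result = new HashSet<>(
                Arrays.asList(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")));
        result.remove("");
        return result;
    }

    // Number of distinct words that occur in both paragraphs.
    static int countCommonWords(String first, String second) {
        Set<String> common = words(first); // fresh set, so mutating it with retainAll is safe
        common.retainAll(words(second));   // keep only words also present in `second`
        return common.size();
    }

    public static void main(String[] args) {
        String p1 = "I am happy. Are you happy?";
        String p2 = "Happy days; you know, I think so.";
        System.out.println(countCommonWords(p1, p2)); // prints 3: "i", "happy", "you"
    }
}
```

Building each HashSet is linear in the paragraph length, and retainAll() does one hash lookup per element, so this avoids rescanning paragraph #2 for every word of #1.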


Source: https://habr.com/ru/post/1313565/

