Search for duplicate words in a string and count repetitions

I need to find duplicate words in a string, and then count how many times they were repeated. So basically, if the input string is like this:

String s = "House, House, House, Dog, Dog, Dog, Dog"; 

I need to create a new string without the repetitions and store the number of repetitions of each word somewhere else, for example:

New string: "House, Dog"

New int array: [3, 4]

Is there any way to do this in Java? I managed to split the string using s.split(), but how do I count the repetitions and remove them in the new string? Thanks!

+10
java string
27 answers

You've already done the hard part. Now you can simply use a Map to count the occurrences:

    // splitWords is the String[] returned by split()
    Map<String, Integer> occurrences = new HashMap<String, Integer>();
    for (String word : splitWords) {
        Integer oldCount = occurrences.get(word);
        if (oldCount == null) {
            oldCount = 0;
        }
        occurrences.put(word, oldCount + 1);
    }

occurrences.get(word) will tell you how many times a word occurred. You can build the new list by iterating over occurrences.keySet():

    for (String word : occurrences.keySet()) {
        // do something with word
    }

Note that the order of what you get out of keySet() is arbitrary. If you need the words ordered by when they first appear in your input string, use a LinkedHashMap instead.
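A minimal sketch of the LinkedHashMap variant, assuming the question's input and its ", " separator; the class FirstAppearanceCounter and its countWords method are hypothetical names for illustration. Iterating over keySet() then yields the words in the order they first appeared:

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FirstAppearanceCounter {

    // Counts words while remembering the order in which each word first appears.
    static Map<String, Integer> countWords(String[] words) {
        Map<String, Integer> occurrences = new LinkedHashMap<>();
        for (String word : words) {
            Integer oldCount = occurrences.get(word);
            occurrences.put(word, oldCount == null ? 1 : oldCount + 1);
        }
        return occurrences;
    }

    public static void main(String[] args) {
        String s = "House, House, House, Dog, Dog, Dog, Dog";
        Map<String, Integer> occurrences = countWords(s.split(", "));
        // keySet() of a LinkedHashMap iterates in insertion order.
        for (String word : occurrences.keySet()) {
            System.out.println(word + " -> " + occurrences.get(word));
        }
    }
}
```

This prints House -> 3 followed by Dog -> 4; with a plain HashMap the two lines could come out in either order.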

+21

As others have mentioned, use String::split(), then a map (HashMap or LinkedHashMap), and then join the result. Here is the full code:

    import java.util.*;
    public class Genric<E> {
        public static void main(String[] args) {
            Map<String, Integer> unique = new LinkedHashMap<String, Integer>();
            for (String string : "House, House, House, Dog, Dog, Dog, Dog".split(", ")) {
                if (unique.get(string) == null)
                    unique.put(string, 1);
                else
                    unique.put(string, unique.get(string) + 1);
            }
            String uniqueString = join(unique.keySet(), ", ");
            List<Integer> value = new ArrayList<Integer>(unique.values());
            System.out.println("Output = " + uniqueString);
            System.out.println("Values = " + value);
        }
        public static String join(Collection<String> s, String delimiter) {
            StringBuffer buffer = new StringBuffer();
            Iterator<String> iter = s.iterator();
            while (iter.hasNext()) {
                buffer.append(iter.next());
                if (iter.hasNext()) {
                    buffer.append(delimiter);
                }
            }
            return buffer.toString();
        }
    }

New string: Output = House, Dog

Int array (or rather list): Values = [3, 4]. You can use List::toArray to get an array, but note that it returns an Object[] (or an Integer[] with toArray(new Integer[0])), not an int[].
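If a primitive int[] is really needed, one option (a sketch; CountsToIntArray and toIntArray are hypothetical names) is to unbox the map's values with mapToInt. Java 8's String.join can also stand in for the hand-written join() helper:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

class CountsToIntArray {

    // Unboxes the map's values into a primitive int[], in the map's iteration order.
    static int[] toIntArray(Map<String, Integer> counts) {
        return counts.values().stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        Map<String, Integer> unique = new LinkedHashMap<>();
        unique.put("House", 3);
        unique.put("Dog", 4);

        // String.join (Java 8+) replaces the custom join() helper.
        String uniqueString = String.join(", ", unique.keySet());
        int[] values = toIntArray(unique);

        System.out.println("Output = " + uniqueString);
        System.out.println("Values = " + Arrays.toString(values));
    }
}
```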

+3

Try this:

    import java.util.Arrays;
    import java.util.Collections;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;
    public class DuplicateWordSearcher {
        public static void main(String[] args) {
            String text = "arbkcd se fgadfssfds ft gh f ws wfvxsghdhjjkf sd je wed adf";
            List<String> list = Arrays.asList(text.split(" "));
            Set<String> uniqueWords = new HashSet<String>(list);
            for (String word : uniqueWords) {
                System.out.println(word + ": " + Collections.frequency(list, word));
            }
        }
    }

+3
    import java.util.HashMap;
    import java.util.Set;
    public class StringsCount {
        public static void main(String args[]) {
            String value = "This is testing Program testing Program";
            String item[] = value.split(" ");
            HashMap<String, Integer> map = new HashMap<>();
            for (String t : item) {
                if (map.containsKey(t)) {
                    map.put(t, map.get(t) + 1);
                } else {
                    map.put(t, 1);
                }
            }
            Set<String> keys = map.keySet();
            for (String key : keys) {
                System.out.println(key);
                System.out.println(map.get(key));
            }
        }
    }
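Since Java 8, the containsKey/get/put sequence can be collapsed into a single Map.merge call. A minimal sketch, where MergeCounter and count are hypothetical names:

```java
import java.util.HashMap;
import java.util.Map;

class MergeCounter {

    // merge() stores 1 for a new key; for an existing key it combines the old value and 1 with Integer::sum.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> map = new HashMap<>();
        for (String t : text.split(" ")) {
            map.merge(t, 1, Integer::sum);
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(count("This is testing Program testing Program"));
    }
}
```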
+2

This might help:

    String st = "I am am not the one who is thinking I one thing at time";
    String[] ar = st.split("\\s");
    Map<String, Integer> mp = new HashMap<String, Integer>();
    int count = 0;
    for (int i = 0; i < ar.length; i++) {
        count = 0;
        for (int j = 0; j < ar.length; j++) {
            if (ar[i].equals(ar[j])) {
                count++;
            }
        }
        mp.put(ar[i], count);
    }
    System.out.println(mp);
+1

If this is homework, then all I can say is: use String.split() and a HashMap<String, Integer>.

(I see that you have already found split(), so you are on the right track.)

0
    /* Count the number of occurrences of each word in a String using a TreeMap.
       A HashMap would also work, but then the words would not be displayed in sorted order. */
    import java.util.*;
    public class Genric3 {
        public static void main(String[] args) {
            Map<String, Integer> unique = new TreeMap<String, Integer>();
            String string1 = "Ram:Ram: Dog: Dog: Dog: Dog:leela:leela:house:house:shayam";
            // split on ':' plus any surrounding whitespace, so the key is "Dog" rather than " Dog"
            String string2[] = string1.split("\\s*:\\s*");
            for (int i = 0; i < string2.length; i++) {
                String string = string2[i];
                unique.put(string, (unique.get(string) == null ? 1 : (unique.get(string) + 1)));
            }
            System.out.println(unique);
        }
    }
0

You can use a trie (prefix tree) to store the words and keep the count of each word in the trie node where that word ends.

    #include <stdexcept>
    #include <string>
    using namespace std;
    #define ALPHABET_SIZE 26
    // Structure of each node of the prefix tree
    struct prefix_tree_node {
        prefix_tree_node() : count(0) {
            for (int i = 0; i < ALPHABET_SIZE; ++i) child[i] = NULL; // children must start out empty
        }
        int count;
        prefix_tree_node *child[ALPHABET_SIZE];
    };
    prefix_tree_node *root = new prefix_tree_node();
    void insert_string_in_prefix_tree(string word) {
        prefix_tree_node *current = root;
        for (unsigned int i = 0; i < word.size(); ++i) {
            // Assumes the word has only lowercase alphabetic characters
            // Note: change this check, or convert the word to lower case first
            const unsigned int letter = static_cast<unsigned int>(word[i] - 'a');
            // Invalid alphabetic character
            // Note: change this condition depending on the scenario
            if (letter >= ALPHABET_SIZE)
                throw runtime_error("Invalid alphabetic character");
            if (current->child[letter] == NULL)
                current->child[letter] = new prefix_tree_node();
            current = current->child[letter];
        }
        current->count++;
        // Insert this string into a max-heap, ordered by count
    }
    // The data structure stored in the heap would be something like this
    struct MaxHeapNode {
        int count;
        string word;
    };

After you insert all the words, you can print each word with its count in order of frequency by popping the entries from the max-heap.
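Since the question is about Java, here is a rough Java translation of the same idea, offered as a sketch rather than a drop-in implementation. Two assumptions differ from the C++ above: children are kept in a HashMap instead of a fixed 26-slot array (so any character is accepted), and the max-heap is a java.util.PriorityQueue filled after all insertions. TrieWordCounter and its methods are hypothetical names:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

class TrieWordCounter {

    // One trie node; count > 0 means at least one inserted word ends here.
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        int count;
    }

    private final Node root = new Node();

    // Walks (creating nodes as needed) along the word and bumps the count at its last node.
    void insert(String word) {
        Node current = root;
        for (char c : word.toCharArray()) {
            current = current.children.computeIfAbsent(c, k -> new Node());
        }
        current.count++;
    }

    // Depth-first traversal that collects every (word, count) pair.
    private void collect(Node node, StringBuilder prefix, Map<String, Integer> out) {
        if (node.count > 0) {
            out.put(prefix.toString(), node.count);
        }
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            prefix.append(e.getKey());
            collect(e.getValue(), prefix, out);
            prefix.deleteCharAt(prefix.length() - 1);
        }
    }

    // Returns "word=count" strings by descending count, using a PriorityQueue as a max-heap.
    List<String> wordsByFrequency() {
        Map<String, Integer> counts = new HashMap<>();
        collect(root, new StringBuilder(), counts);
        PriorityQueue<Map.Entry<String, Integer>> maxHeap =
                new PriorityQueue<>((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        maxHeap.addAll(counts.entrySet());
        List<String> result = new ArrayList<>();
        while (!maxHeap.isEmpty()) {
            Map.Entry<String, Integer> top = maxHeap.poll();
            result.add(top.getKey() + "=" + top.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        TrieWordCounter trie = new TrieWordCounter();
        for (String w : "House, House, House, Dog, Dog, Dog, Dog".split(", ")) {
            trie.insert(w);
        }
        System.out.println(trie.wordsByFrequency()); // [Dog=4, House=3]
    }
}
```

Words with equal counts come out of the heap in no particular order; add a secondary comparison on the word if a stable order matters.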

0
    // Program to find the number of repeating characters in a string
    // Developed by Subash<subash_senapati@ymail.com>
    import java.util.Scanner;
    public class NoOfRepeatedChar {
        public static void main(String[] args) {
            // input through the keyboard
            Scanner sc = new Scanner(System.in);
            System.out.println("Enter a string :");
            String s1 = sc.nextLine();
            // remove spaces and convert the String to a char array
            String s2 = s1.replace(" ", "");
            char[] ch = s2.toCharArray();
            int counter = 0;
            // for-loop to compare each character with the whole character array
            for (int i = 0; i < ch.length; i++) {
                int count = 0;
                for (int j = 0; j < ch.length; j++) {
                    if (ch[i] == ch[j])
                        count++; // the character matches another one
                }
                if (count > 1) {
                    boolean flag = false;
                    // for-loop to check whether the character has already been counted
                    for (int k = i - 1; k >= 0; k--) {
                        if (ch[i] == ch[k]) // the character has already been counted
                            flag = true;
                    }
                    if (!flag) // if (flag == false)
                        counter = counter + 1;
                }
            }
            if (counter > 0) // if there are any repeating characters
                System.out.println("Number of repeating characters in the given string is/are " + counter);
            else
                System.out.println("Sorry, there are no repeating characters in the given string");
        }
    }
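Note that this answer counts repeated characters, not words. If that is what you need, a single pass with a map avoids the nested loops. A sketch that, like the answer, ignores spaces; RepeatedCharCounter and its methods are hypothetical names:

```java
import java.util.LinkedHashMap;
import java.util.Map;

class RepeatedCharCounter {

    // Counts every non-space character in one pass, keeping first-appearance order.
    static Map<Character, Integer> charCounts(String s) {
        Map<Character, Integer> counts = new LinkedHashMap<>();
        for (char c : s.toCharArray()) {
            if (c != ' ') {
                counts.merge(c, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Number of distinct characters that occur more than once.
    static long repeatingCharCount(String s) {
        return charCounts(s).values().stream().filter(n -> n > 1).count();
    }

    public static void main(String[] args) {
        System.out.println(charCounts("hello world"));
        System.out.println(repeatingCharCount("hello world")); // 2 ('l' and 'o')
    }
}
```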
0
    public static void main(String[] args) {
        String s = "sdf sdfsdfsd sdfsdfsd sdfsdfsd sdf sdf sdf ";
        String st[] = s.split(" ");
        System.out.println(st.length);
        Map<String, Integer> mp = new TreeMap<String, Integer>();
        for (int i = 0; i < st.length; i++) {
            Integer count = mp.get(st[i]);
            if (count == null) {
                count = 0;
            }
            mp.put(st[i], ++count);
        }
        System.out.println(mp.size());
        System.out.println(mp.get("sdfsdfsd"));
    }
0

Pass a String to this method and it will count the repetitions of each word:

    /**
     * @param str the text to examine
     * @return a map containing each word as the key and its number of repetitions as the value
     */
    public Map<String, Integer> findDuplicateString(String str) {
        String[] stringArrays = str.split(" ");
        Map<String, Integer> map = new HashMap<String, Integer>();
        Set<String> words = new HashSet<String>(Arrays.asList(stringArrays));
        int count = 0;
        for (String word : words) {
            for (String temp : stringArrays) {
                if (word.equals(temp)) {
                    ++count;
                }
            }
            map.put(word, count);
            count = 0;
        }
        return map;
    }

Output:

    {word1=2, word2=4, word3=1, ...}
0
    import java.util.HashMap;
    import java.util.LinkedHashMap;
    public class CountRepeatedWords {
        public static void main(String[] args) {
            countRepeatedWords("Note that the order of what you get out of keySet is arbitrary. If you need the words to be sorted by when they first appear in your input String, you should use a LinkedHashMap instead.");
        }
        public static void countRepeatedWords(String wordToFind) {
            String[] words = wordToFind.split(" ");
            HashMap<String, Integer> wordMap = new LinkedHashMap<String, Integer>();
            for (String word : words) {
                wordMap.put(word, (wordMap.get(word) == null ? 1 : (wordMap.get(word) + 1)));
            }
            System.out.println(wordMap);
        }
    }
0

I hope this helps:

    public void countInPara(String str) {
        Map<Integer, String> strMap = new HashMap<Integer, String>();
        List<String> paraWords = Arrays.asList(str.split(" "));
        Set<String> strSet = new LinkedHashSet<>(paraWords);
        int count;
        for (String word : strSet) {
            count = Collections.frequency(paraWords, word);
            strMap.put(count, strMap.get(count) == null ? word : strMap.get(count).concat("," + word));
        }
        for (Map.Entry<Integer, String> entry : strMap.entrySet())
            System.out.println(entry.getKey() + " :: " + entry.getValue());
    }
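The same grouping (from each count to the words with that count) can be sketched with Java 8 streams: count the words first, then invert the map. This assumes a TreeMap in reverse order is wanted so the highest count prints first; WordsByFrequency and groupByFrequency are hypothetical names:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

class WordsByFrequency {

    // Counts each word (in first-appearance order), then inverts the map so that
    // each count maps to the list of words occurring exactly that many times.
    static Map<Long, List<String>> groupByFrequency(String str) {
        Map<String, Long> counts = Arrays.stream(str.split(" "))
                .collect(Collectors.groupingBy(Function.identity(), LinkedHashMap::new, Collectors.counting()));
        return counts.entrySet().stream()
                .collect(Collectors.groupingBy(
                        Map.Entry::getValue,
                        // reverse order: highest count first
                        () -> new TreeMap<Long, List<String>>(Collections.reverseOrder()),
                        Collectors.mapping(Map.Entry::getKey, Collectors.toList())));
    }

    public static void main(String[] args) {
        groupByFrequency("this is what it is this is what it can be")
                .forEach((count, words) -> System.out.println(count + " :: " + String.join(",", words)));
    }
}
```

For that sample sentence this prints 3 :: is, then 2 :: this,what,it, then 1 :: can,be.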
0
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;
    public class DuplicateWord {
        public static void main(String[] args) {
            String para = "this is what it is this is what it can be";
            List<String> paraList = new ArrayList<String>();
            paraList = Arrays.asList(para.split(" "));
            System.out.println(paraList);
            int size = paraList.size();
            int i = 0;
            Map<String, Integer> duplicatCountMap = new HashMap<String, Integer>();
            for (int j = 0; size > j; j++) {
                int count = 0;
                for (i = 0; size > i; i++) {
                    if (paraList.get(j).equals(paraList.get(i))) {
                        count++;
                        duplicatCountMap.put(paraList.get(j), count);
                    }
                }
            }
            System.out.println(duplicatCountMap);
            List<Integer> myCountList = new ArrayList<>();
            Set<String> myValueSet = new HashSet<>();
            for (Map.Entry<String, Integer> entry : duplicatCountMap.entrySet()) {
                myCountList.add(entry.getValue());
                myValueSet.add(entry.getKey());
            }
            System.out.println(myCountList);
            System.out.println(myValueSet);
        }
    }

Input: this is what it is this is what it can be

Output:

[this, is, what, it, is, this, is, what, it, can, be]

{can=1, what=2, be=1, this=2, is=3, it=2}

[1, 2, 1, 2, 3, 2]

[can, what, be, this, is, it]

0
    import java.util.HashMap;
    import java.util.Scanner;
    public class class1 {
        public static void main(String[] args) {
            Scanner in = new Scanner(System.in);
            String inpStr = in.nextLine();
            int key;
            HashMap<String, Integer> hm = new HashMap<String, Integer>();
            String[] strArr = inpStr.split(" ");
            for (int i = 0; i < strArr.length; i++) {
                if (hm.containsKey(strArr[i])) {
                    key = hm.get(strArr[i]);
                    hm.put(strArr[i], key + 1);
                } else {
                    hm.put(strArr[i], 1);
                }
            }
            System.out.println(hm);
        }
    }

0

Please use the code below; in my analysis it is the simplest option. Note that it returns only the single most repeated word rather than every count. I hope you like it:

    import java.util.Arrays;
    import java.util.Collections;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Scanner;
    import java.util.Set;
    public class MostRepeatingWord {
        // Returns the most repeated word, or null if no word occurs more than once
        String mostRepeatedWord(String s) {
            String[] splitted = s.split(" ");
            List<String> listString = Arrays.asList(splitted);
            Set<String> setString = new HashSet<String>(listString);
            int count = 0;
            int maxCount = 1;
            String maxRepeated = null;
            for (String inp : setString) {
                count = Collections.frequency(listString, inp);
                if (count > maxCount) {
                    maxCount = count;
                    maxRepeated = inp;
                }
            }
            return maxRepeated;
        }
        public static void main(String[] args) {
            System.out.println("Enter The Sentence: ");
            Scanner s = new Scanner(System.in);
            String input = s.nextLine();
            MostRepeatingWord mrw = new MostRepeatingWord();
            System.out.println("Most repeated word is: " + mrw.mostRepeatedWord(input));
        }
    }
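A stream-based variant of the same idea, as a sketch: it counts every word once instead of calling Collections.frequency for each distinct word, and returns an Optional instead of null. MostFrequentWord is a hypothetical name, and ties between equally frequent words are resolved arbitrarily:

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;
import java.util.stream.Collectors;

class MostFrequentWord {

    // Counts all words, then picks the entry with the highest count.
    // Returns Optional.empty() if no word occurs more than once (the answer's code returns null then).
    static Optional<String> mostRepeatedWord(String s) {
        Map<String, Long> counts = Arrays.stream(s.split(" "))
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        return counts.entrySet().stream()
                .filter(e -> e.getValue() > 1)
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        System.out.println(mostRepeatedWord("the cat and the hat and the bat").orElse("none"));
    }
}
```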
0
    package day2;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    public class DuplicateWords {
        public static void main(String[] args) {
            String S1 = "House, House, House, Dog, Dog, Dog, Dog";
            String S2 = S1.toLowerCase();
            // split on commas and surrounding whitespace, so "dog," and "dog" are not counted separately
            String[] S3 = S2.split("\\s*,\\s*");
            List<String> a1 = new ArrayList<String>();
            HashMap<String, Integer> hm = new HashMap<>();
            // iterate over every element, including the last one
            for (int i = 0; i < S3.length; i++) {
                if (!a1.contains(S3[i])) {
                    a1.add(S3[i]);
                } else {
                    continue;
                }
                int Count = 0;
                for (int j = 0; j < S3.length; j++) {
                    if (S3[j].equals(S3[i])) {
                        Count++;
                    }
                }
                hm.put(S3[i], Count);
            }
            System.out.println("Duplicate Words and their number of occurrences in String S1 : " + hm);
        }
    }
0
    import java.util.ArrayList;
    import java.util.HashMap;
    public class Counter {
        private static final int COMMA_AND_SPACE_PLACE = 2;
        private String mTextToCount;
        private ArrayList<String> mSeparateWordsList;
        public Counter(String mTextToCount) {
            this.mTextToCount = mTextToCount;
            mSeparateWordsList = cutStringIntoSeparateWords(mTextToCount);
        }
        // Splits "a, b, c" into words, assuming every separator is exactly ", "
        private ArrayList<String> cutStringIntoSeparateWords(String text) {
            ArrayList<String> returnedArrayList = new ArrayList<>();
            if (text.indexOf(',') == -1) {
                returnedArrayList.add(text);
                return returnedArrayList;
            }
            int position1 = 0;
            int position2 = 0;
            while (position2 < text.length()) {
                char c = ',';
                if (text.charAt(position2) == c) {
                    String tmp = text.substring(position1, position2);
                    position1 += tmp.length() + COMMA_AND_SPACE_PLACE;
                    returnedArrayList.add(tmp);
                }
                position2++;
            }
            if (position1 < position2) {
                returnedArrayList.add(text.substring(position1, position2));
            }
            return returnedArrayList;
        }
        public int[] countWords() {
            if (mSeparateWordsList == null)
                return null;
            HashMap<String, Integer> wordsMap = new HashMap<>();
            for (String s : mSeparateWordsList) {
                int cnt;
                if (wordsMap.containsKey(s)) {
                    cnt = wordsMap.get(s);
                    cnt++;
                } else {
                    cnt = 1;
                }
                wordsMap.put(s, cnt);
            }
            return printCounterResults(wordsMap);
        }
        // Copies the counts into an int[]; note that HashMap iteration order is arbitrary
        private int[] printCounterResults(HashMap<String, Integer> m) {
            int index = 0;
            int[] returnedIntArray = new int[m.size()];
            for (int i : m.values()) {
                returnedIntArray[index] = i;
                index++;
            }
            return returnedIntArray;
        }
    }

0
    // Program to count the number of occurrences of each word in a string
    // Developed by Rahul Lakhmara
    import java.util.*;
    public class CountWordsInString {
        public static void main(String[] args) {
            String original = "I am rahul am i sunil so i can say am i";
            // split the String into an array of words
            String[] originalSplit = original.split(" ");
            // every word occurs at least once
            int count = 1;
            // the LinkedHashMap stores the word as the key and its number of occurrences as the value
            Map<String, Integer> wordMap = new LinkedHashMap<String, Integer>();
            // go up to the last word too, otherwise a last word that occurs only once is never added
            for (int i = 0; i < originalSplit.length; i++) {
                for (int j = i + 1; j < originalSplit.length; j++) {
                    if (originalSplit[i].equals(originalSplit[j])) {
                        // increment the count each time the word occurs again
                        count++;
                    }
                }
                // if the word is already present, do not add it to the map again
                if (wordMap.containsKey(originalSplit[i])) {
                    count = 1;
                } else {
                    wordMap.put(originalSplit[i], count);
                    count = 1;
                }
            }
            Set<Map.Entry<String, Integer>> word = wordMap.entrySet();
            Iterator<Map.Entry<String, Integer>> itr = word.iterator();
            while (itr.hasNext()) {
                Map.Entry<String, Integer> map = itr.next();
                // print the word and its count
                System.out.println(map.getKey() + " " + map.getValue());
            }
        }
    }
0
    public static void main(String[] args) {
        String string = "elamparuthi, elam, elamparuthi";
        String[] s = string.replace(" ", "").split(",");
        // track whole words: ops.contains(...) would also match substrings ("elam" is inside "elamparuthi")
        Set<String> seen = new HashSet<>();
        String ops = "";
        for (int i = 0; i <= s.length - 1; i++) {
            if (seen.add(s[i])) {
                if (!ops.isEmpty()) ops += ", ";
                ops += s[i];
            }
        }
        System.out.println(ops);
    }
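To build the de-duplicated string itself, a LinkedHashSet drops repeats while keeping the first-appearance order. A sketch assuming comma-separated input (the "\\s*,\\s*" regex is an assumption); UniqueWordsString is a hypothetical name:

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

class UniqueWordsString {

    // LinkedHashSet discards duplicates but remembers insertion order.
    static String uniqueWords(String input) {
        Set<String> unique = new LinkedHashSet<>(Arrays.asList(input.trim().split("\\s*,\\s*")));
        return String.join(", ", unique);
    }

    public static void main(String[] args) {
        System.out.println(uniqueWords("elamparuthi, elam, elamparuthi"));          // elamparuthi, elam
        System.out.println(uniqueWords("House, House, House, Dog, Dog, Dog, Dog")); // House, Dog
    }
}
```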
0

Here are the steps for counting duplicate words in a string:

  • Create an empty HashMap with String keys and Integer values
  • Split the string using a space as the delimiter and assign the result to a String[]
  • Iterate through the String[] array using a for-each loop
  • Note: convert each word to lowercase before checking, so that the comparison is case-insensitive
  • Check whether the word is already present in the HashMap using the containsKey(k) method of the Map interface
  • If it is, increase its count by 1 using the Map's put(K, V) method
  • Otherwise, insert it using put() with a count of 1
  • Finally, print the map using keySet() or entrySet() (via the Map.Entry interface)
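The steps above can be sketched as follows. Two assumptions go beyond the list: the text is split on runs of whitespace ("\\s+") rather than a single space, and Locale.ROOT is used for lowercasing so the result does not depend on the default locale. StepByStepWordCount is a hypothetical name:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class StepByStepWordCount {

    static Map<String, Integer> countWords(String text) {
        // Step 1: an empty HashMap of String -> Integer
        Map<String, Integer> map = new HashMap<>();
        // Step 2: split into words (assumption: on runs of whitespace)
        String[] words = text.trim().split("\\s+");
        // Step 3: for-each over the array
        for (String w : words) {
            // Step 4: lowercase so that "The" and "the" count as the same word
            String word = w.toLowerCase(Locale.ROOT);
            // Steps 5-7: increment an existing count, otherwise start at 1
            if (map.containsKey(word)) {
                map.put(word, map.get(word) + 1);
            } else {
                map.put(word, 1);
            }
        }
        return map;
    }

    public static void main(String[] args) {
        // Step 8: print via entrySet()
        for (Map.Entry<String, Integer> e : countWords("The cat saw the other cat").entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```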

The complete program is a bit long, as it reads the String from a local file. You can find it in the article linked below:

http://www.benchresources.net/count-and-print-number-of-repeated-word-occurrences-in-a-string-in-java/

0

For strings without spaces, you can use the code below:

    private static void findRecurrence(String input) {
        final Map<String, Integer> map = new LinkedHashMap<>();
        for (int i = 0; i < input.length(); ) {
            int pointer = i;
            int startPointer = i;
            boolean pointerHasIncreased = false;
            for (int j = 0; j < startPointer; j++) {
                // 32 is the char code of a space
                if (pointer < input.length() && input.charAt(j) == input.charAt(pointer) && input.charAt(j) != 32) {
                    pointer++;
                    pointerHasIncreased = true;
                } else {
                    if (pointerHasIncreased) {
                        break;
                    }
                }
            }
            if (pointer - startPointer >= 2) {
                String word = input.substring(startPointer, pointer);
                if (map.containsKey(word)) {
                    map.put(word, map.get(word) + 1);
                } else {
                    map.put(word, 1);
                }
                i = pointer;
            } else {
                i++;
            }
        }
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            System.out.println(entry.getKey() + " = " + (entry.getValue() + 1));
        }
    }

Inputs such as "hahaha", "ba na na" or "xxxyyyzzzxxxzzz" give the expected result.

0

Once you have the words from the string, this is easy. With Java 10 or later (the code uses var), you can try the following code:

    import java.util.Arrays;
    import java.util.stream.Collectors;
    public class StringFrequencyMap {
        public static void main(String... args) {
            String[] wordArray = {"House", "House", "House", "Dog", "Dog", "Dog", "Dog"};
            var freq = Arrays.stream(wordArray)
                    .collect(Collectors.groupingBy(x -> x, Collectors.counting()));
            System.out.println(freq);
        }
    }

Output:

 {House=3, Dog=4} 
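The printed order above depends on the map that groupingBy happens to return; its Javadoc makes no guarantee about the Map type. A sketch that requests a LinkedHashMap explicitly, so the words keep their first-appearance order (OrderedFrequencyMap is a hypothetical name):

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class OrderedFrequencyMap {

    // The three-argument groupingBy lets the caller choose the map type.
    static Map<String, Long> frequencies(String[] words) {
        return Arrays.stream(words)
                .collect(Collectors.groupingBy(Function.identity(), LinkedHashMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        String[] wordArray = "House, House, House, Dog, Dog, Dog, Dog".split(", ");
        System.out.println(frequencies(wordArray)); // {House=3, Dog=4}
    }
}
```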
0

Hope this helps:

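    // Counts occurrences of stringToBeSearched in masterString.
    // The search restarts one character after each match, so overlapping matches are counted too.
    public static int countOfStringInAText(String stringToBeSearched, String masterString) {
        int count = 0;
        while (masterString.indexOf(stringToBeSearched) >= 0) {
            count = count + 1;
            masterString = masterString.substring(masterString.indexOf(stringToBeSearched) + 1);
        }
        return count;
    }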
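Because the loop above restarts one character after each match, it counts overlapping matches (for example "aa" occurs 3 times in "aaaa"), and it also matches inside longer words. If non-overlapping counts are wanted, indexOf(str, fromIndex) can skip past each match without creating substrings. A sketch; SubstringCounter is a hypothetical name, and treating an empty search string as zero matches is a choice, not a rule:

```java
class SubstringCounter {

    // Counts non-overlapping occurrences using indexOf(str, fromIndex).
    static int countNonOverlapping(String needle, String haystack) {
        if (needle.isEmpty()) {
            return 0; // design choice: an empty search string counts as no matches
        }
        int count = 0;
        int from = haystack.indexOf(needle);
        while (from >= 0) {
            count++;
            // resume the search right after the current match
            from = haystack.indexOf(needle, from + needle.length());
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countNonOverlapping("aa", "aaaa"));            // 2
        System.out.println(countNonOverlapping("Dog", "House, Dog, Dog")); // 2
    }
}
```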
0
    package string;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Set;
    public class DublicatewordinanArray {
        public static void main(String[] args) {
            String str = "This is Dileep Dileep Kumar Verma Verma";
            DuplicateString(str);
        }
        public static void DuplicateString(String str) {
            String word[] = str.split(" ");
            Map<String, Integer> map = new HashMap<String, Integer>();
            for (String w : word)
                if (!map.containsKey(w)) {
                    map.put(w, 1);
                } else {
                    map.put(w, map.get(w) + 1);
                }
            Set<Map.Entry<String, Integer>> entrySet = map.entrySet();
            for (Map.Entry<String, Integer> entry : entrySet)
                if (entry.getValue() > 1) {
                    System.out.printf("%s : %d %n", entry.getKey(), entry.getValue());
                }
        }
    }
0

Using Java 8 stream collectors:

    public static Map<String, Integer> countRepetitions(String str) {
        return Arrays.stream(str.split(", "))
                .collect(Collectors.toMap(s -> s, s -> 1, Integer::sum)); // duplicate keys: add their counts
    }

Input: "House, House, House, Dog, Dog, Dog, Dog, Cat"

Output: {Cat=1, House=3, Dog=4}

0

Using Java 8:

    private static void findWords(String s, List<String> output, List<Integer> count) {
        String[] words = s.split(", ");
        Map<String, Integer> map = new LinkedHashMap<>();
        Arrays.stream(words).forEach(e -> map.put(e, map.getOrDefault(e, 0) + 1));
        map.forEach((k, v) -> {
            output.add(k);
            count.add(v);
        });
    }

The LinkedHashMap is what keeps the insertion order; use a plain HashMap only if the order doesn't matter. To call it:

    private static void findWords() {
        String s = "House, House, House, Dog, Dog, Dog, Dog";
        List<String> output = new ArrayList<>();
        List<Integer> count = new ArrayList<>();
        findWords(s, output, count);
        System.out.println(output);
        System.out.println(count);
    }

Output:

    [House, Dog]
    [3, 4]
0