Organization of list items (with repeating items) according to frequency of occurrence

What would be a good way to arrange list items (with duplicate items) according to how often they appear in the list.

I need to use the 5 most common elements in the list.

I am thinking of using a HashMap to count the frequencies of the elements, incrementing the corresponding counter each time an element occurs, and then iterating the HashMap 5 times to find the highest frequency. element over each iteration.

+4
source share
4 answers

How about this approach?

save the map containing count

public static Map <Foo,Integer>; 

 class Foo implements Comparator<Foo>{ private Bar element; public int compare(Foo f1, Foo f2){ return SomeClass.map.get(f1) - SomeClass.map.get(f2); } } 

just update the map with the update in list .

Wrap access to the list with addFooToList() , removeFooFromList() and encapsulate the map update logic there.

+5
source

You can use Guava Multiset and order it by frequency


And about performance. Of course, it depends on how many different values ​​you have, but this test code took about a second on my machine. And I would say that reasonably enough for 10 M items:

 Multiset<Integer> set = HashMultiset.create(); int amount = 10000000; Random random = new Random(); for (int i = 0; i < amount; i++) { set.add(Integer.valueOf(random.nextInt(255))); } TreeSet<Entry<Integer>> sortedEntries = Sets.newTreeSet( new Comparator<Entry<Integer>>() { public int compare(Entry<Integer> a, Entry<Integer> b) { return Ints.compare(a.getCount(), b.getCount()); } }); Iterables.addAll(sortedEntries, set.entrySet()); for (Entry<Integer> entry : Iterables.limit(sortedEntries, 5)) { System.out.println(entry.getElement()); } 
+5
source

Any sorting based on comparison will entail O(N log N) or worse time complexity, so (asymptotically) these are not good tips.

Your approach has O(N) time complexity, and this is the best you can get. You can try lowering the constant (currently you are doing approximately 6*N access to the list items).

I would do it in two iterations: first count the frequencies with a HashMap. Then iterate over the records on the map and save the ordered 5-element array of the 5 most common values ​​that have been discovered so far. With each new element, check if this value is more common than the fifth, most common so far, and update "Top 5" if necessary.


UPDATE A simpler solution of the same time complexity . First calculate the frequencies using the HashMap . Then put all the entries in the PriorityQueue and enter five values. Records should be paired, comparable in frequency (as in @Jigar's solution). Such an ordering will not “match equals” (see Comparable for an explanation), but that's OK.

+2
source

I would also use HashMap. I found the code where I did this:

 HashMap<String, Integer> counts = new HashMap<String, Integer>(); void increment(String s) { Integer oldCount = counts.get(s); if (oldCount == null) { counts.put(s, 1); } else { counts.put(s, oldCount + 1); } } 

List of items:

 Map.Entry<String, Integer>[] array = new Map.Entry[counts.size()]; counts.entrySet().toArray(array); Arrays.sort(array, new Comparator<Map.Entry<String, Integer>>() { public int compare(Map.Entry<String, Integer> a, Map.Entry<String, Integer> b) { return b.getValue() - a.getValue(); } }); int x = 0, min = 0; for (Map.Entry<String, Integer> el : array) { String k = el.getKey(); println("Count: " + el.getValue() + "\n" + k + "\n\n"); } 
0
source

All Articles