I have an IndexEntry class that looks like this:
    public class IndexEntry implements Comparable<IndexEntry> {
        private String word;
        private int frequency;
        private int documentId;
        ...
        // Simple getters for all properties
        public int getFrequency() {
            return frequency;
        }
        ...
    }
I store objects of this class in a Guava SortedSetMultimap (which allows multiple values per key), where I map a String word to some IndexEntry objects. Behind the scenes, it maps each word to a SortedSet<IndexEntry>.
I am trying to implement a kind of indexed structure of words in documents and the frequency of their appearance within documents.
I know how to get the count of the most common word, but I can't seem to get the word itself.
Here is what I have for getting the count of the most common word, where entries is the SortedSetMultimap, along with the helper methods:
    public int mostFrequentWordFrequency() {
        return entries
                .keySet()
                .stream()
                .map(this::totalFrequencyOfWord)
                .max(Comparator.naturalOrder()).orElse(0);
    }

    public int totalFrequencyOfWord(String word) {
        return getEntriesOfWord(word)
                .stream()
                .mapToInt(IndexEntry::getFrequency)
                .sum();
    }

    public SortedSet<IndexEntry> getEntriesOfWord(String word) {
        return entries.get(word);
    }
I am trying to learn the Java 8 features because they seem really useful. However, I cannot get the stream to work the way I want. I would like to have both the word and its frequency at the end of the stream. Failing that, the word alone would be enough, since I can easily get the total occurrences of a word once I have it.
Currently, I keep ending up with a Stream<SortedSet<IndexEntry>>, which I cannot do anything with. I don't know how to get the most frequent word without its frequency, but once I have the frequency, I can't keep track of the corresponding word. I tried creating a WordFrequencyPair POJO class to store both, but then I just had a Stream<SortedSet<WordFrequencyPair>>, and I could not figure out how to map that to something useful.
What am I missing?
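For reference, here is a minimal sketch of the kind of pipeline being asked about, where the word and its total stay together until the end of the stream. It is only an illustration under stated assumptions: a plain TreeMap<String, SortedSet<IndexEntry>> stands in for the Guava SortedSetMultimap so the example runs without Guava, the IndexEntry ordering (by documentId) is a guess because the original compareTo is not shown, and the names MostFrequentWordSketch, add and mostFrequentWord are hypothetical.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;
import java.util.Optional;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

class MostFrequentWordSketch {

    // Hypothetical minimal IndexEntry; ordering by documentId is an assumption.
    static final class IndexEntry implements Comparable<IndexEntry> {
        private final String word;
        private final int frequency;
        private final int documentId;

        IndexEntry(String word, int frequency, int documentId) {
            this.word = word;
            this.frequency = frequency;
            this.documentId = documentId;
        }

        String getWord() { return word; }
        int getFrequency() { return frequency; }
        int getDocumentId() { return documentId; }

        @Override
        public int compareTo(IndexEntry other) {
            return Integer.compare(documentId, other.documentId);
        }
    }

    // Plain-JDK stand-in for SortedSetMultimap<String, IndexEntry> (assumption).
    private final Map<String, SortedSet<IndexEntry>> entries = new TreeMap<>();

    void add(String word, int frequency, int documentId) {
        entries.computeIfAbsent(word, w -> new TreeSet<>())
               .add(new IndexEntry(word, frequency, documentId));
    }

    int totalFrequencyOfWord(String word) {
        return entries.getOrDefault(word, new TreeSet<>())
                .stream()
                .mapToInt(IndexEntry::getFrequency)
                .sum();
    }

    // Pair each word with its total, then pick the pair with the largest total.
    Optional<Map.Entry<String, Integer>> mostFrequentWord() {
        return entries.keySet()
                .stream()
                .<Map.Entry<String, Integer>>map(word ->
                        new SimpleImmutableEntry<>(word, totalFrequencyOfWord(word)))
                .max(Map.Entry.comparingByValue());
    }

    public static void main(String[] args) {
        MostFrequentWordSketch index = new MostFrequentWordSketch();
        index.add("apple", 3, 1);
        index.add("apple", 4, 2);
        index.add("banana", 5, 1);

        index.mostFrequentWord().ifPresent(e ->
                System.out.println(e.getKey() + " occurs " + e.getValue() + " times"));
    }
}
```

The key point is that the stream runs over the words (the key set) rather than over the SortedSet values, so each element can carry its word along with the computed total. If only the word is needed, entries.keySet().stream().max(Comparator.comparingInt(this::totalFrequencyOfWord)) would likely do as well, at the cost of recomputing the total during comparisons.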
java multimap java-8 java-stream
Cache staheli