Sort words within memory limits

Interview prep question: Given a long list of words, return a list of the distinct words along with a count of how many times each occurs, using only 16 GB of memory.

I am thinking of using a HashMap that stores only one copy of each word and increments its value whenever the same word occurs again. This is the only approach I can think of. The overall complexity would be O(n), since I need to go through the entire list of words once, either adding a word to the hash map or incrementing its count.
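A minimal sketch of this counting idea, assuming the words are available as an iterable collection (the class and method names here are illustrative and not from the question):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class WordCounter {
    // One pass over the input: O(n) expected time,
    // memory proportional to the number of distinct words.
    static Map<String, Integer> countWords(Iterable<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            // Insert 1 for a new word, otherwise add 1 to the stored count.
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
                countWords(List.of("to", "be", "or", "not", "to", "be"));
        System.out.println(counts.get("to")); // 2
        System.out.println(counts.size());    // 4
    }
}
```

Memory use here grows with the number of distinct words, not with the length of the list, which is the quantity the 16 GB limit actually constrains.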

What I'm not sure about is how to factor the 16 GB memory limit into the solution.

If each word is 100 bytes (which is most likely not the case, since word length can vary), then my hash map can hold some number x of words, but then what? How can I logically work out the limit of my solution? I'm a little unsure of what to do here.
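One way to make the limit concrete is to turn it into a bound on the number of distinct words. A rough sketch, assuming 16 GB means 2^34 bytes and that each map entry costs the word's bytes plus a fixed overhead (the 50-byte overhead below is a placeholder guess, not a measured value):

```java
class CapacityEstimate {
    // 16 GiB = 17,179,869,184 bytes
    static final long BUDGET_BYTES = 16L * 1024 * 1024 * 1024;

    // Number of distinct words that fit if each entry costs
    // wordBytes plus overheadBytes of bookkeeping.
    static long maxDistinctWords(long wordBytes, long overheadBytes) {
        return BUDGET_BYTES / (wordBytes + overheadBytes);
    }

    public static void main(String[] args) {
        System.out.println(maxDistinctWords(100, 0));  // 171798691
        System.out.println(maxDistinctWords(100, 50)); // 114532461 (overhead is assumed)
    }
}
```

If the number of distinct words could exceed a bound like this, a single in-memory map is not enough and some of the work has to move to disk.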

+4
6 answers

Firstly, this is not a sorting problem. At a fundamental level, it is a counting (collation) problem: identical words have to be grouped and tallied.

I can imagine three approaches to solving it ... depending on the number of different words that you have.

  • , - (, a TreeMap<String, Integer>), . ( ), . (A TreeMap ...)

  • , :

    • :

      • N .
      • ( ) .
      • N , , N.

      , N, . N ( ) , " ".

    • :

      • N , "" N .
      • N , i word + 1, / ( TreeMap, ).
      • / .

​​ (O(MlogM)), , TreeMap. HashMap, O(M), ... "" .

+3

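O(n)

import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;

class WordStore
{
    // 16 GB budget; word length in chars is used as a rough proxy for bytes
    private static final long LIMIT = 17179869184L;
    // Static state stands in for the singleton approach
    private static final Set<String> words = new HashSet<>();
    private static int count;
    private static long bytes; // "byte" is a reserved word in Java

    // Returns false once adding the word would exceed the 16 GB budget
    public static boolean addWord(String word)
    {
        if (bytes + word.length() <= LIMIT)
        {
            // Only a new word takes up space and raises the distinct count
            if (words.add(word))
            {
                count++;
                bytes += word.length();
            }
            return true;
        }
        else
        {
            System.out.println("Word Count in 16GB :" + count);
            return false;
        }
    }
}
class Loader
{
    public static void main(String[] args)
    {
        // Reads whitespace-separated words from standard input, one at a time
        Scanner in = new Scanner(System.in);
        while (in.hasNext())
        {
            if (!WordStore.addWord(in.next()))
            {
                break;
            }
        }
    }
}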
+1

word -> (word, 1). , (word,1) + (word,1) = (word,2). hashmap .

, ( , , ). ((word1, count1,) ... (wordN,countN)). - () , , , filename= "split" + numwrites + (key%5), numwrites .

(% 5), .

, - , . , , . Merge-Sort , O (n * log (n))

. MapReduce .
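A minimal sketch of hash partitioning in the spirit of the (word, 1) pairs and the `key % 5` file naming above. It assumes one word per line with no embedded newlines. The class name, the helper names, and the use of a temporary directory are all illustrative choices, and this is one common way to do it rather than necessarily the exact scheme described in this answer:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PartitionedCount {
    static final int PARTS = 5; // mirrors the "% 5" above; any small N works

    // Phase 1: write each word to the file chosen by its hash,
    // so every copy of a word lands in the same file.
    static List<Path> partition(Iterable<String> words, Path dir) throws IOException {
        List<Path> files = new ArrayList<>();
        List<BufferedWriter> outs = new ArrayList<>();
        try {
            for (int i = 0; i < PARTS; i++) {
                Path file = dir.resolve("split" + i);
                files.add(file);
                outs.add(Files.newBufferedWriter(file));
            }
            for (String word : words) {
                BufferedWriter out = outs.get(Math.floorMod(word.hashCode(), PARTS));
                out.write(word);
                out.newLine();
            }
        } finally {
            for (BufferedWriter out : outs) {
                out.close();
            }
        }
        return files;
    }

    // Phase 2: count one partition with an ordinary in-memory map.
    static Map<String, Integer> countFile(Path file) throws IOException {
        Map<String, Integer> counts = new HashMap<>();
        try (BufferedReader in = Files.newBufferedReader(file)) {
            String word;
            while ((word = in.readLine()) != null) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Demo wrapper only: a real run would write each partition's counts out
    // before loading the next one, instead of merging them all in memory.
    static Map<String, Integer> countAll(Iterable<String> words) {
        try {
            Path dir = Files.createTempDirectory("wordcount");
            Map<String, Integer> total = new HashMap<>();
            for (Path file : partition(words, dir)) {
                total.putAll(countFile(file)); // partitions never share a word
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countAll(List.of("apple", "pear", "apple", "fig", "pear", "apple")));
    }
}
```

Because equal words always hash to the same file, each partition can be counted independently, and only one partition's distinct words need to fit in memory at a time.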

+1

: -

  • : .

  • , . : TreeSet HashMap HashMap.

  • :

  • : , .
  • .
  • TreeSet .
  • TreeSet
  • TreeSet.
  • .
  • ,
  • , , , .
  • , , , .
  • ,
+1

/, . , 48- . 1/3 (16 ). , , 1 . , 1 . . , , , .

0

( ) - ( ), . .

Thus, the only limitation is how much can be written out. Saving the counts gathered so far is enough to let the program pick up where it stopped once it runs out of memory again.

0