Give the k most frequent IP addresses from a large stream of IP addresses in constant time and constant space

I recently met this interview question in my interview with a particular company. I tried to use maxHeap and tried to solve it, but that was unacceptable to him, because the job expression asked me to allow me in constant time and in constant space.

So, I thought that there was something related to the bit mask, and bloom filters came to my mind, but, as we know in the flower filters, we can just check if a certain IP address was visited or not, and can also return false positives.

Can someone please help me with this question. The interview is over, but I still want to understand how it is possible to process IP addresses specifically so that the solution is O (c) in time and space.

+4
source share
3 answers

In fact, allocating an array from 2 ^ 32 is constant space, so you just allocate it, read the stream adding 1 to the [IP] array for each IP reading, then you just sort the array (CONSTANT TIME, notice O (32 * 2 ^ 32)!), And then select the top k-addresses. Voila, constant time, constant space. Ineffective, but the question does not require inefficiency!

+2
source

, , N k N- , .

: FIFO . FIFO , .

: 1, , . .

FIFO - O (1). O (log N) N.

+1

, . , IP-, .

, -

1.) -

2.) ( , DLL )

DLL max 'k'.

struct DLL{

  string IPAddress;
  int count;
  struct DLL* next;
  struct DLL* prev;
}

The key in the hash table is the IP address. The value will be a tuple <count, the address of the Doubly Linked List Node>.

As soon as the IP address comes from the stream, it is checked in the hash table in O (1),

if it not present,    
   if number of nodes in the DLL is less than k, 
        add a new node at the start of the DLL for this IPAddress and a new
        entry is made in the hashtable <IPAddress,<count(=1 here),its    
        address in DLL>>
   else
        other IP Addresses in the DLL must be having a count of at least 1,
        so no point putting it in the DLL. Just add a new entry to the
        hashtable <IPAddress,<count(=1 here),null>
else
   update the tuple i.e increase the value of count by 1 and,
      if DLL node address is not NULL, 
        go to that node in the DLL, increase count there also and shift it 
        rightwards if at all count value now is greater than
        next DLL node count. Keep doing that till the concerned node
        reaches the right position. This is done in O(k). k is a constant as 
        per the problem statement. DLL really comes handy for these operations.
        We have to make sure that the DLL is always sorted in ascending order.
        Also, its very important to make the shifts by swapping links and not 
        by swapping values otherwise we end up updating the hash table for every
        which doesn't increase the time complexity but unnecessarily increases
        the number of operations
      else
        compare this IP Address count with the count in the first node in DLL, if 
        its greater than the count of first node, create a new Node and insert 
        in the DLL appropriately in O(k). Update the hash table for this IP Address 
        and for the first IPAddress in the DLL before purging that node.

Thus, the k most common IP addresses are always available in the DLL at a constant time.

+1
source

All Articles