The original question: given a 5 GB file of URLs visited during the last day, find the top k most frequently visited URLs. This can be solved by using a hash map to count the occurrences of each distinct URL and then a min-heap of size k to pick the top k, which takes O(n log k) time.
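For reference, here is a minimal sketch of the static-file approach I mean. It assumes the URLs are already available as an iterable of strings; the function name `top_k_urls` and the sample data are only illustrative.

```python
import heapq
from collections import Counter


def top_k_urls(urls, k):
    """Return the k most frequent URLs as (url, count) pairs, most frequent first."""
    if k <= 0:
        return []

    # Hash map: URL -> number of occurrences (one pass over the input).
    counts = Counter(urls)

    # Min-heap of (count, url) that never holds more than k entries,
    # so each push/replace costs O(log k).
    heap = []
    for url, count in counts.items():
        if len(heap) < k:
            heapq.heappush(heap, (count, url))
        elif (count, url) > heap[0]:
            # The new entry beats the smallest of the current top k.
            heapq.heapreplace(heap, (count, url))

    # Sort the k survivors in descending order of count.
    return [(url, count) for count, url in sorted(heap, reverse=True)]


# Hypothetical sample data, only to show the call.
visits = ["a.com", "b.com", "a.com", "c.com", "b.com", "a.com"]
print(top_k_urls(visits, 2))  # [('a.com', 3), ('b.com', 2)]
```

For a real file the iterable could be something like `(line.strip() for line in open(path))`, so the whole file is never held in memory, only the hash map of distinct URLs.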
Now I am wondering: if the input were an unbounded stream of online data (instead of a static file), how would I find the top k URLs of the last day?
Or are there any improvements I can make to the system so that I can get the top k URLs for the last minute, the last hour, and the last day dynamically?
Any hint would be appreciated!