The best data structure for a given set of operations is Add, Restore Min / Max, and Get a specific object.

I am looking for an optimal time and space data structure to support the following operations:

  • Add persons ( name, age ) to the global people data warehouse
  • Remove a person with minimum and maximum age
  • Find a person age named

Here is what I might think:

  • Keep an array of faces and keep adding to the end of the array when a new face is added.
  • Store a hash of the person’s name and age to help get the age of the person with the given name
  • Maintain two objects minPerson and maxPerson for a person with a minimum and maximum age. Update this, if necessary, when adding a new face.

Now, although I keep the hash for better performance (3), I think this may not be the best way if there are a lot of conflicts in the hash. In addition, adding a Person will mean the overhead of adding to the hash.

Is there anything that can be optimized here?

Note. I am looking for the best (balanced) approach to support all of these operations in minimal time and space.

+4
source share
3 answers

It looks like you need a data structure that requires quick inserts, and also supports fast queries on 2 different keys (name and age).

I would suggest storing two data structures: one sorted data structure (for example, a balanced binary search tree), where the key is age, and the value is a pointer to the Person object, and the other is a hash table where the key is name and the value is pointer to the Person object. Please note that we do not save two copies of the same object.

A balanced binary search tree will provide O (log (n)) inserts and max / min queries, while hastable will give us O (1) (amortized) inserts and search queries.

When we add a new Person, we simply add a pointer to it in both data structures. To request a minimum / maximum age, we can get the object by requesting a BST. To request a name, we can simply request a hash table.

Your question does not ask for updates / deletes, but they are also performed by updating both data structures accordingly.

+1
source

You can get rid of the array because it does not provide anything that the other two structures cannot do.

Otherwise, the + min / max hash table will most likely work well for your use case. In fact, this is exactly what I would like to use.

As for getting rid of the hash table, because a bad hash function can lead to collisions: well, don't use a bad hash function. I bet that the default hash function for strings provided by your chosen programming language will be very good out of the box.

+4
source

It looks like you expect the name to be unique idenitifer; otherwise your operation 3 is ambiguous (what is the correct return result if you have two entries for John Smith?)

Assuming that the uniqueness of the name is guaranteed, I would go with a simple hash table with the names. Operations 1 and 3 are trivial to perform. Operation 2 can be performed O (N) times if you want to manually view the data structure, or you can do as you suggest and keep track of min / max and update it when adding / removing entries in the hash table.

0
source

All Articles