unordered_set<const char*> is much slower than unordered_set<string>

I load a very long list from disk into an unordered_set. If I use a set of strings, it is very fast: a test list of about 7 MB loads in 1 second. However, using a set of char pointers takes about 2.1 minutes!

Here is the code for the string version:

    unordered_set<string> Set;
    string key;
    while (getline(fin, key)) {
        Set.insert(key);
    }

Here is the code for the char * version:

    struct unordered_eqstr {
        bool operator()(const char* s1, const char* s2) const {
            return strcmp(s1, s2) == 0;
        }
    };

    struct unordered_deref {
        template <typename T>
        size_t operator()(const T* p) const {
            return hash<T>()(*p);
        }
    };

    unordered_set<const char*, unordered_deref, unordered_eqstr> Set;
    string key;
    while (getline(fin, key)) {
        char* str = new(mem) char[key.size()+1];
        strcpy(str, key.c_str());
        Set.insert(str);
    }

I use "new (mem)" because I have a custom memory manager that allocates large blocks of memory and hands out pieces of them to tiny objects such as C strings. However, I also tested this with the regular "new", and the results are identical. I have used the memory manager in other tools without problems.

The two structs are needed so that insertion and lookup hash and compare the actual C string, not its address. I found unordered_deref here on Stack Overflow.

In the end, I need to load files of several gigabytes. That is why I use a custom memory manager, but it is also why this terrible slowdown is unacceptable. Any ideas?

1 answer

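Your unordered_deref deduces T as char, so hash<T>()(*p) hashes only the first character of each string. Hash the whole string instead: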

    struct unordered_deref {
        size_t operator()(const char* p) const {
            return hash<string>()(p);
        }
    };
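
A first-character hash can produce at most 256 distinct values, so nearly all keys land in a handful of buckets and every insert has to strcmp its way through a long chain. That is probably where the minutes go. The sketch below puts the two hashers side by side. The names first_char_hash, cstr_hash, cstr_equal, and cstr_set are made up for illustration, and the use of std::string_view assumes a C++17 compiler; it is one possible way to avoid building a temporary std::string on every hash call, not the only fix.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <string_view>
#include <unordered_set>

// Hasher equivalent to the one in the question (hypothetical name):
// T deduces to char, so only the first character of the key is hashed.
struct first_char_hash {
    template <typename T>
    std::size_t operator()(const T* p) const {
        return std::hash<T>()(*p);
    }
};

// Hasher that covers the whole null-terminated string (hypothetical name).
// Assumption: C++17 is available. std::string_view only measures the
// string with strlen; it does not allocate or copy.
struct cstr_hash {
    std::size_t operator()(const char* p) const {
        return std::hash<std::string_view>()(p);
    }
};

// Equality on string contents rather than on pointer values.
struct cstr_equal {
    bool operator()(const char* a, const char* b) const {
        return std::strcmp(a, b) == 0;
    }
};

// Set of C strings keyed by content. The set stores only the pointers,
// so the character data must outlive the set.
using cstr_set = std::unordered_set<const char*, cstr_hash, cstr_equal>;
```

With first_char_hash, "apple" and "avocado" always receive the same hash value, whereas cstr_hash looks at every character. On a pre-C++17 compiler, the hash<string>()(p) version above does the same job at the cost of one temporary string per call.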