Check if duplicate string literals are stored at the same address

I am developing a (C++) library that uses unordered containers. These require a hasher (usually a specialization of the std::hash struct template) for the types of the elements they store. In my case, those elements are classes that encapsulate string literals, similar to the conststr example at the bottom of this page. The STL offers a specialization for pointers to constant char, which, however, only hashes the pointer value, as described here in the Remarks section.

There is no specialization for C strings. std::hash<const char*> produces a hash of the value of the pointer (the memory address); it does not examine the contents of any character array.

Although this is very fast (or so I think), the C++ standard does not guarantee that several identical string literals are stored at the same address, as described in this question. If they are not, the first requirement on hashers is not met:

For two parameters k1 and k2 that are equal, std::hash<Key>()(k1) == std::hash<Key>()(k2)

I would like to compute the hash with the provided specialization if the above guarantee holds, and with some other algorithm otherwise. Although asking those who include my headers or build my library to define a particular macro is an option, an implementation-defined one would be preferable.
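A minimal sketch of that selection, assuming a user-supplied opt-in macro. The macro name MYLIB_LITERALS_SHARE_ADDRESS and the types literal_hash, literal_equal and literal_set are hypothetical and not provided by any compiler or library; the fallback uses 32-bit FNV-1a purely as an illustration, not as a recommendation.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <unordered_set>

// Hypothetical opt-in macro: defined by whoever builds the library when the
// toolchain is known to merge identical literals. No compiler defines it.
// #define MYLIB_LITERALS_SHARE_ADDRESS

struct literal_hash {
    std::size_t operator()(const char* s) const noexcept {
#ifdef MYLIB_LITERALS_SHARE_ADDRESS
        // Fast path: hash the address only.
        return std::hash<const char*>()(s);
#else
        // Fallback: hash the characters with 32-bit FNV-1a.
        std::uint32_t h = 2166136261u;
        for (; *s != '\0'; ++s) {
            h ^= static_cast<unsigned char>(*s);
            h *= 16777619u;
        }
        return h;
#endif
    }
};

// The equality predicate has to agree with the hasher.
struct literal_equal {
    bool operator()(const char* a, const char* b) const noexcept {
#ifdef MYLIB_LITERALS_SHARE_ADDRESS
        return a == b;
#else
        return std::strcmp(a, b) == 0;
#endif
    }
};

using literal_set = std::unordered_set<const char*, literal_hash, literal_equal>;
```

With the macro left undefined, two equal strings stored in different buffers hash and compare equal, so the first hasher requirement holds regardless of how the literals are laid out.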

Is there any macro in any C++ implementation, but mainly g++ and clang, whose definition guarantees that several identical string literals are stored at the same address?

Example:

    #ifdef __GXX_SAME_STRING_LITERALS_SAME_ADDRESS__
    const char* str1 = "abc";
    const char* str2 = "abc";
    assert( str1 == str2 );
    #endif
c++ string-literals c++14
2 answers

Is there any macro in any C++ implementation, but mainly g++ and clang, whose definition guarantees that several identical string literals are stored at the same address?

  • gcc has the -fmerge-constants option:

Attempt to merge identical constants (string constants and floating-point constants) across compilation units.

This option is the default for optimized compilation if the assembler and linker support it. Use -fno-merge-constants to inhibit this behavior.

Enabled at levels -O, -O2, -O3, -Os.

  • Visual Studio has String Pooling (the /GF option: "Eliminate Duplicate Strings")

String pooling allows what were intended as multiple pointers to multiple buffers to be multiple pointers to a single buffer. In the following code, s and t are initialized with the same string. String pooling causes them to point to the same memory:

    char *s = "This is a character buffer";
    char *t = "This is a character buffer";

Note: although MSDN uses char* for string literals, const char* should be used.

  • clang apparently also has the -fmerge-constants option, but I can't find much about it apart from the --help output, so I'm not sure whether it really is equivalent to gcc's:

Disable constants merging
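Whether a particular build actually merged two literals can at least be probed at run time. The following is a rough sketch; same_address and literals_merged_here are hypothetical helper names, and the result of the literal comparison is unspecified: it may differ between compilers, optimization levels, and translation units, so it is not a substitute for a guarantee.

```cpp
#include <cstdio>

// True when both pointers refer to the same storage.
inline bool same_address(const char* a, const char* b) noexcept {
    return a == b;
}

// Compares the addresses of two identical literals in this translation unit.
// The answer is unspecified by the standard; it only describes this build.
inline bool literals_merged_here() noexcept {
    const char* first = "abc";
    const char* second = "abc";
    return same_address(first, second);
}

// Prints the outcome of the probe for the current build.
inline void report_literal_merging() {
    std::printf("identical literals share an address: %s\n",
                literals_merged_here() ? "yes" : "no");
}
```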


In any case, how string literals are stored is implementation-dependent (many implementations store them in read-only parts of the program).

Instead of building your library on possible implementation-dependent hacks, I can only suggest using std::string instead of C-style strings: they will behave exactly as you expect.

You can construct your std::string objects in place in your containers using the emplace() methods:

    std::unordered_set<std::string> my_set;
    my_set.emplace("Hello");
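A small sketch of why this sidesteps the address question: std::hash<std::string> hashes the characters, so a lookup succeeds even when the key comes from a different buffer than the one used on insertion. The function name found_from_other_buffer is made up for this illustration.

```cpp
#include <string>
#include <unordered_set>

// Inserts from a literal, then looks the key up through a separate buffer.
inline bool found_from_other_buffer() {
    std::unordered_set<std::string> my_set;
    my_set.emplace("Hello");  // constructs the std::string in place

    // A distinct array with the same contents, so the address certainly differs.
    char other[] = {'H', 'e', 'l', 'l', 'o', '\0'};

    // count() converts the buffer to std::string and compares by content.
    return my_set.count(other) == 1;
}
```

The price is a copy of the characters per element (and possibly an allocation), which the pointer-only hash avoids.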

Although C++ does not seem to offer any way that works with string literals, there is an ugly but somewhat workable way around the problem, if you don't mind rewriting your string literals as character sequences.

    template <typename T, T... values>
    struct static_array {
        static constexpr T array[sizeof...(values)] { values... };
    };

    // Out-of-class definition of the static member (needed before C++17).
    template <typename T, T... values>
    constexpr T static_array<T, values...>::array[];

    // A string is a static_array of char with a terminating '\0' appended.
    template <char... values>
    using str = static_array<char, values..., '\0'>;

    int main() {
        return str<'a','b','c'>::array != str<'a','b','c'>::array;
    }

This is required to return zero. The compiler has to ensure that even if multiple translation units instantiate str<'a','b','c'>, those definitions are merged and you end up with only one array.

You do need to make sure you don't mix this with string literals, though. No string literal is guaranteed to compare equal to the array of any template instantiation.
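As a sketch of how such arrays could serve as keys, the snippet below stores their addresses in a std::unordered_set<const char*>, which uses the pointer-hashing std::hash<const char*> by default. It repeats the definitions so that it stands alone; lookup_by_address is a hypothetical name introduced for this illustration.

```cpp
#include <unordered_set>

template <typename T, T... values>
struct static_array {
    static constexpr T array[sizeof...(values)] { values... };
};

// Out-of-class definition of the static member (needed before C++17).
template <typename T, T... values>
constexpr T static_array<T, values...>::array[];

template <char... values>
using str = static_array<char, values..., '\0'>;

// Inserts one instantiation's array and looks it up purely by address.
inline bool lookup_by_address() {
    std::unordered_set<const char*> keys;
    keys.insert(str<'a','b','c'>::array);

    // Same instantiation: one object, hence one address.
    const bool same_found = keys.count(str<'a','b','c'>::array) == 1;
    // Different instantiation: a different object, hence a different address.
    const bool other_missing = keys.count(str<'a','b','d'>::array) == 0;
    return same_found && other_missing;
}
```

Looking up the literal "abc" in such a set may or may not find the element, which is exactly the mixing hazard described above.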
