Get the SHA-1 of a Unicode string in Crypto++

I am teaching myself C++, and I have a problem that I have not been able to solve for more than a week. I hope you can help me.

I need to get the SHA1 digest of a Unicode string (e.g. ), but I don't know how to do this.

I tried to do it like this, but it returns the wrong digest!

For wstring(''), it returns A469A61DF29A7568A6CC63318EA8741FA1CF2A7
I need 8dbe718ab1e0c4d75f7ab50fc9a53ec4f0528373

Regards and sorry for my English :).

Crypto++ 5.6.2, MSVC++ 2013

    #include <iostream>
    #include "cryptopp562\cryptlib.h"
    #include "cryptopp562\sha.h"
    #include "cryptopp562\hex.h"

    int main()
    {
        std::wstring string(L"");
        int bs_size = (int)string.length() * sizeof(wchar_t);

        byte* bytes_string = new byte[bs_size];

        int n = 0; //real bytes count
        for (int i = 0; i < string.length(); i++) {
            wchar_t wcharacter = string[i];

            int high_byte = wcharacter & 0xFF00;
            high_byte = high_byte >> 8;

            int low_byte = wcharacter & 0xFF;

            if (high_byte != 0) {
                bytes_string[n++] = (byte)high_byte;
            }

            bytes_string[n++] = (byte)low_byte;
        }

        CryptoPP::SHA1 sha1;
        std::string hash;

        CryptoPP::StringSource ss(bytes_string, n, true,
            new CryptoPP::HashFilter(sha1,
                new CryptoPP::HexEncoder(
                    new CryptoPP::StringSink(hash)
                )
            )
        );

        std::cout << hash << std::endl;

        return 0;
    }
3 answers

I need to get the SHA1 digest of a Unicode string (e.g. Hi), but I don't know how to do this.

The trick here is that you need to know how to encode the Unicode string. On Windows, a wchar_t is 2 octets, while on Linux a wchar_t is 4 octets. There is a Crypto++ wiki page on Character Set Features, but it is not that good.

To interoperate most effectively, always use UTF-8. That means you convert UTF-16 or UTF-32 to UTF-8. Since you are on Windows, you will need to call the WideCharToMultiByte function with CP_UTF8 to do the conversion. If you were on Linux, you would use libiconv.
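To make the conversion itself concrete, here is a minimal, portable sketch of what "convert UTF-16 or UTF-32 to UTF-8" means at the byte level. It is only an illustration and not a substitute for WideCharToMultiByte or libiconv. The helper names append_utf8 and wide_to_utf8 are hypothetical, and the code assumes that wchar_t holds UTF-16 code units when it is 2 bytes wide and UTF-32 code points when it is 4 bytes wide.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: append one Unicode code point to `out` as UTF-8.
static void append_utf8(std::string& out, std::uint32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: convert a wide string to UTF-8.
// Assumption: wchar_t carries UTF-16 (2-byte wchar_t) or UTF-32 (4-byte wchar_t).
std::string wide_to_utf8(const std::wstring& wide)
{
    std::string out;
    for (std::size_t i = 0; i < wide.size(); ++i) {
        std::uint32_t cp = static_cast<std::uint32_t>(wide[i]);
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine it with the low surrogate that must follow.
            if (i + 1 >= wide.size())
                throw std::runtime_error("truncated surrogate pair");
            std::uint32_t lo = static_cast<std::uint32_t>(wide[i + 1]);
            if (lo < 0xDC00 || lo > 0xDFFF)
                throw std::runtime_error("invalid surrogate pair");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
            ++i;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::runtime_error("unexpected low surrogate");
        } else if (cp > 0x10FFFF) {
            throw std::runtime_error("value is not a Unicode code point");
        }
        append_utf8(out, cp);
    }
    return out;
}
```

The resulting std::string can be handed to a hash exactly like any other byte string. In production code, the platform conversion functions mentioned above are the safer choice.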

Crypto++ has a built-in function called StringNarrow that uses standard C++. It is in the misc.h file. Be sure to call setlocale before using it.
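The reason setlocale matters can be seen with the standard library alone. The following is a sketch of a locale-dependent narrowing conversion built on std::wcstombs. It is not Crypto++'s implementation of StringNarrow, and the name narrow_with_locale is hypothetical. The bytes it produces depend entirely on the locale that is active when it runs.

```cpp
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: narrow a wide string using the current C locale.
// The output encoding is whatever the active locale says it is.
std::string narrow_with_locale(const std::wstring& wide)
{
    // Worst case: every wide character expands to MB_CUR_MAX bytes,
    // plus one byte for the terminating null that wcstombs may write.
    std::vector<char> buffer(wide.size() * MB_CUR_MAX + 1);

    std::size_t written = std::wcstombs(buffer.data(), wide.c_str(), buffer.size());
    if (written == static_cast<std::size_t>(-1))
        throw std::runtime_error("character not representable in the current locale");

    return std::string(buffer.data(), written);
}
```

Without a prior call such as std::setlocale(LC_ALL, "") (from the clocale header), the program runs in the default "C" locale, where non-ASCII characters may fail to convert or convert to something other than UTF-8. Note that std::wcstombs also stops at the first embedded null character.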

There are a few questions about using the Windows function. See, for example, How to use WideCharToMultiByte correctly.


I need - 8dbe718ab1e0c4d75f7ab50fc9a53ec4f0528373

What is the hash (SHA-1, SHA-256, ...)? Is it an HMAC (keyed hash)? Is the information salted (for example, a password in storage)? How is it encoded? I have to ask because I cannot reproduce your desired result:

    SHA-1:   2805AE8E7E12F182135F92FB90843BB1080D3BE8
    SHA-224: 891CFB544EB6F3C212190705F7229D91DB6CECD4718EA65E0FA1B112
    SHA-256: DD679C0B9FD408A04148AA7D30C9DF393F67B7227F65693FFFE0ED6D0F0ADE59
    SHA-384: 0D83489095F455E4EF5186F2B071AB28E0D06132ABC9050B683DA28A463697AD
             1195FF77F050F20AFBD3D5101DF18C0D
    SHA-512: 0F9F88EE4FA40D2135F98B839F601F227B4710F00C8BC48FDE78FF3333BD17E4
             1D80AF9FE6FD68515A5F5F91E83E87DE3C33F899661066B638DB505C9CC0153D

Here is the program I used. Be sure to specify the length of the wide string. If you do not (and pass -1 for the length), WideCharToMultiByte will include the trailing ASCII-Z terminator in its calculations. Since we are using a std::string, we do not need the function to include the ASCII-Z terminator.

    // Header paths follow the question's layout; adjust them to your install.
    #include <windows.h>
    #include <iostream>
    #include <stdexcept>
    #include <string>

    #include "cryptopp562\sha.h"
    #include "cryptopp562\hex.h"
    #include "cryptopp562\filters.h"
    #include "cryptopp562\channels.h"

    using namespace std;
    using namespace CryptoPP;

    int main(int argc, char* argv[])
    {
        wstring m1 = L"";
        string m2;

        // First call: ask how many UTF-8 bytes the conversion needs.
        int req = WideCharToMultiByte(CP_UTF8, 0, m1.c_str(), (int)m1.length(), NULL, 0, NULL, NULL);
        if(req < 0 || req == 0)
            throw runtime_error("Failed to convert string");

        m2.resize((size_t)req);

        // Second call: perform the conversion into m2.
        int cch = WideCharToMultiByte(CP_UTF8, 0, m1.c_str(), (int)m1.length(), &m2[0], (int)m2.length(), NULL, NULL);
        if(cch < 0 || cch == 0)
            throw runtime_error("Failed to convert string");

        // Should not be required
        m2.resize((size_t)cch);

        string s1, s2, s3, s4, s5;
        SHA1 sha1; SHA224 sha224; SHA256 sha256; SHA384 sha384; SHA512 sha512;

        HashFilter f1(sha1, new HexEncoder(new StringSink(s1)));
        HashFilter f2(sha224, new HexEncoder(new StringSink(s2)));
        HashFilter f3(sha256, new HexEncoder(new StringSink(s3)));
        HashFilter f4(sha384, new HexEncoder(new StringSink(s4)));
        HashFilter f5(sha512, new HexEncoder(new StringSink(s5)));

        // Feed the same UTF-8 bytes to all five hashes.
        ChannelSwitch cs;
        cs.AddDefaultRoute(f1);
        cs.AddDefaultRoute(f2);
        cs.AddDefaultRoute(f3);
        cs.AddDefaultRoute(f4);
        cs.AddDefaultRoute(f5);

        StringSource ss(m2, true /*pumpAll*/, new Redirector(cs));

        cout << "SHA-1: " << s1 << endl;
        cout << "SHA-224: " << s2 << endl;
        cout << "SHA-256: " << s3 << endl;
        cout << "SHA-384: " << s4 << endl;
        cout << "SHA-512: " << s5 << endl;

        return 0;
    }

You say "it returns the wrong digest", but what are you comparing it against?

Key point: digests such as SHA-1 do not operate on sequences of characters, but on sequences of bytes.

What you are doing in this piece of code is generating an ad hoc encoding of the Unicode characters in the string "". This encoding will (as it happens) match the big-endian UTF-16 encoding if the characters in the string are all in the BMP (the "Basic Multilingual Plane", which is true in this case), if none of them has a zero high byte (the loop drops those bytes), and if the numbers that end up in wcharacter are integers representing Unicode code points (which is probably correct, but not, I think, guaranteed).

If whatever produced the digest you are comparing against turns the input string into a sequence of bytes using the UTF-8 encoding (which is very likely), then it will produce a different byte sequence from yours, so the SHA-1 digest of that sequence will differ from the digest you calculate here.

So:

  • Check what encoding your test string uses.

  • It is best to use a library function to explicitly generate the UTF-16 or UTF-8 encoding (as the case may be) of the string you want to process, so that you can be sure the byte sequence you are working with is the one you think it is.
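To see why the encoding matters for the digest, it helps to dump the bytes before hashing them. The sketch below is only an illustration: utf16_bytes and to_hex are hypothetical helper names, and the input is assumed to contain BMP characters only, so each character is exactly one UTF-16 code unit.

```cpp
#include <string>

// Hypothetical helper: serialize UTF-16 code units in an explicit byte order.
std::string utf16_bytes(const std::u16string& units, bool big_endian)
{
    std::string out;
    for (char16_t unit : units) {
        char hi = static_cast<char>((unit >> 8) & 0xFF);
        char lo = static_cast<char>(unit & 0xFF);
        if (big_endian) {
            out += hi;
            out += lo;
        } else {
            out += lo;
            out += hi;
        }
    }
    return out;
}

// Hypothetical helper: render bytes as uppercase hex for inspection.
std::string to_hex(const std::string& bytes)
{
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char b : bytes) {
        out += digits[b >> 4];
        out += digits[b & 0x0F];
    }
    return out;
}
```

For the single character U+20AC, to_hex(utf16_bytes(u"\u20ac", false)) gives AC20, the big-endian form gives 20AC, and the UTF-8 form of the same character is the three bytes E2 82 AC. A hash function sees three unrelated inputs, so it produces three unrelated digests.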

An excellent introduction to Unicode and encodings is the article The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!).


This seems to work fine for me.

Rather than trying to extract the individual bytes, I simply cast the wide character buffer to a const byte* and pass that (and the adjusted size) to the hash function.

    int main()
    {
        std::wstring string(L"");

        CryptoPP::SHA1 sha1;
        std::string hash;

        CryptoPP::StringSource ss(
            reinterpret_cast<const byte*>(string.c_str()), // cast to const byte*
            string.size() * sizeof(std::wstring::value_type), // adjust for size
            true,
            new CryptoPP::HashFilter(sha1,
                new CryptoPP::HexEncoder(
                    new CryptoPP::StringSink(hash)
                )
            )
        );

        std::cout << hash << std::endl;

        return 0;
    }

Output:

 C6F8291E68E478DD5BD1BC2EC2A7B7FC0CEE1420 

EDIT: To add.

The result will be encoding dependent. For example, I ran this on Linux, where wchar_t is 4 bytes. On Windows, I believe wchar_t is only 2 bytes.
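A small sketch makes the platform dependence visible. The hypothetical helper raw_wide_bytes below copies out the same in-memory bytes that the reinterpret_cast approach hands to the hash, so their count (and therefore the digest) changes with sizeof(wchar_t) and with the machine's byte order.

```cpp
#include <string>

// Hypothetical helper: the in-memory bytes of a wide string, exactly as a
// reinterpret_cast to a byte pointer would expose them to a hash function.
std::string raw_wide_bytes(const std::wstring& wide)
{
    const char* begin = reinterpret_cast<const char*>(wide.data());
    return std::string(begin, wide.size() * sizeof(std::wstring::value_type));
}
```

For a two-character string this yields 4 bytes where wchar_t is 2 bytes wide and 8 bytes where it is 4 bytes wide, which is why the same source code prints different digests on different platforms.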

For consistency, it may be better to store the text as UTF-8 in a regular std::string. That also simplifies the API call:

    int main()
    {
        std::string string(""); // UTF-8 encoded

        CryptoPP::SHA1 sha1;
        std::string hash;

        CryptoPP::StringSource ss(
            string,
            true,
            new CryptoPP::HashFilter(sha1,
                new CryptoPP::HexEncoder(
                    new CryptoPP::StringSink(hash)
                )
            )
        );

        std::cout << hash << std::endl;

        return 0;
    }

Output:

 2805AE8E7E12F182135F92FB90843BB1080D3BE8 
