There are a few separate issues here:
1) Keep in mind that MongoDB stores all documents in BSON format. Also note that the BSON specification calls for UTF-8 encoding, not UTF-16 encoding.
Link: http://bsonspec.org/#/specification
2) All drivers, including the JavaScript driver in the mongo shell, must correctly handle strings that are encoded as UTF-8. (If they do not, that is a bug!) Many of the drivers happen to handle UTF-16 as well, although, as far as I know, UTF-16 is not officially supported.
3) When I tested this with the Python driver, MongoDB was able to store and return a string value containing a broken UTF-16 surrogate pair. However, I could not load the broken surrogate pair using the mongo shell, and I could not store a string containing the broken surrogate pair in a JavaScript variable in the shell either.
4) mapReduce() works correctly on string data containing a valid UTF-16 surrogate pair, but running mapReduce() on string data containing a broken surrogate pair produces an error.
It appears that mapReduce() fails at the point where MongoDB tries to convert the BSON to a JavaScript variable for use by the JavaScript engine.
5) I have opened Jira issue SERVER-6747 for this problem. Feel free to follow it and vote for it.
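To make points 1 and 3 concrete, here is a minimal Python 3 sketch of what a "broken surrogate pair" is and why strict UTF-8 (the encoding BSON specifies) cannot represent one. It does not use any MongoDB driver and does not reproduce the driver behaviour described above; a driver may be more lenient than this. The helper name `bson_string_payload` is made up for illustration. It only follows the string layout given in the BSON specification: an int32 byte count, the UTF-8 bytes, and a trailing NUL.

```python
import struct


def bson_string_payload(text: str) -> bytes:
    """Hypothetical helper: lay out a string value as the BSON spec describes,
    i.e. int32 byte length (counting the trailing NUL), UTF-8 bytes, NUL."""
    data = text.encode("utf-8")  # strict UTF-8; raises on a lone surrogate
    return struct.pack("<i", len(data) + 1) + data + b"\x00"


valid = "\U0001F600"  # one code point; UTF-16 represents it as the pair D83D DE00
broken = "\ud83d"     # a high surrogate with no low surrogate after it

print(valid.encode("utf-16-le").hex())   # 3dd800de
print(bson_string_payload(valid).hex())  # 05000000f09f988000

try:
    bson_string_payload(broken)
    broken_is_encodable = True
except UnicodeEncodeError:
    broken_is_encodable = False

print(broken_is_encodable)  # False
```

The valid pair maps to a single 4-byte UTF-8 sequence, while the lone surrogate has no valid UTF-8 form at all, so any component that insists on well-formed UTF-8 or UTF-16 will reject it.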
William z