I am using Python 3.5, and I am trying to take a block of byte text that may or may not contain special Chinese characters and write it to a file. It works for entries that do not contain Chinese characters, but breaks when they do. The Chinese characters are always a person's name and always appear in addition to the English spelling of the name. The text is JSON formatted and needs to be decoded before I can load it. The decoding seems to go fine and does not give me any errors. When I try to write the decoded text to a file, it gives me the following error message:
UnicodeEncodeError: 'charmap' codec can't encode characters in position 14-18: character maps to <undefined>
Here is an example of the source data that I get before I do anything with it:
b' "isBulkRecipient": "false",\r\n "name": "Name in, English \xef' b'\xab\x62\xb6\xe2\x15\x8a\x8b\x8a\xee\xab\x89\xcf\xbc\x8a",\r\n
Here is the code I'm using:
    recipientData = json.loads(recipientContent.decode('utf-8', 'ignore'))
    recipientName = recipientData['signers'][0]['name']
    pprint(recipientName)
    with open('envelope recipient list.csv', 'a', newline='') as fp:
        a = csv.writer(fp, delimiter=',')
        csvData = [[recipientName]]
        a.writerows(csvData)
recipientContent is obtained from an API call. I do not need to have the Chinese characters in the output file. Any advice would be greatly appreciated!
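For anyone reproducing this, here is a minimal sketch of two ways the write step might be made to succeed. It assumes the failure comes from the default file encoding on Windows (a 'charmap' codec such as cp1252) rather than from the decode step. The helper names `strip_non_ascii` and `append_name` are hypothetical and not part of my program.

```python
import csv
import os
import tempfile


def strip_non_ascii(text):
    """Drop every character that cannot be represented in ASCII."""
    return text.encode('ascii', 'ignore').decode('ascii')


def append_name(path, name, keep_unicode=True):
    """Append one name to a CSV file without relying on the locale encoding."""
    if not keep_unicode:
        name = strip_non_ascii(name)
    # An explicit encoding avoids the platform default, which is assumed
    # here to be a 'charmap' codec that cannot represent these characters.
    with open(path, 'a', newline='', encoding='utf-8') as fp:
        csv.writer(fp, delimiter=',').writerow([name])


if __name__ == '__main__':
    # Temporary directory used only so the sketch does not touch real data.
    path = os.path.join(tempfile.mkdtemp(), 'envelope recipient list.csv')
    append_name(path, 'A\u0142ex')                      # kept, stored as UTF-8
    append_name(path, 'A\u0142ex', keep_unicode=False)  # special character dropped
```

The first call keeps the special characters by writing UTF-8; the second discards them, which would be acceptable here since they are not needed in the output file.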
Update:
I built some manual workarounds for each broken record, but then found other entries that did not contain Chinese characters but did contain special characters from other languages, and these also broke the program. The special characters appear only in the name field. So the name may be something like "Ałex", a mixture of ordinary and special characters. Before I decode the string containing this information, I can print it to the screen, and it looks like this: b'name": "A\xc5ex",\r\n
But after I decode it as UTF-8, I get an error if I try to output it. Error message: UnicodeEncodeError: 'charmap' codec can't encode character '\u0142' in position 2: character maps to <undefined>
I looked up what '\u0142' is, and it is the special character "ł".
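The behaviour can be reproduced in isolation with a small sketch. It assumes the full UTF-8 sequence for that character is `\xc5\x82` and that the output encoding is cp1252; both are assumptions, and `try_encode` is a hypothetical helper. It suggests the decode step works and only the later encode step fails.

```python
def try_encode(text, encoding):
    """Return (encoded_bytes, None) on success or (None, error) on failure."""
    try:
        return text.encode(encoding), None
    except UnicodeEncodeError as exc:
        return None, exc


raw = b'A\xc5\x82ex'            # assumed UTF-8 bytes for the sample name
text = raw.decode('utf-8')      # decoding succeeds and yields 'A\u0142ex'

# Encoding to cp1252 fails because that code page has no slot for U+0142.
encoded, error = try_encode(text, 'cp1252')

# Encoding to UTF-8 round-trips the original bytes.
utf8_bytes, utf8_error = try_encode(text, 'utf-8')

# With errors='replace' the unsupported character becomes '?'.
replaced = text.encode('cp1252', 'replace')

if __name__ == '__main__':
    print(repr(error))
    print(utf8_bytes, replaced)
```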