Show non-printable characters in a string

Question


Is it possible to render the non-printable characters in a Python string as hexadecimal escapes?

For example, if I have a string with a newline in it, I would like the newline replaced with \x0a .

I know there is repr() that will give me ... \n , but I'm looking for the hex version.

+12

python python-3.x escaping

georgij Dec 18 '12 at 7:00


6 answers

I don't know of any built-in method, but it is fairly easy to do with a comprehension:

    import string

    printable = string.ascii_letters + string.digits + string.punctuation + ' '

    def hex_escape(s):
        return ''.join(c if c in printable else r'\x{0:02x}'.format(ord(c)) for c in s)
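A quick check of how this behaves, with the function restated so the snippet runs on its own. The sample inputs are my own assumptions, not from the answer. It also shows two limits worth knowing: printable non-ASCII characters get escaped too, and code points above 0xff produce more than two hex digits.

```python
import string

# Characters left untouched: ASCII letters, digits, punctuation and space.
printable = string.ascii_letters + string.digits + string.punctuation + ' '

def hex_escape(s):
    # Replace every character outside `printable` with a \x escape.
    return ''.join(c if c in printable else r'\x{0:02x}'.format(ord(c)) for c in s)

print(hex_escape('line one\nline two\t!'))  # line one\x0aline two\x09!
print(hex_escape('\u00e9'))                 # \xe9   (a printable letter, still escaped)
print(hex_escape('\u2028'))                 # \x2028 (four hex digits after \x)
```

The last case is ambiguous to read back, which is what the Unicode-aware variants further down address.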

+14

ecatmur Dec 18 '12 at 14:55


I'm late to the party, but if you need it for easy debugging, I find this works:

    string = "\n\t\nHELLO\n\t\n\a\17"

    procd = [c for c in string]

    print(procd)
    # Prints ['\n', '\t', '\n', 'H', 'E', 'L', 'L', 'O', '\n', '\t', '\n', '\x07', '\x0f']

Awful, but it helped me find non-printable characters in a string.
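In the same debugging spirit, a small sketch that lists only the offending characters with their positions. The helper name `find_nonprintable` is hypothetical, and it assumes `str.isprintable` is an acceptable definition of "printable" for your purposes.

```python
def find_nonprintable(s):
    # Return (index, hex code) pairs for every character str.isprintable rejects.
    return [(i, '0x{0:02x}'.format(ord(c)))
            for i, c in enumerate(s)
            if not c.isprintable()]

sample = "\n\tHELLO\a\17"
for index, code in find_nonprintable(sample):
    print(index, code)
```

For a long string this is easier to scan than a full list of characters, since the printable ones are left out.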

+10

Carcigenicate Mar 01 '15 at 0:58


Extending ecatmur's solution to handle non-printable non-ASCII characters makes it less trivial and more unpleasant:

    def escape(c):
        if c.isprintable():
            return c
        c = ord(c)
        if c <= 0xff:
            return r'\x{0:02x}'.format(c)
        elif c <= 0xffff:
            return r'\u{0:04x}'.format(c)
        else:
            return r'\U{0:08x}'.format(c)

    def hex_escape(s):
        return ''.join(escape(c) for c in s)

Of course, if str.isprintable is not exactly the definition you want, you can write a different function. (Note that it is a very different set from what is in string.printable : besides handling non-ASCII printable and non-printable characters, it also treats \n , \r , \t , \x0b and \x0c as non-printable.)

You can make it more compact; the version above is explicit just to show all the steps involved in handling Unicode strings. For example:

    def escape(c):
        if c.isprintable():
            return c
        elif c <= '\xff':
            return r'\x{0:02x}'.format(ord(c))
        else:
            return c.encode('unicode_escape').decode('ascii')

Indeed, no matter what you do, you will have to handle \r , \n and \t explicitly, because all the built-in and stdlib functions that I know of escape them as those special sequences instead of their hex versions.
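A minimal sketch of that explicit handling, assuming it is acceptable to lean on the `unicode_escape` codec for everything except the three short escapes. The names `SHORT_ESCAPES`, `escape_char` and `hex_escape_all` are hypothetical.

```python
# Characters the codec renders as short escapes, mapped to hex form instead.
SHORT_ESCAPES = {'\n': r'\x0a', '\r': r'\x0d', '\t': r'\x09'}

def escape_char(c):
    if c in SHORT_ESCAPES:
        return SHORT_ESCAPES[c]
    if c.isprintable():
        return c
    # For the remaining non-printable characters the codec emits
    # \xNN, \uNNNN or \UNNNNNNNN depending on the code point.
    return c.encode('unicode_escape').decode('ascii')

def hex_escape_all(s):
    return ''.join(escape_char(c) for c in s)

text = 'tab\there\nbell\x07 sep\u2028'
print(text.encode('unicode_escape').decode('ascii'))  # tab\there\nbell\x07 sep\u2028
print(hex_escape_all(text))                           # tab\x09here\x0abell\x07 sep\u2028
```

The first print shows the codec's own choice of \t and \n; the second shows the hex-only form. Literal backslashes in the input are printable and pass through unchanged, so the output is meant for reading, not for round-tripping.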

+2

abarnert Dec 18 '12 at 21:56


I once did something similar by deriving a subclass of str with a custom __repr__() method that did what I wanted. It is not exactly what you are looking for, but it may give you some ideas. (The code uses Python 2 print statements.)

    # -*- coding: iso-8859-1 -*-

    # special string subclass to override the default
    # representation method. main purpose is to
    # prefer using double quotes and avoid hex
    # representation on chars with an ord > 128
    class MsgStr(str):
        def __repr__(self):
            # use double quotes unless there are more of them within the string than
            # single quotes
            if self.count("'") >= self.count('"'):
                quotechar = '"'
            else:
                quotechar = "'"

            rep = [quotechar]
            for ch in self:
                # control char?
                if ord(ch) < ord(' '):
                    # remove the single quotes around the escaped representation
                    rep += repr(str(ch)).strip("'")
                # embedded quote matching quotechar being used?
                elif ch == quotechar:
                    rep += "\\"
                    rep += ch
                # else just use others as they are
                else:
                    rep += ch
            rep += quotechar

            return "".join(rep)

    if __name__ == "__main__":
        s1 = '\tWürttemberg'
        s2 = MsgStr(s1)
        print "str s1:", s1
        print "MsgStr s2:", s2
        print "--only the next two should differ--"
        print "repr(s1):", repr(s1), "# uses built-in string 'repr'"
        print "repr(s2):", repr(s2), "# uses custom MsgStr 'repr'"
        print "str(s1):", str(s1)
        print "str(s2):", str(s2)
        print "repr(str(s1)):", repr(str(s1))
        print "repr(str(s2)):", repr(str(s2))
        print "MsgStr(repr(MsgStr('\tWürttemberg'))):", MsgStr(repr(MsgStr('\tWürttemberg')))

0

martineau Dec 18 '12 at 12:24


There is also a sense in which non-printable characters do get printed: they act as control commands inside the line even though they are invisible. Their presence can be observed by measuring the string's length with len , or by placing the cursor at the start of the line and counting how many arrow-key presses it takes to reach the end. Oddly enough, what looks like a single character can then seem to have a length of, for example, 3, which is puzzling at first. (I'm not sure whether the previous answers have already demonstrated this.)

In the screenshot below, I entered a 135-bit string with a specific structure and format (which I had to build by hand for certain bit positions and for its total length) so that a specific program interprets it as ASCII. The resulting printed line contains non-printable characters, such as a form feed (new page), so an entire extra blank line appears in the printed result (see below):

An example of printing non-printable characters that appear on a printed line

    Input a string:100100001010000000111000101000101000111011001110001000100001100010111010010101101011100001011000111011001000101001000010011101001000000
    HPQGg]+\,vE!:@
    >>> len('HPQGg]+\,vE!:@')
    17
    >>>

In the code snippet above, try copying and pasting the line HPQGg]+\,vE!:@ directly from this site and see what happens when you paste it into Python IDLE.

Hint: you have to press the arrow key three times to move across the two letters P and Q , even though they appear next to each other, because there is actually a File Separator ASCII control character between them.

Converting the string to hexadecimal as a byte array gives back the same initial value, yet converting that hexadecimal back to bytes displays differently (perhaps an encoding issue; I'm not sure). In any case, the program output above does print non-printable characters (I stumbled on this by accident while experimenting with a compression method).

    >>> bytes(b'HPQGg]+\,vE!:@').hex()
    '48501c514767110c5d2b5c2c7645213a40'
    >>> bytes.fromhex('48501c514767110c5d2b5c2c7645213a40')
    b'HP\x1cQGg\x11\x0c]+\\,vE!:@'
    >>> (0x48501c514767110c5d2b5c2c7645213a40 == 0b100100001010000000111000101000101000111011001110001000100001100010111010010101101011100001011000111011001000101001000010011101001000000)
    True
    >>>

In the 135-bit string above, the first 16 groups of 8 bits, counted from the big-endian side, each encode one character (including the non-printable ones), while the last group of 7 bits yields the @ character, as seen below:

Technical breakdown of the format of the above 135-bit string

And here is a breakdown of a 135-bit string as text:

    10010000 = H (72)
    10100000 = P (80)
    00111000 = x1c (28 for File Separator) *
    10100010 = Q (81)
    10001110 = G (71)
    11001110 = g (103)
    00100010 = x11 (17 for Device Control 1) *
    00011000 = x0c (12 for NP form feed, new page) *
    10111010 = ] (93 for right bracket ']')
    01010110 = + (43 for + sign)
    10111000 = \ (92 for backslash)
    01011000 = , (44 for comma ',')
    11101100 = v (118)
    10001010 = E (69)
    01000010 = ! (33 for exclamation)
    01110100 = : (58 for colon ':')
    1000000 = @ (64 for '@' sign)

So, in conclusion, on the sub-question of displaying non-printable characters in hexadecimal form: the byte array shown above contains \x1c , which denotes the File Separator control character mentioned in the hint. A byte array can be read as a string if you ignore the b prefix on the left, and the same value is present in the printed line even though it is invisible (its presence can be observed, as shown above, with the hint and with len ).
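The observations above can be reproduced without copying invisible characters from a web page. This is a small sketch that writes the three control characters as explicit escapes, taken from the bytes.fromhex() output in this answer.

```python
# The same 17 characters, with the control characters spelled out as escapes.
s = 'HP\x1cQGg\x11\x0c]+\\,vE!:@'

print(len(s))      # 17: 14 visible characters plus 3 control characters
print(repr(s))     # repr() makes the control characters visible

hex_digits = s.encode('ascii').hex()
print(hex_digits)                         # 48501c514767110c5d2b5c2c7645213a40
print(int(hex_digits, 16).bit_length())   # 135
```

The bit length comes out at 135 rather than 136 because the leading byte 0x48 starts with a zero bit, which an integer does not store.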

0

Steven hatzakis Aug 22 '19 at 20:45


Martijn pieters · Accepted Answer · 2012-12-18T07:10:16+0000

You will need to do the translation manually; for example, go through the string with a regular expression and replace each occurrence with its hexadecimal equivalent.

    import re

    replchars = re.compile(r'[\n\r]')

    def replchars_to_hex(match):
        return r'\x{0:02x}'.format(ord(match.group()))

    replchars.sub(replchars_to_hex, inputtext)

The above example only matches newlines and carriage returns, but you can expand which characters match, including using \x escape codes and ranges.

    >>> inputtext = 'Some example containing a newline.\nRight there.\n'
    >>> replchars.sub(replchars_to_hex, inputtext)
    'Some example containing a newline.\\x0aRight there.\\x0a'
    >>> print(replchars.sub(replchars_to_hex, inputtext))
    Some example containing a newline.\x0aRight there.\x0a
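As one way of expanding the character class with \x escapes and ranges, here is a sketch that matches all ASCII control characters plus DEL. The names `control_chars`, `to_hex` and `escape_controls` are hypothetical, and the choice of range is an assumption about what counts as non-printable.

```python
import re

# ASCII control characters (0x00-0x1f) plus DEL (0x7f).
control_chars = re.compile(r'[\x00-\x1f\x7f]')

def to_hex(match):
    # Turn the single matched character into a \xNN escape.
    return r'\x{0:02x}'.format(ord(match.group()))

def escape_controls(text):
    return control_chars.sub(to_hex, text)

print(escape_controls('tab\there\r\nbell\x07'))  # tab\x09here\x0d\x0abell\x07
```

Non-ASCII characters are left untouched by this pattern; extend the class if those need escaping as well.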
