Python C Unicode Arguments

Question

I have a simple Python script:

    import _tph

    str = u'Привет, <b>мир!</b>'  # Some Unicode string with Russian characters
    _tph.strip_tags(str)

and a C library that is compiled into _tph.so. This is its strip_tags function:

    PyObject *strip_tags(PyObject *self, PyObject *args)
    {
        PyUnicodeObject *string;
        Py_ssize_t length;

        PyArg_ParseTuple(args, "u#", &string, &length);
        printf("%d, %d\n", string->length, length);
        // ...
    }

printf prints this: 1080, 19. So the length of str really is 19 characters, but where on earth does the 1080 come from?

When I print string, I get my str, a null character, and then a lot of junk.

The junk memory looks like this:

    u'\u041f\u0440\u0438\u0432\u0435\u0442, <b>\u043c\u0438\u0440!</b>\x00\x00\u0299\Ub7024000\U08c55800\Ub7025904\x00\Ub777351c\U08c79e58\x00\U08c7a0b4\x00\Ub7025904\Ub7025954\Ub702594c\Ub702595c\U00702594c\U0070259492\U0070259292\U0070259292\U0070259292\U0070259492\U0070259492\U0070259492\U0070259492\U0070259492\U0070259492\

How can I get a normal string here?

+4

c python unicode python-c-api

SvartalF Oct 31 '11 at 14:41

1 answer

Raymond Hettinger · Accepted Answer · 2011-10-31T14:53:23+0000

The string argument is not a plain C string (char *) here. It is a pointer to a Python Unicode object, so your printf sees a lot of binary data (the object type, GC headers, reference counts, and the Unicode-encoded code points) until it finds a null byte, which printf interprets as the end of the string.
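One plausible reading of the 1080, assuming a 32-bit build with 4-byte Py_UNICODE (which the 32-bit values in the dump suggest): the Python 2 documentation describes "u#" as storing a pointer to the Unicode data buffer, not a PyUnicodeObject *. Under that assumption, string->length would read the third 4-byte word of the text itself, which is U+0438 ('и'), or 1080 in decimal. The sketch below imitates this in plain C. FakeUnicodeHeader, misread_length and SAMPLE are made-up names, and the struct is only a stand-in for the real object layout.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the start of a Python 2 unicode object on a
 * 32-bit build: two header words, then the length field. The field names
 * and widths are assumptions made for illustration only. */
typedef struct {
    uint32_t refcount;  /* stands in for ob_refcnt */
    uint32_t type_ptr;  /* stands in for the ob_type pointer */
    uint32_t length;    /* the field the question prints */
} FakeUnicodeHeader;

/* Reinterpret the start of a code point buffer as the header above and
 * return its "length" field. memcpy is used instead of a pointer cast
 * to avoid undefined behaviour. */
uint32_t misread_length(const uint32_t *code_points, size_t count)
{
    FakeUnicodeHeader header;

    if (count * sizeof code_points[0] < sizeof header) {
        return 0;  /* buffer too short to cover the header */
    }
    memcpy(&header, code_points, sizeof header);
    return header.length;
}

/* First code points of the string from the question, as 4-byte units. */
static const uint32_t SAMPLE[] = {
    0x041F, 0x0440, 0x0438, 0x0432, 0x0435, 0x0442
};
```

Calling misread_length(SAMPLE, 6) returns the third element of the buffer, 0x0438, which matches the first number printed in the question. The second number, 19, is the real length that "u#" stored in length.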

The easiest way to view the string is with PyObject_Print(), which takes the object, a FILE * and a flags argument. You can find the C functions for manipulating Python Unicode objects at: http://docs.python.org/c-api/unicode.html#unicode-objects
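A minimal sketch of how this could look inside strip_tags, assuming the argument is parsed with the "U" format so that the C variable really is the Unicode object. It is not the asker's final code: the tag-stripping logic is left out, the choice of "U" and PyObject_Length is mine, and the function only prints its argument and returns None.

```c
#include <Python.h>
#include <stdio.h>

/* Sketch only: inspects the argument, does not strip any tags. */
static PyObject *strip_tags(PyObject *self, PyObject *args)
{
    PyObject *string;  /* borrowed reference to the unicode object */

    /* "U" accepts only a unicode object and stores it as an object
     * pointer, unlike "u#", which stores a character buffer pointer. */
    if (!PyArg_ParseTuple(args, "U", &string)) {
        return NULL;  /* PyArg_ParseTuple has already set the exception */
    }

    /* Same value that len() reports on the Python side. */
    Py_ssize_t length = PyObject_Length(string);
    if (length < 0) {
        return NULL;
    }
    printf("%zd\n", length);

    /* Writes the repr of the object to stdout (flags == 0). */
    if (PyObject_Print(string, stdout, 0) < 0) {
        return NULL;
    }
    printf("\n");

    Py_RETURN_NONE;
}
```

Checking the return value of PyArg_ParseTuple matters here: if parsing fails, string is left uninitialized and must not be used.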
