Python 3 .title() on UTF-8 strings

Question

So I have a string:

amélie

Encoded as UTF-8 bytes, it is b'ame\xcc\x81lie'

In UTF-8, \xcc\x81 is the combining acute accent (U+0301), which applies to the previous character: http://www.fileformat.info/info/unicode/char/0301/index.htm

u'ame\u0301lie'

When I call 'amélie'.title() on this string, I get "AméLie", which makes no sense to me.

I know I can work around it, but is this intended behavior or a bug? I would expect the l to NOT be capitalized.

Another experiment:

  In [1]: [ord(c) for c in 'amélie'.title()]
  Out[1]: [65, 109, 101, 769, 76, 105, 101]

  In [2]: [ord(c) for c in 'amélie']
  Out[2]: [97, 109, 101, 769, 108, 105, 101]

python python-3.x

lqdc Sep 2 '15 at 4:38

1 answer

maxymoo · Accepted Answer · 2015-09-02T06:03:58+0000

Have a look at these questions: "Python title() with apostrophes" and "Titlecasing a string with exceptions".

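Basically, the built-in title method offers no flexibility here: it treats every character that is not a cased letter (punctuation, and also the combining accent) as a word boundary, and capitalizes the letter that follows it.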
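To see why the l ends up capitalized, it can help to check how Python classifies the combining accent. The following is a minimal sketch using the standard unicodedata module; the variable names are only illustrative.

```python
import unicodedata

s = 'ame\u0301lie'  # 'e' followed by U+0301 COMBINING ACUTE ACCENT

# U+0301 is a nonspacing mark (category 'Mn'), not a cased letter,
# so str.title() treats the character after it as the start of a word.
accent_category = unicodedata.category('\u0301')
titled = s.title()

print(accent_category)   # Mn
print(ascii(titled))     # 'Ame\u0301Lie'
```

This is the same mechanism that turns "they're" into "They'Re": anything that is not a cased letter ends the current word.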

You can use string.capwords instead:

import string
string.capwords('amélie')
Out[18]: 'Amélie'
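As a hedged illustration of why string.capwords behaves differently: it splits on whitespace only and applies str.capitalize to each piece, so a combining accent inside a word is left alone. The second word in the sample below is made up for the example.

```python
import string

# Decomposed input: 'e' + U+0301, plus a hypothetical second word
# separated by several spaces.
name = 'ame\u0301lie   poulain'

# With no sep argument, capwords splits on runs of whitespace,
# capitalizes each word, and joins the words with a single space.
result = string.capwords(name)

print(ascii(result))  # 'Ame\u0301lie Poulain'
```

Note that the run of spaces collapses to one; if the original spacing matters, pass an explicit sep argument.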

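Alternatively, if you control the input, you can use the precomposed é character (b'\xc3\xa9' in UTF-8) instead of e followed by a combining accent: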

b'am\xc3\xa9lie'.decode().title()
Out[21]: 'Amélie'
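If the input arrives in decomposed form and cannot be changed at the source, one possible approach (an assumption, not something the answer above spells out) is to compose it with unicodedata.normalize before calling title. A minimal sketch:

```python
import unicodedata

decomposed = 'ame\u0301lie'  # 'e' + combining acute accent, 7 code points

# NFC composes 'e' + U+0301 into the single code point U+00E9 ('é').
composed = unicodedata.normalize('NFC', decomposed)

print(len(decomposed), len(composed))  # 7 6
print(ascii(composed.title()))         # 'Am\xe9lie'
```

NFC only helps when a precomposed form of the character exists; sequences without one stay decomposed and would still split the word.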
