I have three UTF-8 bites:
hello, world hello, 世界 hello, 世rld
I need only the first 10 ascii-char -width so that the bracket in one column:
[hello, wor] [hello, 世 ] [hello, 世r]
In the console:
width('世界')==width('worl') width('世 ')==width('wor')
One Chinese char is three bytes, but when displayed in the console, it has only 2 ascii widths:
>>> bytes("hello, 世界", encoding='utf-8') b'hello, \xe4\xb8\x96\xe7\x95\x8c'
python format() doesn't help when UTF-8 characters mix in
>>> for s in ['[{0:<{1}.{1}}]'.format(s, 10) for s in ['hello, world', 'hello, 世界', 'hello, 世rld']]: ... print(s) ... [hello, wor] [hello, 世界 ] [hello, 世rl]
It's not beautiful:
-----------Songs----------- | 1: 蝴蝶 | | 2: 心之城 | | 3: 支持你的爱人 | | 4: 根生的种子 | | 5: 鸽子歌(CUCURRUCUCU PALO| | 6: 林地之间 | | 7: 蓝光 | | 8: 在你眼里 | | 9: 肖邦离别曲 | | 10: 西行( 魔戒王者再临主题曲)(INTO | | X 11: 深陷爱河 | | X 12: 钟爱大地(THE MO RUN AIR | | X 13: 时光流逝 | | X 14: 卡农 | | X 15: 舒伯特小夜曲(SERENADE) | | X 16: 甜蜜的摇篮曲(Sweet Lullaby| ---------------------------
So, I wonder if there is a standard way to populate UDF-8?
python unicode string-formatting
kev
source share