When should I use a Unicode string?

I am wondering when to use Unicode strings in Python 2.7 and in my Django applications.

Is it a good idea to use the u'some string' convention for every string?

For instance:

    # models.py
    # -*- coding: UTF-8 -*-
    from django.db import models

    class ModelClass(models.Model):
        field_name = models.ForeignKey(SomeModel, related_name=u'some_models')
        # ...

        class Meta:
            ordering = (u'created', u'name',)

and

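    # urls.py
    # -*- coding: UTF-8 -*-
    from django.conf.urls import patterns, url

    from . import views  # assuming views.py lives in the same app

    urlpatterns = patterns(u'',
        # pass the view callables themselves, do not call them here
        url(r'^a/$', views.some_view, name=u'a'),
        url(r'^b/(?P<pk>[0-9]+)/$', views.some_view2, name=u'b'),
    )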

?

2 answers

IMO, you should use Unicode wherever you have text. You never know when a Jürgen, Søren or Joël will present their Œuvre in the context of your application.

When you have data that needs to be transferred to another process or written to a file, you should have it as a regular str (Py2) or a bytes object (Py3). You have to be a little careful at the interface between these two domains.
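As a minimal sketch of that interface, the two helpers below (to_bytes and to_text are hypothetical names, and UTF-8 is an assumed encoding) encode text on the way out and decode bytes on the way in. The u'' prefix keeps the snippet valid on both Python 2.7 and Python 3.3+.

```python
def to_bytes(text, encoding="utf-8"):
    """Encode text just before it leaves the program (file, socket, pipe)."""
    return text.encode(encoding)


def to_text(data, encoding="utf-8"):
    """Decode bytes as soon as they enter the program."""
    return data.decode(encoding)


name = u"J\u00fcrgen"          # "Jürgen" as text: unicode on Py2, str on Py3
wire = to_bytes(name)          # bytes: str on Py2, bytes on Py3

assert len(name) == 6          # six characters
assert len(wire) == 7          # the u-umlaut takes two bytes in UTF-8
assert to_text(wire) == name   # decoding restores the original text
```

Everything between the two calls can then treat the value as text and never needs to know which encoding is used on the wire.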


You can use Unicode strings everywhere in your application. However, you need to pay attention when it comes to input and output.

One problem is the multi-byte nature of encodings; one Unicode character can be represented by several bytes. If you want to read a file in arbitrary-sized chunks (say, 1K or 4K), you need to write error-handling code to catch the case where only part of the bytes encoding a single Unicode character is read at the end of a chunk. One solution would be to read the entire file into memory and then perform the decoding, but that prevents you from working with files that are extremely large; if you need to read a 2 GB file, you need 2 GB of RAM. (More, really, since for at least a moment you would need both the encoded string and its Unicode version in memory.)
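One way to handle the split-character case without loading the whole file is the standard library's incremental decoder, which buffers an incomplete byte sequence until the next chunk arrives. This is only a sketch: iter_decoded is a hypothetical helper, and the tiny chunk size is chosen to force a split.

```python
import codecs
import io


def iter_decoded(stream, encoding="utf-8", chunk_size=4096):
    """Yield text decoded from a binary stream, one chunk at a time."""
    decoder = codecs.getincrementaldecoder(encoding)()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # Bytes of an unfinished character are kept inside the decoder.
        text = decoder.decode(chunk)
        if text:
            yield text
    # final=True raises UnicodeDecodeError if the stream ended mid-character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


data = u"S\u00f8ren".encode("utf-8")   # 6 bytes, the second character uses 2
# chunk_size=2 deliberately cuts that two-byte character in half.
pieces = list(iter_decoded(io.BytesIO(data), chunk_size=2))
assert u"".join(pieces) == u"S\u00f8ren"
```

Memory use stays proportional to chunk_size rather than to the file size.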

From the Python Unicode HOWTO:

The most important tip:

Software should only work with Unicode strings internally, converting to a particular encoding on output.
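For files, io.open can do that conversion at the boundary for you (on Python 3 the built-in open is the same function). The sketch below assumes UTF-8 and a throwaway temporary file; the program only ever handles text, and the bytes appear only on disk.

```python
import io
import os
import tempfile

text = u"Jo\u00ebl"  # "Joël", kept as text inside the program

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    # Text mode with an explicit encoding: encoding happens on write.
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(text)
    # Binary mode shows what actually landed on disk.
    with io.open(path, "rb") as f:
        raw = f.read()
    # Text mode again: decoding happens on read.
    with io.open(path, "r", encoding="utf-8") as f:
        round_trip = f.read()
finally:
    os.remove(path)

assert round_trip == text
```

Files opened this way in text mode also decode incrementally, so reading them in pieces does not split characters.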
