NumPy's string dtypes are not Python strings.
Consequently, pandas deliberately uses native Python strings, which require an object dtype.
First, let me demonstrate what I mean by NumPy's strings being different:
In [1]: import numpy as np
In [2]: x = np.array(['Testing', 'a', 'string'], dtype='|S7')
In [3]: y = np.array(['Testing', 'a', 'string'], dtype=object)
Now 'x' has a NumPy string dtype (fixed-width, C-like strings) and 'y' is an array of native Python strings.
If we try to go beyond 7 characters, we will see an immediate difference. The string dtype version gets truncated:
In [4]: x[1] = 'a really really really long'
In [5]: x
Out[5]: array(['Testing', 'a reall', 'string'], dtype='|S7')
Whereas the object dtype version can hold strings of arbitrary length:
In [6]: y[1] = 'a really really really long'
In [7]: y
Out[7]: array(['Testing', 'a really really really long', 'string'], dtype=object)
Further, the |S dtype strings cannot hold Unicode properly, although there is a fixed-length Unicode string dtype as well. I will skip an example for now.
Finally, NumPy's strings are actually mutable, whereas Python strings are not. For example:
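For reference, here is a minimal sketch of what such an example might look like. It assumes Python 3, where the |S dtype stores bytes and plain str literals are Unicode; the sample strings and the variable names `u` and `stored_as_bytes` are just illustrative.

```python
import numpy as np

# Fixed-width Unicode dtype: each element holds at most 7 code points.
u = np.array(['Testing', 'a', 'string'], dtype='U7')

# Non-ASCII text fits, but it is still truncated to the fixed width.
u[1] = 'naïve café society'
print(u)

# The byte-string dtype is a different story: NumPy encodes str input
# as ASCII, so non-ASCII text is expected to be rejected outright.
try:
    np.array(['café'], dtype='S4')
    stored_as_bytes = True
except UnicodeError:
    stored_as_bytes = False
print(stored_as_bytes)
```

So the Unicode dtype handles the encoding problem, but it keeps the same fixed-width truncation behaviour as |S.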
In [8]: z = x.view(np.uint8)
In [9]: z += 1
In [10]: x
Out[10]: array(['Uftujoh', 'b!sfbmm', 'tusjoh\x01'], dtype='|S7')
For all these reasons, pandas chose never to allow C-like, fixed-length strings as a data type. As you noticed, attempting to coerce a Python string into a fixed-width NumPy string will not work in pandas. Instead, it always uses native Python strings, which behave in a more intuitive way for most users.
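A quick way to see this for yourself is sketched below. It assumes Python 3 (so the |S7 array holds bytes) and a reasonably recent pandas; the variable names are only for illustration.

```python
import numpy as np
import pandas as pd

# A fixed-width NumPy byte-string array, as in the examples above.
x = np.array(['Testing', 'a', 'string'], dtype='S7')

# Wrapping it in a Series does not keep the S7 dtype; the values are
# stored as ordinary Python objects instead.
s = pd.Series(x)
print(x.dtype, s.dtype)

# Because the elements are plain Python objects, a longer value
# is stored whole rather than being cut off at 7 characters.
s.iloc[1] = b'a really really really long'
print(s.iloc[1])
```

The Series ends up with an object dtype, so the assignment is not truncated the way it was for the raw NumPy array.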