This looks like a job for a list comprehension. Reproducing your example, I did this in MATLAB:
cell_of_strings = {'thank', 'you', 'very', 'much'};
save('my.mat', 'cell_of_strings', '-v7');
I am using a newer version of MATLAB, which can save .mat files in the HDF5-based v7.3 format, depending on the MAT-file save preference. loadmat cannot read v7.3 (HDF5) files, so the '-v7' flag forces MATLAB to save in the older v7 .mat format, which loadmat can understand.
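If you are not sure which format a file was saved in, its 128-byte text header usually tells you. This is only a sketch: mat_file_kind is a hypothetical helper, and it assumes that v5/v7 headers begin with "MATLAB 5.0" while HDF5-based v7.3 headers begin with "MATLAB 7.3".

```python
import os
import tempfile


def mat_file_kind(path):
    """Guess a .mat file's format from its 128-byte text header.

    Hypothetical helper. Assumes v5/v7 files start with b'MATLAB 5.0'
    and HDF5-based v7.3 files start with b'MATLAB 7.3'.
    """
    with open(path, 'rb') as f:
        header = f.read(128)
    if header.startswith(b'MATLAB 7.3'):
        return 'v7.3'    # HDF5-based: scipy.io.loadmat cannot read it
    if header.startswith(b'MATLAB 5.0'):
        return 'v5/v7'   # readable by scipy.io.loadmat
    return 'unknown'     # e.g. a v4 file, which has no text header


# Usage with a fake header written to a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    fake = os.path.join(tmp, 'fake.mat')
    with open(fake, 'wb') as f:
        f.write(b'MATLAB 7.3 MAT-file, Platform: GLNXA64'.ljust(128, b' '))
    print(mat_file_kind(fake))  # v7.3
```

If it reports v7.3, re-saving with '-v7' as above is the simplest fix.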
In Python, I loaded the cell array just as you did:
import scipy.io as sio
matdata = sio.loadmat('%s/my.mat' % path, chars_as_strings=1, matlab_compatible=1)
array_of_strings = matdata['cell_of_strings']
Printing array_of_strings gives:
[[array([[u't', u'h', u'a', u'n', u'k']], dtype='<U1') array([[u'y', u'o', u'u']], dtype='<U1') array([[u'v', u'e', u'r', u'y']], dtype='<U1') array([[u'm', u'u', u'c', u'h']], dtype='<U1')]]
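The variable array_of_strings is a (1, 4) NumPy array of objects, and each object is itself an array. (The character-by-character layout comes from matlab_compatible=1, which implies chars_as_strings=False.) For example, the first element of array_of_strings is a (1, 5) array containing the letters of "thank". That is: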
>>> array_of_strings[0,0]
array([[u't', u'h', u'a', u'n', u'k']], dtype='<U1')
To get to the first letter "t", you need to do something like:
>>> array_of_strings[0,0][0,0]
u't'
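If you want to experiment without a .mat file, the nested layout described above can be rebuilt with NumPy alone. This is a sketch: make_cell_like is a hypothetical helper that only mimics what loadmat returned here, and loadmat output with other options may look different.

```python
import numpy as np


def make_cell_like(words):
    """Hypothetical helper: build the nested layout loadmat returned above.

    The result is a (1, N) object array whose elements are (1, len(word))
    arrays of single characters (dtype '<U1').
    """
    cells = np.empty((1, len(words)), dtype=object)
    for i, word in enumerate(words):
        # A full (row, col) index stores the inner array as a single object.
        cells[0, i] = np.array([list(word)], dtype='<U1')
    return cells


array_of_strings = make_cell_like(['thank', 'you', 'very', 'much'])
print(array_of_strings.shape)        # (1, 4)
print(array_of_strings[0, 0].shape)  # (1, 5)
print(array_of_strings[0, 0][0, 0])  # t
```

Filling an np.empty object array element by element avoids NumPy trying to combine the differently sized inner arrays into one block.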
Since we are dealing with nested arrays, we need to iterate over each level to extract the data, i.e. nested for loops. But first, I'll show you how to extract the first word:
>>> first_word = [str(''.join(letter_row)) for letter_row in array_of_strings[0][0]]
>>> first_word
['thank']
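Here I use a list comprehension. Iterating over array_of_strings[0][0], which is a (1, 5) array, yields its single row of letters, and ''.join concatenates those letters into one string. The str() call converts the unicode result to a regular string; in Python 3 it is unnecessary, because ''.join already returns a str.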
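To see why that comprehension yields exactly one string, it helps to look at what iterating over the inner (1, 5) array produces. A small sketch, using a hand-built array that is assumed to have the same shape and dtype as array_of_strings[0][0]:

```python
import numpy as np

# Same shape and dtype as array_of_strings[0][0]: one row of five characters.
letters = np.array([list('thank')], dtype='<U1')

rows = list(letters)      # iterating a 2-D array yields its rows
print(len(rows))          # 1
print(''.join(rows[0]))   # thank

# Alternative (not used in the answer): flatten first, then join directly.
print(''.join(letters.ravel()))  # thank
```

Because there is only one row, the comprehension produces a one-element list, which is why first_word is ['thank'] rather than 'thank'.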
Now, to get the desired list of strings, we just need to loop over each array of letters as well:
>>> words = [str(''.join(letter_row)) for letter_array in array_of_strings[0] for letter_row in letter_array]
>>> words
['thank', 'you', 'very', 'much']
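The two for clauses in that comprehension read left to right as an outer and an inner loop. Unrolled into the explicit nested for loops mentioned earlier, it is equivalent to the following sketch. The array is rebuilt by hand, assuming the same layout as the printed output, so it runs without a .mat file.

```python
import numpy as np

# Rebuild the (1, 4) object array of (1, n) character arrays printed above.
array_of_strings = np.empty((1, 4), dtype=object)
for i, word in enumerate(['thank', 'you', 'very', 'much']):
    array_of_strings[0, i] = np.array([list(word)], dtype='<U1')

# Explicit nested loops, in the same order as the comprehension's for clauses.
words = []
for letter_array in array_of_strings[0]:   # outer loop: each (1, n) cell
    for letter_row in letter_array:        # inner loop: the rows of that cell
        words.append(''.join(letter_row))  # join the single characters into a word

# The one-line comprehension builds the same list (str() is not needed in Python 3).
words_comp = [''.join(letter_row)
              for letter_array in array_of_strings[0]
              for letter_row in letter_array]

print(words)                # ['thank', 'you', 'very', 'much']
print(words == words_comp)  # True
```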
List comprehensions take some getting used to, but they are extremely helpful. Hope this helps.