Creating a huge scipy array

I want to create a scipy array from a really huge list, but unfortunately I ran into a problem.

I have a list xs of strings. Each string has a length of 1.

>>> type(xs)
<type 'list'>
>>> len(xs)
4001844816

If I convert only the first 10 elements, everything still works as expected.

>>> s = xs[0:10]
>>> x = scipy.array(s)
>>> x
array(['A', 'B', 'C', 'D', 'E', 'F', 'O', 'O'],
      dtype='|S1')
>>> len(x)
10

For the entire list, I get this result:

>>> ary = scipy.array(xs)
>>> ary.size
1
>>> ary.shape
()
>>> ary[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: 0-d arrays can't be indexed
>>> ary[()]
... (the entire list is printed)
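What seems to happen here, judging only by the symptoms and not by the numpy sources, is that numpy gives up on interpreting the list as a sequence and wraps the whole Python object in a 0-dimensional object array, the same fallback it uses for any object it cannot treat as array-like:

>>> a = scipy.array({'not': 'a sequence'})   # any non-array-like object
>>> a.shape
()
>>> a[()]   # the only way to get the wrapped object back out
{'not': 'a sequence'}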

Workaround:

test = scipy.zeros(len(xs), dtype=(str, 1))
for i in xrange(len(xs)):
    test[i] = xs[i]

This is not a problem of insufficient memory. For now I am using the workaround above (it takes about 15 minutes), but I would like to understand the problem.

thanks

Edit: Note that in the workaround test[:] = xs will not work either (it fails with the same 0-d IndexError).
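
As a possible speed-up for the workaround (just a sketch, based on my assumption that the failure only occurs when more than 2**31 elements are converted at once): slice-assigning the list in chunks keeps each conversion well below that limit and avoids the per-element Python loop. The chunk size here is an arbitrary choice:

CHUNK = 10 ** 7   # assumed chunk size, far below 2**31
test = scipy.zeros(len(xs), dtype=(str, 1))
for start in xrange(0, len(xs), CHUNK):
    test[start:start + CHUNK] = xs[start:start + CHUNK]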

On my MacBook, 2147483648 was the smallest size that caused the problem. I determined it with this little script:

#!/usr/bin/python
import scipy as sp

startlen = 2147844816        # known-bad starting length

xs = ["A"] * startlen
ary = sp.array(xs)
while ary.shape == ():       # shrink the list until the conversion succeeds
    print "bad", len(xs)
    xs.pop()
    ary = sp.array(xs)

print "good", len(xs)
print ary.shape, ary[0:10]
print "DONE."

Here is the output:

...
bad 2147483649
bad 2147483648
good 2147483647
(2147483647,) ['A' 'A' 'A' 'A' 'A' 'A' 'A' 'A' 'A' 'A']
DONE.
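
For what it is worth, the boundary found above is exactly the signed 32-bit integer limit, so my guess (not verified against the numpy sources) is that the sequence length overflows a C int somewhere during the list-to-array conversion:

>>> 2 ** 31 - 1    # largest length that still converts
2147483647
>>> 2 ** 31        # smallest length that fails
2147483648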

Python and scipy versions:

>>> sys.version
'2.7.5 (default, Aug 25 2013, 00:04:04) \n[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)]'
>>> scipy.version.version
'0.11.0'

Are you using a 64-bit build of Python/Numpy? With a 32-bit build the process cannot address more than 4 GB, so numpy cannot hold an array of that size; on x64 it should be fine. Have you considered memmap?

As mentioned, memmap works here: it creates the array backed by a file on disk instead of in memory (at the cost of extra IO). I tested it with up to 30 GB of S1 elements and memmap handled it without problems. The test code is below; filling a memmap might also beat the 15-minute workaround.

import numpy
from numpy import arange

baseNumber = 3000000L
#dataType = 'float64'
numBytes = 1          # one byte per 'S1' element
dataType = 'S1'
for powers in arange(1, 7):
  l1 = baseNumber * 10**powers
  print('working with %d elements' % (l1))
  print('number bytes required %f GB' % (l1 * numBytes / 1e9))
  try:
    fp = numpy.memmap('testa.map', dtype=dataType, mode='w+', shape=(1, l1))
    print('works')
    del fp
  except Exception as e:
    print(repr(e))


"""
working with 30000000 elements
number bytes required 0.030000 GB
works
working with 300000000 elements
number bytes required 0.300000 GB
works
working with 3000000000 elements
number bytes required 3.000000 GB
works
working with 30000000000 elements
number bytes required 30.000000 GB
works
working with 300000000000 elements
number bytes required 300.000000 GB
IOError(28, 'No space left on device')
working with 3000000000000 elements
number bytes required 3000.000000 GB
IOError(28, 'No space left on device')


"""