How to dump a boolean matrix in numpy and read it back?

I have a graph represented as a numpy boolean adjacency matrix ( G.adj.dtype == bool ). This is a homework assignment in which I am writing my own graph library, so I cannot use networkx. I want to dump the matrix to a file so that I can play with it, but for the life of me I cannot figure out how to dump it in a form that can be read back.

I tried G.adj.tofile , which correctly (ish) wrote the graph as one long line of True / False. But fromfile barfs when reading this back, giving a 1x1 array, and loadtxt raises ValueError: invalid literal for int() . np.savetxt works, but saves the matrix as a list of floating-point 0/1 values, and loadtxt(..., dtype=bool) fails with the same ValueError.

Finally, I tried networkx.from_numpy_matrix with networkx.write_dot , but that gave every edge a [weight=True] attribute in the dot source, which networkx.read_dot then choked on.

+6
python numpy matrix
4 answers

To save:

 numpy.savetxt('arr.txt', G.adj, fmt='%s') 

To restore:

 G.adj = numpy.genfromtxt('arr.txt', dtype=bool) 
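Putting the two calls together, a minimal round trip looks like this (the file stores the literal words True and False, which genfromtxt with dtype=bool parses back):

```python
import numpy

# A small boolean matrix standing in for G.adj.
adj = numpy.array([[False, True], [True, False]])

# Save as the literal words True/False, one row per line.
numpy.savetxt('arr.txt', adj, fmt='%s')

# genfromtxt maps the strings 'True'/'False' back to booleans.
restored = numpy.genfromtxt('arr.txt', dtype=bool)

print(numpy.array_equal(adj, restored))  # True
```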

HTH!

+4

This is my test case:

 obj = numpy.random.rand(100, 100) > 0.5 

space efficiency

numpy.savetxt('arr.txt', obj, fmt='%s') creates a 54 kB file.

numpy.savetxt('arr.txt', obj, fmt='%d') creates a much smaller file (20 kB).

cPickle.dump(obj, open('arr.dump', 'w')) creates a 40 kB file.

time efficiency

numpy.savetxt('arr.txt', obj, fmt='%s') takes 45 ms.

numpy.savetxt('arr.txt', obj, fmt='%d') takes 10 ms.

cPickle.dump(obj, open('arr.dump', 'w')) takes 2.3 ms.

Conclusion

Use savetxt with the text format ( %s ) if human readability is required, savetxt with the numeric format ( %d ) if file size is a concern, and cPickle if speed is the priority.
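A sketch of how this comparison can be reproduced on Python 3 (pickle instead of cPickle; exact sizes and timings will vary by machine):

```python
import os
import pickle
import numpy

obj = numpy.random.rand(100, 100) > 0.5

numpy.savetxt('arr_s.txt', obj, fmt='%s')   # 'True'/'False' words
numpy.savetxt('arr_d.txt', obj, fmt='%d')   # '1'/'0' digits
with open('arr.dump', 'wb') as f:           # binary pickle
    pickle.dump(obj, f)

# The text dump is the largest, the digit dump smaller,
# and the binary pickle (one byte per bool plus a header) smallest.
for name in ('arr_s.txt', 'arr_d.txt', 'arr.dump'):
    print(name, os.path.getsize(name), 'bytes')
```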

+4

The easiest way to save your array, including its metadata (dtype and shape), is to use numpy.save() and numpy.load() :

 a = numpy.array([[False, True, False],
                  [ True, False, True],
                  [False, True, False],
                  [ True, False, True],
                  [False, True, False]], dtype=bool)
 numpy.save("data.npy", a)
 numpy.load("data.npy")
 # array([[False, True, False],
 #        [ True, False, True],
 #        [False, True, False],
 #        [ True, False, True],
 #        [False, True, False]], dtype=bool)

a.tofile() and numpy.fromfile() also work, but do not save any metadata: you need to pass dtype=bool to fromfile() , and you get back a one-dimensional array that must be reshape() d to its original shape.
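For completeness, the tofile() / fromfile() round trip described above looks like this (the shape has to be carried separately):

```python
import numpy

a = numpy.random.rand(5, 3) > 0.5    # sample boolean matrix

a.tofile('arr.raw')                  # raw bytes, no dtype/shape header

# fromfile needs the dtype, and returns a flat array...
flat = numpy.fromfile('arr.raw', dtype=bool)
# ...so the original shape must be reapplied by hand.
b = flat.reshape(a.shape)

print(numpy.array_equal(a, b))  # True
```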

+4

I know this question is pretty old, but I want to add Python 3 benchmarks; they differ slightly from the previous ones.

First, I load a lot of data into memory, convert it to an int8 numpy array with only 0 and 1 as possible values, and then write it to disk using two approaches.

 from timer import Timer
 import sys
 import numpy
 import pickle

 # Loading the data is omitted here.
 prime = int(sys.argv[1])
 np_table = numpy.array(check_table, dtype=numpy.int8)
 filename = "%d.dump" % prime

 with Timer() as t:
     pickle.dump(np_table, open("dumps/pickle_" + filename, 'wb'))
 print('pickle took %.03f sec.' % t.interval)

 with Timer() as t:
     numpy.savetxt("dumps/np_" + filename, np_table, fmt='%d')
 print('savetxt took %.03f sec.' % t.interval)

Time measurement

 It took 50.700 sec to load data number 11
 pickle took 0.010 sec.
 savetxt took 1.930 sec.

 It took 1297.970 sec to load data number 29
 pickle took 0.070 sec.
 savetxt took 242.590 sec.

 It took 1583.380 sec to load data number 31
 pickle took 0.090 sec.
 savetxt took 334.740 sec.

 It took 3855.840 sec to load data number 41
 pickle took 0.610 sec.
 savetxt took 1367.840 sec.

 It took 4457.170 sec to load data number 43
 pickle took 0.780 sec.
 savetxt took 1654.050 sec.

 It took 5792.480 sec to load data number 47
 pickle took 1.160 sec.
 savetxt took 2393.680 sec.

 It took 8101.020 sec to load data number 53
 pickle took 1.980 sec.
 savetxt took 4397.080 sec.

Size measurement

 630K  np_11.dump
  79M  np_29.dump
 110M  np_31.dump
 442M  np_41.dump
 561M  np_43.dump
 875M  np_47.dump
 1.6G  np_53.dump
 315K  pickle_11.dump
  40M  pickle_29.dump
  55M  pickle_31.dump
 221M  pickle_41.dump
 281M  pickle_43.dump
 438M  pickle_47.dump
 798M  pickle_53.dump

So, under Python 3, pickle is much faster than numpy.savetxt and produces files about half the size.
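A minimal sketch of the pickle round trip for such an int8 table (using a small in-memory example in place of the author's check_table), showing that dtype and shape survive, which is what savetxt loses:

```python
import pickle
import numpy

# Hypothetical stand-in for the author's check_table: 0/1 values only.
np_table = numpy.array([0, 1, 1, 0, 1], dtype=numpy.int8)

data = pickle.dumps(np_table)        # binary blob; dtype and shape included
restored = pickle.loads(data)

print(numpy.array_equal(np_table, restored))  # True
print(restored.dtype)                         # int8
```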

+1
