What is this Python array? Does it already exist in Python?

Question

I have a numpy array:

m = array([[4, 0, 9, 0], [0, 7, 0, 0], [0, 0, 0, 0], [0, 0, 0, 5]])

The 4 columns of m are labeled as follows:

 c = array([ 10, 20, 30, 40])

I want to be able to index an object o such that:

    o.vals[0,:] = array([4, 9])
    o.vals[1,:] = array([7,])
    o.vals[2,:] = array([])
    o.vals[3,:] = array([5])

    o.cols[0,:] = array([10, 30])  # the non-zero column labels from row 0
    o.cols[1,:] = array([20,])
    o.cols[2,:] = array([])
    o.cols[3,:] = array([40])

Is there an existing Python object that would allow me to do this?

I looked at SciPy sparse matrices, but they are not quite what I am looking for.

UPDATE (August 17, 2015): I played around with some ideas and came up with this, which is almost the same as what I had last week:

+7

python arrays numpy scipy sparse-matrix

Ginger Aug 11 '15 at 7:45

4 answers

xnx · Answer 1 · 2015-08-11T08:15:15+0000

You can get close to what you want by defining a class that holds m and c:

    import numpy as np

    class O(object):
        def __init__(self, m, c):
            self.m, self.c = m, c

        def vals(self, i):
            # non-zero values in row i
            return self.m[i][self.m[i] != 0]

        def cols(self, i):
            # column labels of the non-zero values in row i
            return self.c[self.m[i] != 0]

    m = np.array([[4, 0, 9, 0], [0, 7, 0, 0], [0, 0, 0, 0], [0, 0, 0, 5]])
    c = np.array([10, 20, 30, 40])
    o = O(m, c)

    for i in range(4):
        print('o.vals({0:d}) = {1}'.format(i, o.vals(i)))
    for i in range(4):
        print('o.cols({0:d}) = {1}'.format(i, o.cols(i)))

Output:

    o.vals(0) = [4 9]
    o.vals(1) = [7]
    o.vals(2) = []
    o.vals(3) = [5]
    o.cols(0) = [10 30]
    o.cols(1) = [20]
    o.cols(2) = []
    o.cols(3) = [40]

(It might be easier to just use the indexing m[i][m[i]!=0] and c[m[i]!=0] directly.)
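As a minimal sketch of the direct indexing mentioned in that note, the boolean mask m[i] != 0 can select both the values and the matching labels of one row without any wrapper class. The names mask, vals and cols are only illustrative.

```python
import numpy as np

m = np.array([[4, 0, 9, 0], [0, 7, 0, 0], [0, 0, 0, 0], [0, 0, 0, 5]])
c = np.array([10, 20, 30, 40])

i = 0
mask = m[i] != 0      # boolean mask of the non-zero entries in row i
vals = m[i][mask]     # non-zero values of row i
cols = c[mask]        # column labels that belong to those values

print(vals)  # [4 9]
print(cols)  # [10 30]
```

The same mask is reused for both lookups, so the values and labels always stay aligned.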

chris-sc · Answer 2 · 2015-08-11T08:20:21+0000

You can use pandas ( http://pandas.pydata.org/ ). (Since you tried scipy/numpy, which are not part of the standard Python library, I assume it is fine to suggest another package.)

A DataFrame is an object that allows you to perform all your operations and much more.

    import numpy as np
    import pandas as pd

    m = np.array([[4, 0, 9, 0], [0, 7, 0, 0], [0, 0, 0, 0], [0, 0, 0, 5]])

    # create a dataframe
    df = pd.DataFrame(m, columns=[10, 20, 30, 40])

    # replace 0 with NaN (to make use of pandas `dropna`)
    df = df.replace(0, np.nan)

    # values per row
    df.iloc[0].dropna().values
    # array([ 4., 9.])
    df.iloc[1].dropna().values
    # array([ 7.])
    df.iloc[2].dropna().values
    # array([], dtype=float64)

    # column labels (as list)
    df.iloc[0].dropna().index.tolist()
    # [10, 30]

    # or non-zero values per column?
    df.iloc[:, 0].dropna().values
    # array([ 4.])
    # ...

You can also get the labels and the values together, since dropna on a single row returns a Series that carries both:

    non_zero_1 = df.iloc[0].dropna()
    labels_1 = non_zero_1.index
    # Int64Index([10, 30], dtype='int64')

Give pandas a try and see if it suits your needs. Also take a look at the great introduction ( http://pandas.pydata.org/pandas-docs/stable/10min.html ).

Jaime · Answer 3 · 2015-08-11T09:28:01+0000

You can get close to what you want with a sparse CSR matrix:

    import scipy.sparse as sps

    m_csr = sps.csr_matrix(m)

You can now implement functions that do something similar to what you are after:

    import numpy as np

    def vals(sps_mat, row):
        # data[indptr[row]:indptr[row+1]] holds the non-zero values of `row`
        row_slice = slice(sps_mat.indptr[row], sps_mat.indptr[row+1])
        return sps_mat.data[row_slice]

    def cols(sps_mat, col_labels, row):
        col_labels = np.asarray(col_labels)
        row_slice = slice(sps_mat.indptr[row], sps_mat.indptr[row+1])
        # indices[...] holds the column positions of those values
        return col_labels[sps_mat.indices[row_slice]]

Using these functions, we get:

    >>> for row in range(m_csr.shape[0]):
    ...     print(vals(m_csr, row))
    ...
    [4 9]
    [7]
    []
    [5]
    >>> for row in range(m_csr.shape[0]):
    ...     print(cols(m_csr, [10, 20, 30, 40], row))
    ...
    [10 30]
    [20]
    []
    [40]

This will be very efficient for large matrices, although the syntax is not quite what you wanted.
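To make the row slicing in vals and cols less opaque, here is a sketch that rebuilds the three CSR arrays by hand with plain numpy. It assumes the layout scipy uses for this matrix (data, indices and indptr with column indices in ascending order per row); the construction below is only an illustration, not how scipy builds them internally.

```python
import numpy as np

m = np.array([[4, 0, 9, 0], [0, 7, 0, 0], [0, 0, 0, 0], [0, 0, 0, 5]])

mask = m != 0
data = m[mask]                    # non-zero values, row by row
indices = np.nonzero(mask)[1]     # column index of each stored value
# indptr[r]:indptr[r+1] is the slice of data/indices that belongs to row r
indptr = np.concatenate(([0], np.cumsum(mask.sum(axis=1))))

print(data)     # [4 9 7 5]
print(indices)  # [0 2 1 3]
print(indptr)   # [0 2 3 3 4]
```

Row 2 is empty because indptr[2] and indptr[3] are both 3, which is why the functions above return an empty array for it.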

Kasramvd · Answer 4 · 2015-08-11T09:46:33+0000

You can use nested classes and overload the __getitem__ method of your objects:

    import numpy as np

    class indexer:
        def __init__(self, arr):
            self.arr = arr
            self.d = self.caldict(self.arr)
            self.vals = self.values(self.arr, self.d)
            self.cols = self.columns(self.d)

        def caldict(self, arr, dd=None):
            # map each row index to the list of its non-zero column indices
            if dd is None:  # a mutable default would be shared between instances
                dd = {}
            inds = np.array(np.nonzero(arr)).T
            for i, j in inds:
                dd.setdefault(i, []).append(j)
            return dd

        class values:
            def __init__(self, arr, d):
                self.arr = arr
                self.d = d

            def __getitem__(self, index):
                try:
                    return self.arr.take(index, axis=0)[self.d[index]]
                except KeyError:  # row has no non-zero entries
                    return []

        class columns:
            def __init__(self, d):
                self.d = d
                self.c = np.array([10, 20, 30, 40])  # column labels

            def __getitem__(self, index):
                try:
                    return self.c.take(self.d[index])
                except KeyError:  # row has no non-zero entries
                    return []

Demo:

    m = np.array([[4, 0, 9, 0], [0, 7, 0, 0], [0, 0, 0, 0], [0, 0, 0, 5]])
    o = indexer(m)
    for i in range(4):
        print(o.vals[i])
    print('------------------')
    for i in range(4):
        print(o.cols[i])

Output:

    [4 9]
    [7]
    []
    [5]
    ------------------
    [10 30]
    [20]
    []
    [40]
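As a variation on the __getitem__ idea (not the code from this answer), the sketch below passes the column labels in as a parameter instead of hard-coding them, and builds the mask on each lookup rather than precomputing a dictionary. The class names RowView and Indexer are hypothetical.

```python
import numpy as np


class RowView(object):
    """Index by row; returns entries of `source` where that row of `arr` is non-zero."""

    def __init__(self, arr, source=None):
        self.arr = arr
        self.source = source  # None means "return the values of arr itself"

    def __getitem__(self, row):
        mask = self.arr[row] != 0
        picked = self.arr[row] if self.source is None else self.source
        return picked[mask]


class Indexer(object):
    def __init__(self, arr, labels):
        arr = np.asarray(arr)
        self.vals = RowView(arr)                      # non-zero values per row
        self.cols = RowView(arr, np.asarray(labels))  # labels of those values


m = np.array([[4, 0, 9, 0], [0, 7, 0, 0], [0, 0, 0, 0], [0, 0, 0, 5]])
o = Indexer(m, [10, 20, 30, 40])

print(o.vals[0], o.cols[0])  # [4 9] [10 30]
print(o.vals[2], o.cols[2])  # [] []
```

Because the index is handed straight to numpy, o.vals[0, :] and o.cols[0, :] from the question should work as well, and empty rows come back as empty arrays rather than lists.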
