Pandas DataFrame with MultiIndex to Numpy Matrix

I have a pandas DataFrame with a two-level MultiIndex. I want to convert it to a NumPy array with something like df.as_matrix(...), but that returns an array of shape (n_rows, 1). I want an array of shape (n_index1_rows, n_index2_rows, 1).

Is there a way to use .groupby(...) followed by .values.tolist() or .as_matrix(...) to get the shape I want?

EDIT: data

                                                                      value
    current_date                  temp_date
    1970-01-01 00:00:01.446237485 1970-01-01 00:00:01.446237489  30.497100
                                  1970-01-01 00:00:01.446237494   9.584300
                                  1970-01-01 00:00:01.446237455  10.134200
                                  1970-01-01 00:00:01.446237494   7.803683
                                  1970-01-01 00:00:01.446237400  10.678700
                                  1970-01-01 00:00:01.446237373   9.700000
                                  1970-01-01 00:00:01.446237180  15.000000
                                  1970-01-01 00:00:01.446236961  12.928866
                                  1970-01-01 00:00:01.446237032  10.458800

This is roughly what I have in mind:

    np.array([
        np.resize(x.as_matrix(["value"]).copy(), (500, 1))
        for (i, x) in df.reset_index("current_date").groupby("current_date")
    ])
1 answer

I think you want to unstack the MultiIndex, for example:

 df.unstack().values[:, :, np.newaxis] 
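To make the effect of the one-liner concrete, here is a minimal sketch on a tiny, made-up frame (the names `small`, `outer` and `inner` are hypothetical, not from the question). It also shows what I would expect when the inner groups have different lengths: unstack() pads the missing combinations with NaN rather than failing. I use .to_numpy() here, which assumes a reasonably recent pandas; .values behaves the same way for this purpose.

```python
import numpy as np
import pandas as pd

# Hypothetical ragged data: group "a" has three inner keys, group "b" only two.
index = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("a", 3), ("b", 1), ("b", 3)],
    names=["outer", "inner"],
)
small = pd.DataFrame({"value": [1.0, 2.0, 3.0, 4.0, 5.0]}, index=index)

# unstack() moves the innermost level ("inner") into the columns.
wide = small.unstack()

# Append a trailing axis of length 1 to get (n_outer, n_inner, 1).
cube = wide.to_numpy()[:, :, np.newaxis]

print(cube.shape)                # (2, 3, 1)
print(np.isnan(cube[1, 1, 0]))   # True: the pair ("b", 2) is not in the data
```

If NaN padding is not what you want, you would have to decide on a fill value yourself, for example with wide.fillna(...) before converting.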

Edit: if your index contains duplicate entries, unstacking will fail, and you most likely need pivot_table instead:

    pivoted = df.reset_index().pivot_table(index='current_date',
                                           columns='temp_date',
                                           aggfunc='mean')
    arr = pivoted.values[:, :, np.newaxis]
    arr.shape
    # (10, 50, 1)
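As a small illustration of the duplicate case, here is a sketch on made-up data (the labels "a"/"b" and the values are hypothetical stand-ins for real dates). It assumes the column is called "value", as in the question, and passes values="value" explicitly so the pivoted columns are a plain index.

```python
import numpy as np
import pandas as pd

# Hypothetical data in which the index pair ("a", 1) occurs twice.
dup_index = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 1), ("a", 2), ("b", 1), ("b", 2)],
    names=["current_date", "temp_date"],
)
dup_df = pd.DataFrame({"value": [1.0, 3.0, 5.0, 7.0, 9.0]}, index=dup_index)

# unstack() cannot put two values into one cell, so it raises ValueError.
try:
    dup_df.unstack()
    unstack_failed = False
except ValueError:
    unstack_failed = True

# pivot_table() aggregates the duplicates instead (here with the mean).
pivoted = dup_df.reset_index().pivot_table(
    index="current_date", columns="temp_date", values="value", aggfunc="mean"
)
arr = pivoted.to_numpy()[:, :, np.newaxis]

print(unstack_failed)  # True
print(arr.shape)       # (2, 2, 1)
print(arr[0, 0, 0])    # 2.0, the mean of 1.0 and 3.0
```

Whether the mean is the right way to combine duplicates depends on your data; aggfunc also accepts other reducers such as 'first' or 'max'.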

Here is a complete unstack example. First we will create some data:

    import numpy as np
    import pandas as pd

    current = pd.date_range('2015', periods=10, freq='D')
    temp = pd.date_range('2015', periods=50, freq='D')
    ind = pd.MultiIndex.from_product([current, temp],
                                     names=['current_date', 'temp_date'])
    df = pd.DataFrame({'val': np.random.rand(len(ind))}, index=ind)
    df.head()
    #                               val
    # current_date temp_date
    # 2015-01-01   2015-01-01  0.309488
    #              2015-01-02  0.697876
    #              2015-01-03  0.621318
    #              2015-01-04  0.308298
    #              2015-01-05  0.936828

Now we unstack the MultiIndex. Here is the top-left 4x4 block of the result:

    df.unstack().iloc[:4, :4]
    #                     val
    # temp_date    2015-01-01 2015-01-02 2015-01-03 2015-01-04
    # current_date
    # 2015-01-01     0.309488   0.697876   0.621318   0.308298
    # 2015-01-02     0.323530   0.751486   0.507087   0.995565
    # 2015-01-03     0.805709   0.101129   0.358664   0.501209
    # 2015-01-04     0.360644   0.941200   0.727570   0.884314

Now extract the NumPy array and reshape it to [nrows x ncols x 1], as you specified in the question:

    vals = df.unstack().values[:, :, np.newaxis]
    print(vals.shape)
    # (10, 50, 1)
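As a cross-check, and not part of the approach above, the same array can also be obtained with a plain reshape. This is only a sketch, and it assumes the index is a complete product of both levels with no duplicates; if any combination is missing, the reshape will either fail or silently misalign, so unstack() is the safer choice. The names via_unstack and via_reshape are mine.

```python
import numpy as np
import pandas as pd

# Rebuild the example data: a complete 10 x 50 product index.
current = pd.date_range("2015-01-01", periods=10, freq="D")
temp = pd.date_range("2015-01-01", periods=50, freq="D")
ind = pd.MultiIndex.from_product([current, temp],
                                 names=["current_date", "temp_date"])
df = pd.DataFrame({"val": np.random.rand(len(ind))}, index=ind)

# The approach from the answer: unstack, then add a trailing axis.
via_unstack = df.unstack().to_numpy()[:, :, np.newaxis]

# Alternative (assumes a complete index): sort, then reshape row-major.
n_current = df.index.levels[0].size
n_temp = df.index.levels[1].size
via_reshape = df.sort_index()["val"].to_numpy().reshape(n_current, n_temp, 1)

print(via_unstack.shape)                         # (10, 50, 1)
print(np.array_equal(via_unstack, via_reshape))  # True
```

Note that as_matrix(), used in the question, was deprecated and later removed from pandas; .values or .to_numpy() are the current ways to get the underlying array.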