Put 2d Array in Pandas Series

I have a 2D NumPy array that I would like to put into a pandas Series (not a DataFrame):

    >>> import pandas as pd
    >>> import numpy as np
    >>> a = np.zeros((5, 2))
    >>> a
    array([[ 0.,  0.],
           [ 0.,  0.],
           [ 0.,  0.],
           [ 0.,  0.],
           [ 0.,  0.]])

But constructing a Series from it raises an error:

    >>> s = pd.Series(a)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/miniconda/envs/pyspark/lib/python3.4/site-packages/pandas/core/series.py", line 227, in __init__
        raise_cast_failure=True)
      File "/miniconda/envs/pyspark/lib/python3.4/site-packages/pandas/core/series.py", line 2920, in _sanitize_array
        raise Exception('Data must be 1-dimensional')
    Exception: Data must be 1-dimensional

It is possible with a hack:

    >>> s = pd.Series(map(lambda x: [x], a)).apply(lambda x: x[0])
    >>> s
    0    [0.0, 0.0]
    1    [0.0, 0.0]
    2    [0.0, 0.0]
    3    [0.0, 0.0]
    4    [0.0, 0.0]

Is there a better way?

2 answers

Well, you can use the numpy.ndarray.tolist method, for example:

    >>> a = np.zeros((5,2))
    >>> a
    array([[ 0.,  0.],
           [ 0.,  0.],
           [ 0.,  0.],
           [ 0.,  0.],
           [ 0.,  0.]])
    >>> a.tolist()
    [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
    >>> pd.Series(a.tolist())
    0    [0.0, 0.0]
    1    [0.0, 0.0]
    2    [0.0, 0.0]
    3    [0.0, 0.0]
    4    [0.0, 0.0]
    dtype: object

EDIT:

A faster way to achieve a similar result is simply pd.Series(list(a)). This makes a Series of NumPy arrays instead of Python lists, so it should be faster than a.tolist(), which returns a list of Python lists.
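
For illustration, a minimal interactive sketch of that approach, continuing the example above (the exact rendering of each element may vary slightly across pandas versions):

    >>> s = pd.Series(list(a))
    >>> s
    0    [0.0, 0.0]
    1    [0.0, 0.0]
    2    [0.0, 0.0]
    3    [0.0, 0.0]
    4    [0.0, 0.0]
    dtype: object
    >>> type(s[0])
    <class 'numpy.ndarray'>

Each element of the Series is a row of a kept as a 1D ndarray rather than converted to a Python list.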

In my tests, however,

    pd.Series(list(a))

consistently slower than

 pd.Series(a.tolist()) 

Checked from 500,000 up to 20,000,000 rows, e.g.

 a = np.ones((500000,2)) 

Timings shown only for 1,000,000 rows:

    %timeit pd.Series(list(a))
    1 loop, best of 3: 301 ms per loop

    %timeit pd.Series(a.tolist())
    1 loop, best of 3: 261 ms per loop
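
To reproduce the comparison outside of IPython, a minimal timeit sketch along these lines should work (the array size and repetition count here are arbitrary choices; absolute numbers will differ by machine and pandas version):

    import timeit

    import numpy as np
    import pandas as pd

    a = np.ones((500000, 2))

    # Series of NumPy row arrays vs. Series of Python lists
    t_list = timeit.timeit(lambda: pd.Series(list(a)), number=10)
    t_tolist = timeit.timeit(lambda: pd.Series(a.tolist()), number=10)

    print("pd.Series(list(a)):    %.3f s per call" % (t_list / 10))
    print("pd.Series(a.tolist()): %.3f s per call" % (t_tolist / 10))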
