Pandas dataframe without copy

How can I avoid copying the arrays in a dictionary when creating a pandas DataFrame from it?

    >>> a = np.arange(10)
    >>> b = np.arange(10.0)
    >>> df1 = pd.DataFrame(a)
    >>> a[0] = 100
    >>> df1
         0
    0  100
    1    1
    2    2
    3    3
    4    4
    5    5
    6    6
    7    7
    8    8
    9    9
    >>> d = {'a':a, 'b':b}
    >>> df2 = pd.DataFrame(d)
    >>> a[1] = 200
    >>> d
    {'a': array([100, 200, 2, 3, 4, 5, 6, 7, 8, 9]), 'b': array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])}
    >>> df2
         a  b
    0  100  0
    1    1  1
    2    2  2
    3    3  3
    4    4  4
    5    5  5
    6    6  6
    7    7  7
    8    8  8
    9    9  9

If I create a DataFrame from a alone, changes to a are reflected in the DataFrame (and vice versa).

Is there any way to make this work when supplying a dictionary?

+4
2 answers

There is no way to "share" a dict and have the frame update when the dict changes. The copy argument does not apply to a dict: the data is always copied, because it is converted to an ndarray.
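A minimal sketch of that copying behavior, reusing the arrays from the question. It relies on the default constructor arguments and on np.shares_memory to check whether the column still points at the original buffer; the variable names value_in_frame and shares are only for illustration.

```python
import numpy as np
import pandas as pd

a = np.arange(10)
b = np.arange(10.0)

# With default arguments, building a frame from a dict copies the column data.
df = pd.DataFrame({"a": a, "b": b})

# Mutate the source array after the frame has been built.
a[1] = 200

# The frame keeps the value it was constructed with, and the column
# does not share memory with the original array.
value_in_frame = int(df["a"].iloc[1])
shares = np.shares_memory(a, df["a"].to_numpy())
print(value_in_frame, shares)
```

If the check reports no shared memory, later writes to a cannot show up in df, which is what the question observed for df2.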

However, there is a way to get this type of dynamic behavior in a limited way.

    In [9]: arr = np.array(np.random.rand(5,2))

    In [10]: df = DataFrame(arr)

    In [11]: arr[0,0] = 0

    In [12]: df
    Out[12]:
              0         1
    0  0.000000  0.192056
    1  0.847185  0.609028
    2  0.833997  0.422521
    3  0.937638  0.711856
    4  0.047569  0.033282

Thus a passed ndarray will, at construction time, be a view onto the underlying numpy array. Depending on how you operate on the DataFrame, you may trigger a copy (for example, if you assign a new column or change a column's dtype). This also only works for a single-dtype frame.
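The view behavior can be checked directly. This is a sketch under one assumption: copy=False is passed explicitly, because whether the constructor copies a 2D ndarray by default depends on the pandas version. The name shares is illustrative only.

```python
import numpy as np
import pandas as pd

arr = np.random.rand(5, 2)

# copy=False asks pandas to reference the array rather than copy it.
# This only has a chance of working for a single-dtype 2D array.
df = pd.DataFrame(arr, copy=False)

# Write through the original array after the frame exists.
arr[0, 0] = 0.0

# If the frame is a view, it shares memory with arr and sees the write.
shares = np.shares_memory(arr, df.to_numpy())
print(shares, df.iloc[0, 0])
```

Operations on the frame may still replace the underlying block with a copy, so it is worth re-running the np.shares_memory check after any step that must preserve the link.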

+3

It is possible to initialize a DataFrame without copying the data. To see how, you need to understand the BlockManager, which is the underlying data structure used by DataFrame. It tries to group data of the same dtype together and hold it in memory as a single block. If the data is already provided as a single block, for example when you initialize from a matrix:

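    import numpy as np
    import pandas as pd

    a = np.zeros((100,20))
    a.flags['WRITEABLE'] = False   # mark the source buffer read-only
    df = pd.DataFrame(a, copy=False)
    # assert_read_only is the answerer's own helper; its definition is not shown
    assert_read_only(df[df.columns[0]].iloc)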

... then the DataFrame can usually just reference the ndarray.
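Since the assert_read_only helper is not shown in the answer, here is one possible stand-in, offered as an assumption rather than the answerer's method: it inspects the writeable flag of the array returned by DataFrame.to_numpy and confirms with np.shares_memory that the frame still points at the original buffer. The names still_read_only and shares are hypothetical.

```python
import numpy as np
import pandas as pd

a = np.zeros((100, 20))
a.flags.writeable = False  # mark the source buffer read-only

df = pd.DataFrame(a, copy=False)

# For a single-dtype frame, to_numpy() can return a view of the stored block.
# A view of a read-only buffer is itself read-only, so the flag is a hint
# that no writable copy was made during construction.
values = df.to_numpy()
still_read_only = not values.flags.writeable
shares = np.shares_memory(a, values)
print(still_read_only, shares)
```

The read-only flag alone is only a hint; the np.shares_memory result is the more direct evidence that no copy took place.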

Obviously this will not work if you start with multiple arrays or have heterogeneous types. In that case, you can monkey-patch the BlockManager so that it does not consolidate columns into shared blocks.

0