I have a large DataFrame with approximately 392 million rows and 9 columns, and I want to apply a filter to retrieve a subset.
Here is the filter I apply to my source DataFrame dh_activity_recos:
dh_activity_approved = dh_activity_recos.loc[dh_activity_recos.approved_flag == 1]
Now, when I apply this filter, I get the following memory error:
Traceback (most recent call last):
File "/mnt01/eh-datasci/ravinder/working/final_recos_processing.py", line 144, in <module>
dh_activity_approved = dh_activity_recos.loc[dh_activity_recos.approved_flag == 1]
File "/home/ubuntu/anaconda/lib/python2.7/site-packages/pandas/core/indexing.py", line 1227, in __getitem__
return self._getitem_axis(key, axis=0)
File "/home/ubuntu/anaconda/lib/python2.7/site-packages/pandas/core/indexing.py", line 1344, in _getitem_axis
return self._getbool_axis(key, axis=axis)
File "/home/ubuntu/anaconda/lib/python2.7/site-packages/pandas/core/indexing.py", line 1239, in _getbool_axis
raise self._exception(detail)
KeyError: MemoryError()
I cannot understand the reason. I checked with dir(); there are no other large objects in memory besides this DataFrame. Moreover, I am running this on a cloud instance with 128 GB of RAM, so I am not sure why this error occurs.
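
For reference, this is roughly how I would check the DataFrame's actual memory footprint rather than relying on dir(). This is only a sketch: memory_usage(deep=True) and info(memory_usage="deep") are standard pandas calls, but the deep option needs a reasonably recent pandas version, and I have not included the output here.

# assuming dh_activity_recos is the DataFrame already loaded above
import pandas as pd

# Total bytes actually held by the DataFrame, including the Python
# objects inside object-dtype columns (deep=True walks those as well).
total_bytes = dh_activity_recos.memory_usage(deep=True).sum()
print("DataFrame size: %.2f GB" % (total_bytes / 1024.0 ** 3))

# Per-column dtypes and memory, useful for spotting object columns
# that blow up the footprint.
dh_activity_recos.info(memory_usage="deep")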