I am loading two datasets from two different databases, and they need to be joined. Each of them is about 500 MB when stored as CSV. Each fits into memory on its own, but when I load both I sometimes get a memory error. I definitely run into trouble when I try to merge them with pandas.
What is the best way to do an outer join on them so that I don't get a memory error? I don't have any database servers at hand, but I can install any open source software on my computer if that helps. Ideally I would still like to solve this in pandas only, but I'm not sure whether that is possible at all.
To clarify: by merging, I mean an outer join. Each table has two columns: product and version. I want to check which product/version pairs are in the left table only, in the right table only, and in both tables. This is what I currently do:
pd.merge(df1, df2, left_on=['product', 'version'], right_on=['product', 'version'], how='outer')
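To make the goal concrete, here is a minimal sketch of the check on toy data. The two small frames and their values are made up for illustration and are not the real tables; `indicator=True` is an existing `pd.merge` option that adds a `_merge` column marking each row as `left_only`, `right_only`, or `both`. This only shows the result I am after and does not by itself address the memory problem.

```python
import pandas as pd

# Toy stand-ins for the two real tables (hypothetical values).
df1 = pd.DataFrame({"product": ["a", "a", "b"], "version": [1, 2, 1]})
df2 = pd.DataFrame({"product": ["a", "b", "c"], "version": [2, 1, 1]})

# Outer join on both key columns; indicator=True adds a "_merge" column
# whose value is "left_only", "right_only", or "both" for every row.
merged = pd.merge(df1, df2, on=["product", "version"], how="outer", indicator=True)

# Count how many product/version pairs fall into each group.
counts = merged["_merge"].value_counts().to_dict()
print(merged)
print(counts)
```

Since both key columns have the same names in the two frames, `on=[...]` should behave the same as passing identical `left_on` and `right_on` lists.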