I have a basic script that walks through a nested dictionary, grabs the data from each record, and appends it to a Pandas DataFrame. The data looks something like this:
data = {"SomeCity": {"Date1": {record1, record2, record3, ...}, "Date2": {}, ...}, ...}
There are several million records in total. The script itself is as follows:
city = ["SomeCity"] df = DataFrame({}, columns=['Date', 'HouseID', 'Price']) for city in cities: for dateRun in data[city]: for record in data[city][dateRun]: recSeries = Series([record['Timestamp'], record['Id'], record['Price']], index = ['Date', 'HouseID', 'Price']) FredDF = FredDF.append(recSeries, ignore_index=True)
However, this runs very slowly. Before I look for a way to parallelize it, I want to make sure I'm not missing something obvious that would make it faster as-is, since I'm still fairly new to Pandas.
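One alternative I've seen suggested, but haven't benchmarked on my full dataset, is to accumulate plain rows in a list and build the DataFrame once at the end. A minimal sketch, assuming the same record keys as above:

    from pandas import DataFrame

    rows = []
    for city in cities:
        for dateRun in data[city]:
            for record in data[city][dateRun]:
                # Appending to a list is cheap, whereas DataFrame.append
                # copies the entire frame on every call.
                rows.append((record['Timestamp'], record['Id'], record['Price']))

    # Construct the DataFrame in a single pass over the collected rows.
    df = DataFrame(rows, columns=['Date', 'HouseID', 'Price'])

Is this the idiomatic fix, or is there a better approach?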