DataFrame from DataFrames with pandas

I have the following DataFrame collecting daily statistics for two dimensions A and B:

                  A             B
    count  17266.000000  17266.000000
    std        0.179003      0.178781
    75%      101.102251    101.053214
    min      100.700993    100.651956
    mean     101.016747    100.964003
    max      101.540214    101.491178
    50%      100.988465    100.938694
    25%      100.885251    100.830048

The following is a snippet of code that creates it:

    import pandas

    # describe() output for one day, keyed by column and then by statistic
    day1 = {
        'A': {
            'count': 17266.0,
            'std': 0.17900265293286116,
            'min': 100.70099294189714,
            'max': 101.54021448871775,
            '50%': 100.98846526697825,
            '25%': 100.88525124427971,
            '75%': 101.10225131847992,
            'mean': 101.01674677794136
        },
        'B': {
            'count': 17266.0,
            'std': 0.17878125983374854,
            'min': 100.65195609992342,
            'max': 101.49117764674403,
            '50%': 100.93869409089723,
            '25%': 100.83004837814667,
            '75%': 101.05321447650618,
            'mean': 100.96400305527138
        }
    }

    # orient='index' makes A and B the rows; .T turns them back into columns
    df = pandas.DataFrame.from_dict(day1, orient='index').T

The data comes directly from the output of describe(). I have several such describe() results (one for each day), and I would like to collect all of them into a single DataFrame that has the date as an index.

The most obvious way to get this is to collect all the daily results into one DataFrame, then group them by day and run the statistics on the result. However, I need an alternative method, because I run into a MemoryError with the amount of data being processed.

The end result should look like this:

                                 A             B
    2014-12-24 count  15895.000000  15895.000000
               mean      99.943618     99.968860
               std        0.012468      0.011932
               min       99.877695     99.928778
               25%       99.934890     99.960445
               50%       99.943453     99.968847
               75%       99.952340     99.977571
               max       99.982930    100.002507
    2014-12-25 count  16278.000000  16278.000000
               mean      99.937056     99.962203
               std        0.012395      0.012661
               min       99.884501     99.910567
               25%       99.928078     99.953758
               50%       99.936754     99.962411
               75%       99.945914     99.971473
               max       99.981512    100.003770
1 answer

If you can build a dict of the form {date: describe_df_for_that_day}, you can pass that dict to pd.concat. The dict keys become the outer level of the resulting index.

Starting from your df (with pandas imported as pd):

    In [14]: d = {'2014-12-24': df, '2014-12-25': df}

    In [15]: pd.concat(d)
    Out[15]:
                                 A             B
    2014-12-24 count  17266.000000  17266.000000
               std        0.179003      0.178781
               75%      101.102251    101.053214
               min      100.700993    100.651956
               mean     101.016747    100.964003
               max      101.540214    101.491178
               50%      100.988465    100.938694
               25%      100.885251    100.830048
    2014-12-25 count  17266.000000  17266.000000
               std        0.179003      0.178781
               75%      101.102251    101.053214
               min      100.700993    100.651956
               mean     101.016747    100.964003
               max      101.540214    101.491178
               50%      100.988465    100.938694
               25%      100.885251    100.830048
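To tie this back to the MemoryError in the question, the dict can be filled one day at a time, so that only the small describe() result for each day is kept. The following is a minimal sketch, not a definitive implementation: load_day is a hypothetical stand-in for however one day's raw data is actually read, and here it just generates random numbers.

```python
import numpy as np
import pandas as pd


def load_day(day, n_rows=1000):
    """Hypothetical loader: replace with the real code that reads one day of raw data."""
    seed = int(day.replace("-", ""))  # deterministic fake data per day
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        "A": rng.normal(100.0, 0.01, n_rows),
        "B": rng.normal(100.0, 0.01, n_rows),
    })


daily_stats = {}
for day in ["2014-12-24", "2014-12-25"]:
    raw = load_day(day)
    # Keep only the 8-row summary; the raw data can be discarded afterwards.
    daily_stats[day] = raw.describe()
    del raw

# The dict keys become the outer index level, the statistic names the inner one.
result = pd.concat(daily_stats)
print(result)
```

Only one day's raw data is held in memory at any time, which is the point of computing the statistics per day before combining them.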

You can, of course, make the keys real dates instead of strings.
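A short sketch of that variant, assuming pd.Timestamp keys and using the names argument of pd.concat to label the two index levels. The level names "date" and "stat" and the tiny stand-in df are illustrative choices, not part of the original question.

```python
import pandas as pd

# Small stand-in for one day's describe() output.
df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]}).describe()

# Use real timestamps as keys instead of strings.
d = {
    pd.Timestamp("2014-12-24"): df,
    pd.Timestamp("2014-12-25"): df,
}

# names labels the outer (key) level and the inner (statistic) level.
result = pd.concat(d, names=["date", "stat"])

# Selecting on the outer level returns that day's statistics as a DataFrame.
one_day = result.loc[pd.Timestamp("2014-12-24")]
print(one_day)
```

With real dates in the outer level, a single day can be selected by timestamp rather than by matching strings.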


Source: https://habr.com/ru/post/1212815/

