Clock frequency with Python

Question


I have hourly CSV data, sorted like this, day after day for hundreds of days:

2011.05.16,00:00,1.40893
2011.05.16,01:00,1.40760
2011.05.16,02:00,1.40750
2011.05.16,03:00,1.40649

I want to count, for each hour, how many times the maximum value of the day was set at that hour: if the maximum of 2011.05.16 occurred at 00:00, I add 1 to 00:00, and so on. To do this, I used a loop that treats the hours as list indices:

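import numpy as np

def graph():
    # myPath and date_converter are defined elsewhere in the script
    Date, Time, High = np.genfromtxt(myPath, delimiter=",",
                                     unpack=True, converters={0: date_converter})
    numList = [0.0] * 24   # the current day's 24 hourly values (floats, not strings)
    index = 0
    hour = 0
    count = [0] * 24       # count[h]: number of days whose maximum fell at hour h

    # assumes exactly 24 rows per day, starting at 00:00
    for eachHour in Time:
        numList[hour] = High[index]
        index += 1
        hour += 1

        if hour == 24:     # a full day has been read
            higher = numList.index(max(numList))
            count[higher] += 1
            hour = 0
            numList = [0.0] * 24

    return count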

Is there a better way to do this? The output I want looks like this:

00:00 n times max of the day
01:00 n times max of the day
02:00 n times max of the day
etc.
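If, as the loop above assumes, every day has exactly 24 hourly values in order starting at 00:00, the same count can be done without a Python loop. This is a minimal sketch under that assumption; the function name count_max_hours is hypothetical. It reshapes the values into one row per day, takes the index of each row's maximum with argmax (which, like list.index(max(...)), picks the first maximum on ties), and tallies those hour indices with np.bincount.

```python
import numpy as np

def count_max_hours(high):
    """Return 24 counters: on how many days the daily maximum fell at each hour.

    Assumes (hypothetically, like the loop above) that `high` holds exactly
    24 hourly values per day, in order, starting at 00:00.
    """
    high = np.asarray(high, dtype=float)
    if high.size % 24:
        raise ValueError("expected a whole number of 24-hour days")
    days = high.reshape(-1, 24)                 # one row per day
    max_hour = days.argmax(axis=1)              # hour index of each day's first maximum
    return np.bincount(max_hour, minlength=24)  # one counter per hour, 0 if never the max

# usage with the arrays from the question: counts = count_max_hours(High)
```

If days can be missing hours, the reshape no longer lines up rows with days, and grouping by the actual date (as in the answers below) is safer.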


python pandas time

pietrovismara Dec 28 '13 at 21:41


With pandas, you can do it like this:

In [32]: df['daily_max'] = df.groupby(df.index.date)['value'].transform(lambda x: x == x.max())
In [33]: df
Out[33]: 
                       value daily_max
date_time                             
2011-05-16 00:00:00  1.40893      True
2011-05-16 01:00:00  1.40760     False
2011-05-16 02:00:00  1.40750     False
2011-05-16 03:00:00  1.40649     False
2011-05-17 02:00:00  1.40893      True
2011-05-17 03:00:00  1.40760     False
2011-05-17 04:00:00  1.40750     False
2011-05-17 05:00:00  1.40649     False
2011-05-18 02:00:00  1.40893      True
2011-05-18 03:00:00  1.40760     False
2011-05-18 04:00:00  1.40750     False
2011-05-18 05:00:00  1.40649     False

In [34]: df.groupby(df.index.time)['daily_max'].sum()
Out[34]: 
00:00:00    1
01:00:00    0
02:00:00    2
03:00:00    0
04:00:00    0
05:00:00    0
Name: daily_max, dtype: float64
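Note that x == x.max() marks every hour that ties for the day's maximum, so a day with a tie adds 1 to more than one hour. If you want exactly one hour per day, a sketch using idxmax (which returns the label of the first maximum) could look like this; the helper name first_max_hour_counts and the sample data are made up for illustration.

```python
import pandas as pd

def first_max_hour_counts(df):
    """Per time of day, count the days whose first maximum 'value' occurred then.

    Assumes df has a DatetimeIndex and a numeric 'value' column.
    """
    # timestamp of the first maximum within each calendar day
    max_times = df['value'].groupby(df.index.date).idxmax()
    # keep only the time-of-day part and count how often each one occurs
    return pd.Series([ts.time() for ts in max_times]).value_counts().sort_index()

# hypothetical sample: two days, with a tie on the second day
idx = pd.to_datetime(['2011-05-16 00:00', '2011-05-16 01:00',
                      '2011-05-17 00:00', '2011-05-17 01:00'])
df = pd.DataFrame({'value': [1.5, 1.2, 1.3, 1.3]}, index=idx)
print(first_max_hour_counts(df))
```

Here the tie on 2011-05-17 is attributed only to 00:00, so 00:00 is counted twice; with the transform approach, 01:00 would also get 1.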

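If grouping on df.index.date does not work in your pandas version, you can first add explicit date and time columns (this assumes df has a DatetimeIndex):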

df['date'] = [t.date() for t in df.index.to_pydatetime()]
df['time'] = [t.time() for t in df.index.to_pydatetime()]
df['daily_max'] = df.groupby('date')['value'].transform(lambda x: x==x.max())
df.groupby('time')['daily_max'].sum()

To build the example DataFrame from your data, I used:

from io import StringIO
import pandas as pd

s="""2011.05.16,00:00,1.40893
2011.05.16,01:00,1.40760
2011.05.16,02:00,1.40750
2011.05.16,03:00,1.40649
2011.05.17,02:00,1.40893
2011.05.17,03:00,1.40760
2011.05.17,04:00,1.40750
2011.05.17,05:00,1.40649
2011.05.18,02:00,1.40893
2011.05.18,03:00,1.40760
2011.05.18,04:00,1.40750
2011.05.18,05:00,1.40649"""

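df = pd.read_csv(StringIO(s), header=None, names=['date', 'time', 'value'])
# combine the date and time strings into a single datetime index
df['date_time'] = pd.to_datetime(df['date'] + ' ' + df['time'], format='%Y.%m.%d %H:%M')
df = df.set_index('date_time')[['value']]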


joris Dec 28 '13 at 22:40

Here is a plain-Python approach that uses only the standard library:

from time import strptime,strftime

time_format="%H:%M"
date_format="%Y.%m.%d"

def date_values(flo):
    """Yield (date, time, value) for every line that parses; skip the rest."""
    for line in flo:
        try:
            date_str, time_str, value = line.split(',')
            date = strptime(date_str, date_format)
            time = strptime(time_str, time_format)
            value = float(value)
            yield (date, time, value)
        except ValueError:
            pass  # blank or malformed line

def day_values(flo):
    """Group the values by day: {date: [value, value, ...]}."""
    days = {}
    for date, time, value in date_values(flo):
        days.setdefault(date, []).append(value)
    return days

if __name__ == '__main__':
    from sys import stdin

    for day,values in day_values(stdin).items():
        print("{0}: {1} (max of {2})".format(
              strftime(date_format, day),
              values, 
              max(values)))

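date_values parses each line into a (date, time, value) tuple and silently skips lines that do not parse, such as blank lines. day_values collects the values into a dictionary keyed by day, so each day maps to the list of its values, from which the maximum can be taken.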

If I save this file as freq_count.py and your data is in a file named data, I get:

$ python freq_count.py < data
2011.05.16: [1.40893, 1.4076, 1.4075, 1.40649] (max of 1.40893)

To calculate the frequency of the maximum value:

def count_freq(values):
    top = max(values)  # compute the maximum once, not once per element
    return len([v for v in values if v == top])

This uses a list comprehension to build a list of all input values equal to the maximum, and then takes the length of that list.
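date_values and day_values drop the time of day, so they cannot tell at which hour the maximum occurred. A minimal sketch that keeps the time and counts, per hour, how many days reached their maximum then could look like this; count_max_times is a hypothetical name, and, like the transform-based pandas answer, a tie counts once for every hour that reaches the day's maximum.

```python
from collections import Counter, defaultdict

def count_max_times(lines):
    """Count, per time string such as '00:00', on how many days the daily max occurred then.

    Each line is expected to look like '2011.05.16,00:00,1.40893';
    lines that cannot be parsed are skipped, as in date_values.
    """
    days = defaultdict(list)  # date string -> [(time string, value), ...]
    for line in lines:
        try:
            date_str, time_str, value = line.strip().split(',')
            reading = (time_str, float(value))  # parse before touching `days`
        except ValueError:
            continue  # blank or malformed line
        days[date_str].append(reading)

    counts = Counter()
    for readings in days.values():
        top = max(v for _, v in readings)  # the day's maximum
        for time_str, v in readings:
            if v == top:
                counts[time_str] += 1
    return counts

sample = ["2011.05.16,00:00,1.40893", "2011.05.16,01:00,1.40760",
          "2011.05.17,02:00,1.40893", "2011.05.17,03:00,1.40760"]
print(sorted(count_max_times(sample).items()))  # [('00:00', 1), ('02:00', 1)]
```

It can be fed sys.stdin in the same way as the freq_count.py script above.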


HazyBlueDot Dec 28 '13 at 22:18


Andy Hayden · Accepted Answer · 2013-12-28T22:19:15+0000

First, read the csv:

In [11]: df = pd.read_csv('foo.csv', sep=',', header=None, parse_dates=[[0, 1]])

In [12]: df.columns = ['date', 'val']

In [13]: df.set_index('date', inplace=True)

In [14]: df
Out[14]: 
                         val
date                        
2011-05-16 00:00:00  1.40893
2011-05-16 01:00:00  1.40760
2011-05-16 02:00:00  1.40750
2011-05-16 03:00:00  1.40649

Then resample to get each day's maximum:

In [15]: day_max = df.resample('D').max()

Then mark the rows whose value equals their day's max:

In [16]: df['is_day_max'] = df.val == day_max['val'].reindex(df.index.normalize()).values

In [17]: df
Out[17]: 
                         val is_day_max
date                                   
2011-05-16 00:00:00  1.40893       True
2011-05-16 01:00:00  1.40760      False
2011-05-16 02:00:00  1.40750      False
2011-05-16 03:00:00  1.40649      False

Finally, count per time of day:

In [18]: df.groupby(df.index.time)['is_day_max'].sum()
Out[18]: 
00:00:00    1
01:00:00    0
02:00:00    0
03:00:00    0
Name: is_day_max, dtype: float64
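The result only lists times that appear in the data. To get one row for each of the 24 hours, as in the output the question asks for, you could reindex against all full hours and fill the gaps with 0. This is a sketch assuming the counts are indexed by datetime.time values, as produced by grouping on df.index.time; all_hours is a hypothetical helper, and the sample counts only mimic the shape of Out[18].

```python
import datetime
import pandas as pd

def all_hours(counts):
    """Reindex per-time counts so all 24 full hours appear, with 0 where nothing was counted.

    Assumes `counts` is indexed by datetime.time values (e.g. from df.index.time).
    """
    hours = [datetime.time(h) for h in range(24)]
    # astype(int) because older pandas returns the boolean sums as float64
    return counts.reindex(hours, fill_value=0).astype(int)

# hypothetical counts shaped like the Out[18] result
counts = pd.Series([1, 0, 0, 0], index=[datetime.time(h) for h in range(4)])
print(all_hours(counts))
```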

