Load a list with date values into a pandas DataFrame and plot activity over time

Question

I have some Twitter data that I would like to plot as activity over time, broken down by tweet type (tweet / mention / retweet).

The data is currently loaded into a list of tuples, each containing a date and a type:

time = [('2014-04-13', 'tweet'),
        ('2014-04-13', 'tweet'),
        ('2014-04-13', 'mention'),
        ('2014-04-13', 'retweet'),
        ('2014-04-13', 'mention'),
        ('2014-04-13', 'tweet'),
        ('2014-04-13', 'retweet'),
        ('2014-04-13', 'mention'),
        ('2014-04-13', 'tweet'),
        ('2014-04-13', 'retweet'),
        ('2014-04-13', 'retweet'),
        ('2014-04-13', 'mention'),
        ('2014-04-13', 'tweet'),
        ('2014-04-13', 'tweet'),
        ('2014-04-13', 'tweet'),
        ('2014-04-13', 'tweet'),
        ('2014-04-13', 'mention'),
        ('2014-04-13', 'retweet'),
        ('2014-04-13', 'mention'),
        ('2014-04-13', 'tweet')]

I loaded the data into a pandas DataFrame:

time_df = pd.DataFrame(time, columns=['date','time'])

The data now looks like this:

         date     time
0  2014-04-13    tweet
1  2014-04-13    tweet
2  2014-04-13  mention
3  2014-04-13  retweet
4  2014-04-13  mention
...
...
...

However, I'm now lost when it comes to plotting this data over time. In addition, I would like to draw each type (tweet / mention / retweet) as a separate colored line. Note also that I may sometimes need to aggregate the data by day / week / month.

How can I plot this as a graph with three lines: Tweet, Mention, Retweet?

python pandas time-series

green_bean_4_u 31 . '14 0:45

Paul H · Answer 1 · 2014-07-31T02:01:56+0000

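One approach is to add a count column, group by tweet type and day, unstack the result, and plot it.

First, some made-up data: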

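import random

import pandas

# Build a month of made-up tweets, one every 5 minutes.
# pandas.date_range replaces the DatetimeIndex(start=..., end=..., freq=...)
# constructor form, which newer pandas versions no longer accept.
tweet_types = ['tweet', 'retweet', 'mention']
index = pandas.date_range(start='2014-04-13', end='2014-05-13', freq='5min')
tweets = [random.choice(tweet_types) for _ in range(len(index))]
time_df = pandas.DataFrame(index=index, data=tweets, columns=['tweet type'])
time_df['day'] = time_df.index.date  # calendar day of each timestamp
time_df['count'] = 1                 # helper column to count per group
print(time_df.head())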

This prints something like:

                     tweet type         day  count
2014-04-13 00:00:00     mention  2014-04-13      1
2014-04-13 00:05:00     mention  2014-04-13      1
2014-04-13 00:10:00       tweet  2014-04-13      1
2014-04-13 00:15:00       tweet  2014-04-13      1
2014-04-13 00:20:00     retweet  2014-04-13      1

Now use the count column to tally tweets per type and day, then unstack into a cross-tab:

daily_counts = time_df.groupby(by=['tweet type', 'day']).count()
daily_counts_xtab = daily_counts.unstack(level='tweet type')['count']
print(daily_counts_xtab.head())

...

tweet type  mention  retweet  tweet
day                                
2014-04-13       89      101     98
2014-04-14       98      113     77
2014-04-15       87      103     98
2014-04-16       81      107    100
2014-04-17       96       92    100
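
For data that is already a list of (date, type) tuples, as in the question, a similar table can probably be built without the helper count column. The following is only a sketch using pandas.crosstab. The names events, events_df and counts and the six sample rows are made up for illustration, and the second column is called 'type' here rather than 'time'.

```python
import pandas as pd

# Illustrative sample in the same (date, type) shape as the question's list.
events = [
    ('2014-04-13', 'tweet'),
    ('2014-04-13', 'mention'),
    ('2014-04-13', 'tweet'),
    ('2014-04-14', 'retweet'),
    ('2014-04-14', 'tweet'),
    ('2014-04-15', 'mention'),
]

events_df = pd.DataFrame(events, columns=['date', 'type'])

# Parse the date strings so the index becomes a real DatetimeIndex.
events_df['date'] = pd.to_datetime(events_df['date'])

# Rows are dates, columns are tweet types, values are counts
# (date/type combinations that never occur are filled with 0).
counts = pd.crosstab(events_df['date'], events_df['type'])
print(counts)
```

Calling counts.plot() on the result should then draw one line per tweet type, as with daily_counts_xtab.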

Finally, plot it:

daily_counts_xtab.plot()
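
The question also mentions aggregating by week or month. One way to sketch that is to resample a daily cross-tab. The frame named daily below is a randomly generated stand-in for daily_counts_xtab, not the real data. Note that the 'day' index built from time_df.index.date holds plain date objects, so it would likely need converting with pandas.to_datetime before resample can be used on it.

```python
import numpy as np
import pandas as pd

# Hypothetical daily cross-tab, shaped like daily_counts_xtab above.
days = pd.date_range('2014-04-13', periods=28, freq='D')
rng = np.random.default_rng(0)
daily = pd.DataFrame(
    rng.integers(70, 120, size=(len(days), 3)),
    index=days,
    columns=['mention', 'retweet', 'tweet'],
)

# Sum the daily counts into calendar weeks (by default, weeks ending Sunday).
weekly = daily.resample('W').sum()

# Sum the daily counts into calendar months, labelled by month start.
monthly = daily.resample('MS').sum()

print(weekly)
print(monthly)
```

weekly.plot() or monthly.plot() would then give the same kind of chart at a coarser resolution.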
