I have a data frame with one row per course year. For each row I need the duration in months, counting back from the most recent year (2016).
from io import StringIO
import pandas as pd
u_cols = ['page_id','web_id']
audit_trail = StringIO('''
year_id | web_id
2012|efg
2013|abc
2014| xyz
2015| pqr
2016| mnp
''')
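# header=0 replaces the header line inside the text with u_cols instead of reading it as a data row
df11 = pd.read_csv(audit_trail, sep="|", names=u_cols, header=0)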
How can I add a new months column that counts up from the most recent year (i.e. starting at the bottom row, similar to bfill)?
The final data frame will look like this:
u_cols = ['page_id','web_id' , 'months']
audit_trail = StringIO('''
year_id | web_id | months
2012|efg | 60
2013|abc | 48
2014| xyz | 36
2015| pqr | 24
2016| mnp | 12
''')
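# header=0 replaces the header line inside the text with u_cols instead of reading it as a data row
df12 = pd.read_csv(audit_trail, sep="|", names=u_cols, header=0)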
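The arithmetic behind the expected months column can be sketched as below. This is only one possible reading of the requirement: it assumes the latest year in the frame (2016) counts as 12 months and every earlier year adds another 12. The names `raw` and `df` are made up for this sketch, and the column is called `year_id` after the header line in the sample text.

```python
from io import StringIO
import pandas as pd

# Same sample data as above, with the stray spaces removed
raw = StringIO("""year_id|web_id
2012|efg
2013|abc
2014|xyz
2015|pqr
2016|mnp
""")
df = pd.read_csv(raw, sep="|")

# Assumption: the most recent year counts as 12 months,
# and each earlier year is 12 months longer than the next one.
df["months"] = (df["year_id"].max() - df["year_id"] + 1) * 12
print(df)
```

Because the value is computed from the year itself rather than from the row position, it does not depend on the rows being sorted.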
Some answers do not take into account that there may be several courses. Updating sample data ...
from io import StringIO
import pandas as pd
u_cols = ['course_name','page_id','web_id']
audit_trail = StringIO('''
course_name| year_id | web_id
a|2012|efg
a|2013|abc
a|2014| xyz
a|2015| pqr
a|2016| mnp
b|2014| xyz
b|2015| pqr
b|2016| mnp
''')
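# header=0 replaces the header line inside the text with u_cols instead of reading it as a data row
df11 = pd.read_csv(audit_trail, sep="|", names=u_cols, header=0)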
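With several courses, one way to get the months per course might be a grouped version of the same calculation. This is a sketch, not a definitive answer: it assumes each course's own latest year counts as 12 months and that months are counted back from there within the course. The names `raw_courses`, `courses`, and `latest` are made up for this example.

```python
from io import StringIO
import pandas as pd

# Updated sample data with two courses, stray spaces removed
raw_courses = StringIO("""course_name|year_id|web_id
a|2012|efg
a|2013|abc
a|2014|xyz
a|2015|pqr
a|2016|mnp
b|2014|xyz
b|2015|pqr
b|2016|mnp
""")
courses = pd.read_csv(raw_courses, sep="|")

# Latest year of each course, broadcast back onto every row of that course
latest = courses.groupby("course_name")["year_id"].transform("max")

# Assumption: a course's latest year counts as 12 months,
# and each earlier year of the same course adds another 12.
courses["months"] = (latest - courses["year_id"] + 1) * 12
print(courses)
```

If instead every course should be measured against 2016 regardless of its own last year, `latest` could be replaced with the constant 2016.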