I am trying to look into a large dataset that identifies a person’s career history in a firm. I want to see the maximum number of years during which a person worked as a Manager , provided that this person was in the Sales category before becoming a Boss (no matter how many years passed before). The data is as follows: job2 is a dummy variable indicating whether the person was Manager , cumu_job2 designation of cumulative years when the person was in the Manager position (only sequential cumulation is taken into account).
id name year job job2 cumu_job2 1 Jane 1980 Worker 0 0 1 Jane 1981 Manager 1 1 1 Jane 1982 Sales 0 0 1 Jane 1983 Sales 0 0 1 Jane 1984 Manager 1 1 1 Jane 1985 Manager 1 2 1 Jane 1986 Boss 0 0 2 Bob 1985 Worker 0 0 2 Bob 1986 Sales 0 0 2 Bob 1987 Manager 1 1 2 Bob 1988 Manager 1 2 2 Bob 1989 Boss 0 0
Extracting the maximum years of a person’s work, provided that the person had a history of working in Sales , I would like the data to have another column that indicates this information:
id name year job job2 cumu_job2 cumu_max 1 Jane 1983 Sales 0 0 1 Jane 1986 Boss 0 2 2 Bob 1986 Sales 0 0 2 Bob 1989 Boss 0 2
So, I believe that this requires two steps - first I only need to extract the case when a person moves from Sales to Boss , and then saves the maximum value for each person in the new cumu_max vector based on cumu_job2 .
This is a complex process, so any suggestions will be greatly appreciated ...!
I thought why the answer below using dplyr does not work, and here is what I think - the example showed that all people became bosses only after they became Manager, but I also have data points that look like Kevin :
id name year job job2 cumu_job2 1 Jane 1980 Worker 0 0 1 Jane 1981 Manager 1 1 1 Jane 1982 Sales 0 0 1 Jane 1983 Sales 0 0 1 Jane 1984 Manager 1 1 1 Jane 1985 Manager 1 2 1 Jane 1986 Boss 0 0 2 Bob 1985 Worker 0 0 2 Bob 1986 Sales 0 0 2 Bob 1987 Manager 1 1 2 Bob 1988 Manager 1 2 2 Bob 1989 Boss 0 0 3 Kevin 1991 Manager 1 1 3 Kevin 1992 Manager 1 2 3 Kevin 1993 Sales 0 0 4 Kevin 1994 Boss 0 0
So in the end I would like
id name year job cumu_job2 cumu_max 1 Jane 1983 Sales 0 0 1 Jane 1986 Boss 0 2 2 Bob 1986 Sales 0 0 2 Bob 1989 Boss 0 2 3 Kevin 1993 Sales 0 2 3 Kevin 1994 Boss 0 2
The dplyr solution only spits out those who switched from Sales-Manager-Boss, not taking into account the manager's ability - Sales-Boss (which is more visible in my data set).
performance r row
song0089
source share