R: extract maximum value in vector under certain conditions

Question

R: extract maximum value in vector under certain conditions

I am trying to look into a large dataset that identifies a person’s career history in a firm. I want to see the maximum number of years during which a person worked as a Manager , provided that this person was in the Sales category before becoming a Boss (no matter how many years passed before). The data is as follows: job2 is a dummy variable indicating whether the person was Manager , cumu_job2 designation of cumulative years when the person was in the Manager position (only sequential cumulation is taken into account).

  id name year job job2 cumu_job2 1 Jane 1980 Worker 0 0 1 Jane 1981 Manager 1 1 1 Jane 1982 Sales 0 0 1 Jane 1983 Sales 0 0 1 Jane 1984 Manager 1 1 1 Jane 1985 Manager 1 2 1 Jane 1986 Boss 0 0 2 Bob 1985 Worker 0 0 2 Bob 1986 Sales 0 0 2 Bob 1987 Manager 1 1 2 Bob 1988 Manager 1 2 2 Bob 1989 Boss 0 0

Extracting the maximum years of a person’s work, provided that the person had a history of working in Sales , I would like the data to have another column that indicates this information:

 id name year job job2 cumu_job2 cumu_max 1 Jane 1983 Sales 0 0 1 Jane 1986 Boss 0 2 2 Bob 1986 Sales 0 0 2 Bob 1989 Boss 0 2

So, I believe that this requires two steps - first I only need to extract the case when a person moves from Sales to Boss , and then saves the maximum value for each person in the new cumu_max vector based on cumu_job2 .

This is a complex process, so any suggestions will be greatly appreciated ...!

I thought why the answer below using dplyr does not work, and here is what I think - the example showed that all people became bosses only after they became Manager, but I also have data points that look like Kevin :

 id name year job job2 cumu_job2 1 Jane 1980 Worker 0 0 1 Jane 1981 Manager 1 1 1 Jane 1982 Sales 0 0 1 Jane 1983 Sales 0 0 1 Jane 1984 Manager 1 1 1 Jane 1985 Manager 1 2 1 Jane 1986 Boss 0 0 2 Bob 1985 Worker 0 0 2 Bob 1986 Sales 0 0 2 Bob 1987 Manager 1 1 2 Bob 1988 Manager 1 2 2 Bob 1989 Boss 0 0 3 Kevin 1991 Manager 1 1 3 Kevin 1992 Manager 1 2 3 Kevin 1993 Sales 0 0 4 Kevin 1994 Boss 0 0

So in the end I would like

  id name year job cumu_job2 cumu_max 1 Jane 1983 Sales 0 0 1 Jane 1986 Boss 0 2 2 Bob 1986 Sales 0 0 2 Bob 1989 Boss 0 2 3 Kevin 1993 Sales 0 2 3 Kevin 1994 Boss 0 2

The dplyr solution only spits out those who switched from Sales-Manager-Boss, not taking into account the manager's ability - Sales-Boss (which is more visible in my data set).

+1

performance r row

song0089 Feb 05 '14 at 23:46

source share

1 answer

Vincent · Accepted Answer · 2014-02-06T01:05:58+0000

This may not cover all the cases in your evidence, but it does (mostly) what you are looking for. Please note that I added Gill, which should be excluded according to your conditions.

 require(dplyr) dat <- read.table(header = TRUE, text = "id name year job job2 cumu_job2 1 Jane 1980 Worker 0 0 1 Jane 1981 Manager 1 1 1 Jane 1982 Sales 0 0 1 Jane 1983 Sales 0 0 1 Jane 1984 Manager 1 1 1 Jane 1985 Manager 1 2 1 Jane 1986 Boss 0 0 2 Bob 1985 Worker 0 0 2 Bob 1986 Sales 0 0 2 Bob 1987 Manager 1 1 2 Bob 1988 Manager 1 2 2 Bob 1989 Boss 0 0 3 Jill 1989 Worker 0 0 3 Jill 1990 Boss 0 0") dat %.% group_by(id) %.% mutate( all_jobs = sum(unique(job) %in% c("Sales","Manager","Boss")), cumu_max = max(cumu_job2) ) %.% filter(all_jobs == 3, job %in% c("Sales","Boss")) Source: local data frame [5 x 8] Groups: id id name year job job2 cumu_job2 all_jobs cumu_max 1 1 Jane 1982 Sales 0 0 3 2 2 1 Jane 1983 Sales 0 0 3 2 3 1 Jane 1986 Boss 0 0 3 2 4 2 Bob 1986 Sales 0 0 3 2 5 2 Bob 1989 Boss 0 0 3 2

R: extract maximum value in vector under certain conditions

More articles: