Merging a list of data frames into a single data frame, or avoiding the list altogether

I have a dataset like:

Company, Product, Users
MSFT, Office, 1000
MSFT, VS, 4000
GOOG, gmail, 3203
GOOG, appengine, 45454
MSFT, Windows, 1500
APPL, iOS, 6000
APPL, iCloud, 3442

I am writing a function that returns a data frame with the nth product for each company, ranked by Users, so the output of rankcompany(1) should be:

     Company   Product Users
APPL    APPL       iOS  6000
GOOG    GOOG appengine 45454
MSFT    MSFT        VS  4000

The function looks like this:

rankcompany <- function(num=1){

    #Read data file
    company_data <- read.csv("company.csv",stringsAsFactors = FALSE)

    #split by company
    split_data <- split(company_data, company_data$Company)

    #sort and select the nth row
    selected <- lapply(split_data, function(df) {
        df <- df[order(-df$Users, df$Product),]
        df[num,]
    })

    #compose output data frame
    #this part needs to be smarter??
    len <- length(selected)
    selected_df <- data.frame(Company=character(len),Product=character(len), Users=integer(len),stringsAsFactors = FALSE)
    row.names(selected_df) <- names(selected)


    for (n in names(selected)){
        print(str(selected[[n]]))
        selected_df[n,] <- selected[[n]][1,]

    }

    selected_df
}

I split the input data frame into a list, sort each piece and select the nth row, and then try to combine the results into the output data frame "selected_df".

Is there a smarter way to do this in R? Can the list be merged more directly, or avoided altogether?


You can do this with dplyr:

library(dplyr)

rankcompany <- function(d, num=1) {
   # sort by Users (descending), then take the num-th row within each company
   d %>% group_by(Company) %>% arrange(desc(Users)) %>% slice(num)
}

Then call it like this:

rankcompany(d,2)

or, with the pipe:

d %>% rankcompany(1)

Following @DMT's suggestion, rbindlist from the data.table package can replace the loop:

    selected_df <- rbindlist(selected)
    selected_df <- as.data.frame(selected_df)
    row.names(selected_df) <- names(selected)
    selected_df



You can keep split and lapply, and combine the pieces with do.call:

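rankcompany <- function(N){
    # split the data (company_data, defined under "data" below) by company
    byCompany <- split(company_data, company_data$Company)
    ranks <- lapply(byCompany,
             function(x)
             {
               # row whose Users count has rank N (1 = most users)
               r <- which(rank(-x$Users)==N)
               x[r,]
             })
    # bind the one-row data frames back into a single data frame
    do.call("rbind", ranks)
}

> rankcompany(1)
     Company   Product Users
APPL    APPL       iOS  6000
GOOG    GOOG appengine 45454
MSFT    MSFT        VS  4000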
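
One caveat with selecting by rank(): by default it averages tied ranks, so two products with the same Users count both get rank 1.5 and neither matches N = 1. The sketch below uses a small made-up data set with a tie, and a helper called nth_by_order (an illustrative name, not from any package) that orders by Users and breaks ties by Product, as the function in the question does.

```r
# Hypothetical data with a tie in Users (not the data from the question).
tied <- data.frame(
  Company = c("MSFT", "MSFT", "GOOG"),
  Product = c("VS", "Office", "gmail"),
  Users   = c(4000L, 4000L, 3203L),
  stringsAsFactors = FALSE
)

# rank() averages tied ranks by default, so no MSFT row has rank exactly 1.
msft_ranks <- rank(-tied$Users[tied$Company == "MSFT"])

# nth_by_order() is an illustrative helper, not part of any package.
# It sorts each company by Users (descending), breaks ties by Product,
# takes row n, and binds the one-row pieces back together.
nth_by_order <- function(data, n = 1) {
  pieces <- lapply(split(data, data$Company), function(x) {
    x <- x[order(-x$Users, x$Product), ]
    x[n, ]
  })
  do.call(rbind, pieces)
}

top <- nth_by_order(tied, 1)
top
```

Whether ties should be broken alphabetically by Product is a design choice carried over from the question; any other deterministic tie-break would work the same way.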

You can avoid rbindlist, and splitting the data.frame, altogether by using data.table:

library(data.table) ## 1.9.2+
n <- 1L
setDT(company_data)[order(-Users), .SD[n], keyby=Company]
#   Company   Product Users
#1:    APPL       iOS  6000
#2:    GOOG appengine 45454
#3:    MSFT        VS  4000

setDT converts a data.frame to a data.table by reference (without making an additional copy in memory). We then sort the data table in descending order of the Users column, group by Company, and for each group take the nth row of .SD (the Subset of Data for that group).
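
For readers without data.table installed, the same three steps (sort once, group by company, take the nth row) can be sketched in base R. This is only an illustration of the logic, not what data.table does internally; the names sorted, pos and res are arbitrary.

```r
# Same sample data as in the "data" section of this answer.
company_data <- data.frame(
  Company = c("MSFT", "MSFT", "GOOG", "GOOG", "MSFT", "APPL", "APPL"),
  Product = c("Office", "VS", "gmail", "appengine", "Windows", "iOS", "iCloud"),
  Users   = c(1000L, 4000L, 3203L, 45454L, 1500L, 6000L, 3442L),
  stringsAsFactors = FALSE
)
n <- 1L

# Step 1: sort the whole table once, descending by Users.
sorted <- company_data[order(-company_data$Users), ]

# Step 2: number the rows within each company (1 = most users).
pos <- ave(seq_len(nrow(sorted)), sorted$Company, FUN = seq_along)

# Step 3: keep the nth row of each company, then order by Company
# so the result is sorted by the grouping column.
res <- sorted[pos == n, ]
res <- res[order(res$Company), ]
rownames(res) <- NULL
res
```

Note that this sketch simply drops any company that has fewer than n products.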

In your case, maybe:

DT <- rbindlist(selected)
DT[order(-Users), .SD[n], keyby=Company]

But the previous solution is much more efficient and easier to use: it is a one-liner that solves the problem.

data

company_data <-  structure(list(Company = c("MSFT", "MSFT", "GOOG", "GOOG", "MSFT", 
"APPL", "APPL"), Product = c("Office", "VS", "gmail", "appengine", 
"Windows", "iOS", "iCloud"), Users = c(1000L, 4000L, 3203L, 45454L, 
1500L, 6000L, 3442L)), .Names = c("Company", "Product", "Users"
), class = "data.frame", row.names = c(NA, -7L))