Combining column contents using apply or another vectorized approach

I have a data.frame that is almost completely blank, but each row has exactly one value. How can I use a vectorized or other idiomatic R approach to combine the contents of each row into a single vector?

Sample data:

 raw_data <- structure(
   list(
     col1 = c("", "", "", "", ""),
     col2 = c("", "", "", "", ""),
     col3 = c("", "", "", "", ""),
     col4 = c("", "", "", "Millburn - Union", ""),
     col5 = c("", "", "Cranston (aka Garden City Center)", "", ""),
     col6 = c("", "", "", "", ""),
     col7 = c("", "", "", "", ""),
     col8 = c("", "", "", "", "Colorado Blvd"),
     col9 = c("", "", "", "", ""),
     col10 = c("", "", "", "", ""),
     col11 = c("Palo Alto", "Castro (aka Market St)", "", "", "")
   ),
   .Names = c("col1", "col2", "col3", "col4", "col5", "col6",
              "col7", "col8", "col9", "col10", "col11"),
   row.names = c(5L, 4L, 3L, 2L, 1L),
   class = "data.frame"
 )

This is what I tried, but it fails as it returns a 2-dimensional matrix instead of the desired vector:

 raw_data$test <- apply(raw_data, MAR=1, FUN=paste0) 
3 answers

Your intuition about apply was correct. You just need to pass the collapse argument to paste0:

 apply( raw_data, 1, paste0, collapse = "" )
                                   5                                   4                                   3
                         "Palo Alto"            "Castro (aka Market St)" "Cranston (aka Garden City Center)"
                                   2                                   1
                  "Millburn - Union"                     "Colorado Blvd"
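To see why collapse makes the difference, here is a minimal sketch on a toy data frame (the name `df` and its contents are made up for illustration, not taken from the question). Without collapse, paste0 returns one string per column for every row, so apply binds those results into a matrix; with collapse, each row is reduced to a single string.

```r
# Toy data (hypothetical): three rows, two columns, one non-empty value per row.
df <- data.frame(a = c("x", "", ""), b = c("", "y", "z"),
                 stringsAsFactors = FALSE)

# Without collapse: each row yields a length-2 character vector,
# so apply returns a matrix with one column per original row.
m <- apply(df, 1, paste0)
dim(m)     # 2 3

# With collapse: each row yields a single string,
# so apply returns a character vector with one element per row.
v <- apply(df, 1, paste0, collapse = "")
length(v)  # 3
```

This also suggests why the assignment in the question fails: the matrix has one row per column of the input, which does not match the number of rows in the data frame.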

You can do this very simply with a single index operation:

 raw_data[raw_data!=''] 

Demo:

 R> raw_data[raw_data!='']
 [1] "Millburn - Union" "Cranston (aka Garden City Center)" "Colorado Blvd" "Palo Alto" "Castro (aka Market St)"

If you want the vector in row order, from top to bottom (as opposed to column by column from left to right, and from top to bottom within each column, as happens above), you can transpose the input data.frame:

 R> t(raw_data)[t(raw_data)!='']
 [1] "Palo Alto" "Castro (aka Market St)" "Cranston (aka Garden City Center)" "Millburn - Union" "Colorado Blvd"
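The difference in ordering is easier to see on a small example. The sketch below uses a hypothetical two-column data frame `df` whose values are labelled by the row they sit in; it is only an illustration of the column-major traversal, not part of the original data.

```r
# Toy data (hypothetical): each value names the row it belongs to.
df <- data.frame(a = c("", "r2", ""), b = c("r1", "", "r3"),
                 stringsAsFactors = FALSE)

# Logical-matrix indexing walks the data column by column:
# column a first (row 2), then column b (rows 1 and 3).
by_column <- df[df != ""]
by_column  # "r2" "r1" "r3"

# After transposing, each original row becomes a column,
# so the same indexing visits the original rows top to bottom.
tm <- t(df)
by_row <- tm[tm != ""]
by_row     # "r1" "r2" "r3"
```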

In this example, there is only one element per row that is not ''. Here is another way, using paste with do.call:

 do.call(paste, c(raw_data, sep=''))
 #[1] "Palo Alto"                         "Castro (aka Market St)"
 #[3] "Cranston (aka Garden City Center)" "Millburn - Union"
 #[5] "Colorado Blvd"
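For readers unfamiliar with the do.call idiom, the following sketch spells out what the call expands to. It assumes a hypothetical two-column data frame `df`; with the real data the same expansion simply has eleven column arguments.

```r
# Toy data (hypothetical): three rows, two columns.
df <- data.frame(a = c("x", "", ""), b = c("", "y", "z"),
                 stringsAsFactors = FALSE)

# c(df, sep = "") is a list holding the columns plus the sep argument;
# do.call passes each list element to paste as a separate argument.
via_do_call <- do.call(paste, c(df, sep = ""))

# The same call written out by hand for this two-column case.
by_hand <- paste(df$a, df$b, sep = "")

identical(via_do_call, by_hand)  # TRUE
```

Because paste is vectorized over its arguments, the columns are combined element by element without any explicit loop over rows.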

Suppose there are several elements per row in 'raw_data' that are not ''. In that case it is better to use a visible separator such as sep=';' or sep=',', and then strip the redundant separators:

 raw_data[1,1] <- 'Millburn'
 raw_data[1,3] <- 'Something'
 gsub('^;+|;+$|(;);+', '\\1', do.call(paste, c(raw_data, sep=';')))
 #[1] "Millburn;Something;Palo Alto"      "Castro (aka Market St)"
 #[3] "Cranston (aka Garden City Center)" "Millburn - Union"
 #[5] "Colorado Blvd"
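The regular expression does three jobs at once, which can be hard to read. As a rough illustration, here it is applied to a single hand-written string `s` (a made-up value that mimics a pasted row with empty fields at the start, in the middle, and at the end):

```r
# Hypothetical pasted row with empty leading, middle, and trailing fields.
s <- ";;Millburn;;;Something;;Palo Alto;;"

# '^;+'    matches leading separators; group 1 is unset, so they are dropped
# ';+$'    matches trailing separators; likewise dropped
# '(;);+'  matches a run of two or more; replaced by the single captured ';'
cleaned <- gsub('^;+|;+$|(;);+', '\\1', s)
cleaned
```

A single ';' between two values is not matched by any of the three alternatives, so it is left in place as the separator.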

With apply you get the same result as above:

 unname(apply(raw_data, 1, FUN=function(x) paste(x[x!=''],collapse=';')))
 #[1] "Millburn;Something;Palo Alto"      "Castro (aka Market St)"
 #[3] "Cranston (aka Garden City Center)" "Millburn - Union"
 #[5] "Colorado Blvd"

Source: https://habr.com/ru/post/1216331/

