Selecting specific rows in a data frame

Question

I have a 138x50 data frame of the following type:

  B = matrix( c("ehre", "e", "eh", "ehr", "ehrr",
                "f", "fi", "fie", "fiee", "fiel",
                "seil", "s", "se", "sei", "seii",
                "leiht", "l", "le", "lei", "leih",
                3, 0, 0, 0, 1),
              nrow=5, ncol=5, byrow = FALSE)
  colnames(B)<-c("ana1_1", "ana2_1", "ana3_1", "ana4_1", "points")

I want to create a new df "A" that keeps only the correct answers and replaces the incorrect answers with NA (empty cells):

  A = matrix( c("ehre", NA, NA, NA, NA,
                NA, NA, NA, NA, "fiel",
                "seil", NA, NA, NA, NA,
                "leiht", NA, NA, NA, NA,
                3, 0, 0, 0, 1),
              nrow=5, ncol=5, byrow = FALSE)
  colnames(A)<-c("ana1_1", "ana2_1", "ana3_1", "ana4_1", "points")

How do I create A not by deleting the wrong answers, but by selecting the correct ones? (This would require typing far fewer answers.)
How do I count the number of correct (non-NA) answers in each row (to create column 5)?

Thank you very much for your response!

Hausladen carina Aug 9 '15 at 2:43

2 answers

akrun · Answer 1 · 2015-08-09T16:32:45+0000

You can use grepl for this. Create a vector of the elements that should stay non-NA in the "ana" columns.

  v1 <- c('ehre', 'seil', 'leiht', 'fiel')

We paste these together with collapse='|' and anchor the result with ^ and $ to build the pattern argument for grepl, so that only exact matches count.

  pat <- paste0('^(', paste(v1, collapse='|'), ')$')
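
To make the anchoring concrete, here is a small sketch of what the pattern looks like and how it treats partial answers. The name `hits` is only introduced here for illustration.

```r
v1  <- c("ehre", "seil", "leiht", "fiel")
pat <- paste0("^(", paste(v1, collapse = "|"), ")$")
pat
# "^(ehre|seil|leiht|fiel)$"

# The ^ and $ anchors force a whole-string match,
# so prefixes such as "ehr" or near misses such as "ehrr" are rejected.
hits <- grepl(pat, c("ehre", "ehr", "ehrr", "fiel"))
hits
# TRUE FALSE FALSE TRUE
```

Without the anchors, a longer wrong answer that merely contains a correct word (for example "ehrex") would also match.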

Create a logical index for the "ana" columns

  indx <- grepl('^ana', colnames(B))

I create a new object "A1", a copy of matrix "B" that contains only the "ana" columns.

  A1 <- B[,indx]

The grepl output is a logical vector. We negate it ( ! ) and assign NA to the corresponding elements of 'A1'

 A1[!grepl(pat, A1)] <- NA

To create a "Points" column (although the example already has one), we take the logical index of non-NA values in the "ana" columns ( !is.na(A1) ), sum it by row with rowSums, and cbind the result to 'A1'.

  cbind(A1, Points=rowSums(!is.na(A1)))
  #     ana1_1 ana2_1 ana3_1 ana4_1  Points
  #[1,] "ehre" NA     "seil" "leiht" "3"
  #[2,] NA     NA     NA     NA      "0"
  #[3,] NA     NA     NA     NA      "0"
  #[4,] NA     NA     NA     NA      "0"
  #[5,] NA     "fiel" NA     NA      "1"

It is better to store the result in a 'data.frame': a matrix can hold only one class, so the numeric "Points" vector is coerced to character when it is stored in the matrix.
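
As a minimal sketch of that suggestion, the same steps can end in a data.frame instead of cbind, so the points stay numeric. The object name `res` is hypothetical and the data is the example matrix from the question.

```r
B <- matrix(c("ehre", "e", "eh", "ehr", "ehrr",
              "f", "fi", "fie", "fiee", "fiel",
              "seil", "s", "se", "sei", "seii",
              "leiht", "l", "le", "lei", "leih",
              3, 0, 0, 0, 1),
            nrow = 5, ncol = 5)
colnames(B) <- c("ana1_1", "ana2_1", "ana3_1", "ana4_1", "points")

v1  <- c("ehre", "seil", "leiht", "fiel")
pat <- paste0("^(", paste(v1, collapse = "|"), ")$")

# Keep only the "ana" columns and blank out everything that is not an exact match
A1 <- B[, grepl("^ana", colnames(B))]
A1[!grepl(pat, A1)] <- NA

# data.frame() splits the character matrix into columns and keeps Points numeric
res <- data.frame(A1, Points = rowSums(!is.na(A1)), stringsAsFactors = FALSE)
str(res)
```

Here `res$Points` is a numeric column, so it can be summed or averaged directly without converting it back from character.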

rbatt · Answer 2 · 2015-08-09T16:31:08+0000

I start with your B, which already contains "points"; that column is not needed, because I recreate it later. I first set every entry that does not match a correct answer to NA, then count the non-NA entries in each row (ignoring the "points" column) to get the score. The object correct is simply the vector of correct answers you are looking for.

  B = matrix( c("ehre", "e", "eh", "ehr", "ehrr",
                "f", "fi", "fie", "fiee", "fiel",
                "seil", "s", "se", "sei", "seii",
                "leiht", "l", "le", "lei", "leih",
                3, 0, 0, 0, 1),
              nrow=5, ncol=5, byrow = FALSE)
  colnames(B)<-c("ana1_1", "ana2_1", "ana3_1", "ana4_1", "points")
  correct <- c("ehre","fiel","seil","leiht")
  A <- B
  A[!A%in%correct] <- NA
  A[,"points"] <- apply(A[,colnames(A)!="points"], 1, function(x)sum(!is.na(x))) #tally up non-NA to indicate points

This produces the following result for A:

       ana1_1 ana2_1 ana3_1 ana4_1  points
  [1,] "ehre" NA     "seil" "leiht" "3"
  [2,] NA     NA     NA     NA      "0"
  [3,] NA     NA     NA     NA      "0"
  [4,] NA     NA     NA     NA      "0"
  [5,] NA     "fiel" NA     NA      "1"
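
Since the question mentions a data frame rather than a matrix, the same %in% idea might be applied column by column. This is only a sketch under that assumption; the names `df` and `ana` are made up for the example, and it touches only the columns whose names start with "ana".

```r
B <- matrix(c("ehre", "e", "eh", "ehr", "ehrr",
              "f", "fi", "fie", "fiee", "fiel",
              "seil", "s", "se", "sei", "seii",
              "leiht", "l", "le", "lei", "leih",
              3, 0, 0, 0, 1),
            nrow = 5, ncol = 5)
colnames(B) <- c("ana1_1", "ana2_1", "ana3_1", "ana4_1", "points")
correct <- c("ehre", "fiel", "seil", "leiht")

df  <- as.data.frame(B, stringsAsFactors = FALSE)
ana <- grep("^ana", names(df))        # positions of the answer columns

# Keep a value only if it is one of the correct answers, otherwise NA
df[ana] <- lapply(df[ana], function(x) ifelse(x %in% correct, x, NA))

# Recompute the score as a numeric column from the non-NA answers
df$points <- rowSums(!is.na(df[ana]))
df
```

Restricting the replacement to the "ana" columns leaves any other columns of a wider data frame untouched.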
