R: convert a discrete column to a matrix of logical values

Question

I want to convert a discrete (categorical) variable into a series of logical columns so that I can use it as a feature in logistic regression (and other functions) where I cannot mix continuous and discrete values.

I have a factor column in a data frame, and I want to convert it to a matrix of logical values with one column per level (1 .. "number of levels"), for example:

my_labels=c("a","b","c","d","e","f")
my_tally=c(1,1,3,2,3,4,5,1)
my_tally=factor(my_tally, levels=c(1:6), labels=my_labels)
summary(my_tally)

expected_output=c(1,0,0,0,0,0,     #1
                  1,0,0,0,0,0,     #1
                  0,0,1,0,0,0,     #3
                  0,1,0,0,0,0,     #2
                  0,0,1,0,0,0,     #3
                  0,0,0,1,0,0,     #4
                  0,0,0,0,1,0,     #5
                  1,0,0,0,0,0      #1
                  )

expected_output=matrix(expected_output, 
                       nrow=length(my_tally), 
                       ncol=length(levels(my_tally)),
                       byrow=TRUE
                       )

expected_output
colSums(expected_output)

Any suggestions for a "quick" function to create expected_output? This is a big data problem (700 discrete features, 1M observations).

+4

matrix r machine-learning

Pieter wessels Jun 30 '15 at 13:15

3 answers

Try table:

expected_output<-table(1:length(my_tally),my_tally)
expected_output
colSums(expected_output)

a b c d e f 
3 1 2 1 1 0
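Note that table() returns an integer contingency table rather than a logical matrix. If logical columns are what is needed, a minimal sketch reusing the question's small example could look like this (the names tab and lgl are only illustrative):

```r
# Small example from the question
my_labels <- c("a", "b", "c", "d", "e", "f")
my_tally <- factor(c(1, 1, 3, 2, 3, 4, 5, 1), levels = 1:6, labels = my_labels)

# One row per observation, one column per level (integer counts of 0 or 1)
tab <- table(seq_along(my_tally), my_tally)

# Drop the "table" class and compare with 0 to get a plain logical matrix
lgl <- unclass(tab) > 0

colSums(lgl)
```

The column names are the factor levels, so the columns can be picked out by label.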

+2

Robert Jun 30 '15 at 13:30

Here is a relatively simple solution using the function apply:

updateOutput <- function(entry, classInput = my_tally){
  # Integer code of the level at this position
  column <- as.numeric(classInput[entry])
  # Start from an all-zero row with one slot per level
  row <- rep(0, length(levels(classInput)))
  row[column] <- 1
  row
}

expected_output <- t(apply(matrix(1:length(my_tally)), 1, updateOutput))

expected_output
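A row-by-row apply can be slow on many observations. As a possible alternative, and not something proposed in this answer, one could loop over the levels instead of the rows, which yields logical columns directly. This is a sketch using the question's small example (the name lgl is illustrative):

```r
# Small example from the question
my_labels <- c("a", "b", "c", "d", "e", "f")
my_tally <- factor(c(1, 1, 3, 2, 3, 4, 5, 1), levels = 1:6, labels = my_labels)

# For each level, test every observation against it.
# vapply() binds the logical vectors into a matrix with one column per level.
lgl <- vapply(
  levels(my_tally),
  function(lev) my_tally == lev,
  logical(length(my_tally))
)

dim(lgl)
colSums(lgl)
```

With 700 levels this makes 700 vectorised comparisons instead of one function call per observation.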

0

Morris greenberg Jun 30 '15 at 13:45

Jota · Accepted Answer · 2015-06-30T13:33:40+0000

Here are two options: one in base R, and one using the Matrix package.

First, create a matrix of 0s:

mat <- matrix(0, nrow=length(my_tally), ncol=length(levels(my_tally)))

Then set the appropriate entries to 1 using matrix indexing:

mat[cbind(1:length(my_tally), as.numeric(my_tally))] <- 1
#     [,1] [,2] [,3] [,4] [,5] [,6]
#[1,]    1    0    0    0    0    0
#[2,]    1    0    0    0    0    0
#[3,]    0    0    1    0    0    0
#[4,]    0    1    0    0    0    0
#[5,]    0    0    1    0    0    0
#[6,]    0    0    0    1    0    0
#[7,]    0    0    0    0    1    0
#[8,]    1    0    0    0    0    0

colSums(mat)
#[1] 3 1 2 1 1 0
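The indexing step works because a two-column matrix used as an index is read as (row, column) pairs. A sketch that wraps the same idea in a reusable function is shown here. The function name one_hot and the NA handling are assumptions added for illustration and are not part of the answer:

```r
# Hypothetical helper: indicator matrix with one column per factor level
one_hot <- function(f) {
  stopifnot(is.factor(f))
  mat <- matrix(0L, nrow = length(f), ncol = nlevels(f),
                dimnames = list(NULL, levels(f)))
  # Skip missing values, since NA is not allowed in an index used for assignment
  ok <- !is.na(f)
  # Each row of the index matrix is a (row, column) pair to set to 1
  mat[cbind(which(ok), as.integer(f)[ok])] <- 1L
  mat
}

my_tally <- factor(c(1, 1, 3, 2, 3, 4, 5, 1), levels = 1:6,
                   labels = c("a", "b", "c", "d", "e", "f"))
mat <- one_hot(my_tally)
colSums(mat)
```

Rows that correspond to missing values stay all zero under this assumption.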

Option 2, a sparse matrix from the Matrix package:

library(Matrix)
colSums(sparseMatrix(i=1:length(my_tally), j=as.numeric(my_tally),
    dims=c(length(my_tally), length(levels(my_tally)))))
#[1] 3 1 2 1 1 0
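At the scale in the question (1M observations by 700 levels), a dense matrix of doubles would need roughly 5.6 GB, while the sparse form stores only one non-zero entry per row. A sketch that keeps the sparse matrix itself, with level names attached, might look like this (the name sp is illustrative, and x = 1 is chosen here to get a numeric rather than a pattern matrix):

```r
library(Matrix)

my_tally <- factor(c(1, 1, 3, 2, 3, 4, 5, 1), levels = 1:6,
                   labels = c("a", "b", "c", "d", "e", "f"))

# One non-zero entry per observation, placed in the column of its level
sp <- sparseMatrix(
  i = seq_along(my_tally),
  j = as.integer(my_tally),
  x = 1,
  dims = c(length(my_tally), nlevels(my_tally)),
  dimnames = list(NULL, levels(my_tally))
)

dim(sp)
colSums(sp)
```

Whether a given modelling function accepts a sparse matrix directly depends on that function, so this is worth checking before relying on it.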

On a larger example (260 levels, 100,000 observations), the benchmarks are:

# Sample data
my_labels <- c(LETTERS, letters, paste0(LETTERS, letters), paste0(letters, LETTERS),
            paste0(letters, letters, letters), paste0(LETTERS, LETTERS, LETTERS),
            paste0(LETTERS, letters, LETTERS), paste0(letters, LETTERS, letters),
            paste0(LETTERS, letters, letters), paste0(letters, LETTERS, LETTERS))
my_tally <- sample(1:260, 100000, replace=TRUE)
my_tally <- factor(my_tally, levels=c(1:260), labels=my_labels)

# Benchmarks
library(microbenchmark)
microbenchmark(
  # Named arguments (=) so each expression is labelled in the output
  Robert = colSums(table(1:length(my_tally),my_tally)),
  Frank1 = {mat <- matrix(0, nrow=length(my_tally), ncol=length(levels(my_tally)))
      mat[cbind(1:length(my_tally), as.numeric(my_tally))] <- 1
      colSums(mat)},
  Frank2 = colSums(sparseMatrix(i=1:length(my_tally), j=as.numeric(my_tally),
      dims=c(length(my_tally), length(levels(my_tally))))),
  Khashaa = colSums(diag(length(my_labels))[my_tally, ])
  )

expr            lq       mean     median         uq      max neval  cld
Robert  444.625026 486.130804 461.653480 548.755603 632.1418   100    d
Frank1  328.947431 358.538855 337.136012 360.727606 458.2305   100   c 
Frank2    4.241506   8.997434   4.354615   4.519896 135.3001   100 a   
Khashaa 224.675094 256.337639 237.905714 260.163725 375.5642   100  b
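Before comparing timings, it can be worth confirming that the four approaches give the same column sums. A sketch of such a check on the question's small example follows. The variable names are illustrative, and as.integer() is used explicitly for the index rather than passing the factor itself:

```r
library(Matrix)

my_tally <- factor(c(1, 1, 3, 2, 3, 4, 5, 1), levels = 1:6,
                   labels = c("a", "b", "c", "d", "e", "f"))
n <- length(my_tally)
k <- nlevels(my_tally)
idx <- as.integer(my_tally)

# Approach labelled Robert: contingency table
via_table <- as.numeric(colSums(table(seq_len(n), my_tally)))

# Approach labelled Frank1: dense matrix filled by matrix indexing
dense <- matrix(0, nrow = n, ncol = k)
dense[cbind(seq_len(n), idx)] <- 1
via_dense <- as.numeric(colSums(dense))

# Approach labelled Frank2: sparse matrix
via_sparse <- as.numeric(colSums(sparseMatrix(i = seq_len(n), j = idx, dims = c(n, k))))

# Approach labelled Khashaa: rows picked from an identity matrix
via_diag <- as.numeric(colSums(diag(k)[idx, ]))

stopifnot(
  isTRUE(all.equal(via_table, via_dense)),
  isTRUE(all.equal(via_dense, via_sparse)),
  isTRUE(all.equal(via_sparse, via_diag))
)
```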
