Can we get factor matrices in R?

Question

It seems impossible to obtain factor matrices in R. Is this true? If so, why? If not, how do I do this?

    f <- factor(sample(letters[1:5], 20, rep=TRUE), letters[1:5])
    m <- matrix(f,4,5)
    is.factor(m) # fail.
    m <- factor(m,letters[1:5])
    is.factor(m) # oh, yes?
    is.matrix(m) # nope. fail.
    dim(f) <- c(4,5) # aha?
    is.factor(f) # yes..
    is.matrix(f) # yes!
    # but then I get a strange behavior
    cbind(f,f) # is not a factor anymore
    head(f,2) # doesn't give the first 2 rows but the first 2 elements of f
    # should I worry about it?

iago-lito Feb 25 '15 at 15:35

2 answers

Unfortunately, factor support is not completely universal in R, so many R functions default to treating factors as their internal storage type, which is integer:

    > typeof(factor(letters[1:3]))
    [1] "integer"

This is what happens with matrix and cbind. They don't know how to handle factors, but they do know what to do with integers, so they treat your factor like an integer. head is actually the opposite. It does know how to handle a factor, but it never bothers to check that your factor is also a matrix, so it just treats it like a normal dimensionless factor vector.

Your best bet for working as if you had factors in your matrix is to coerce it to character. Once you are done with your operations, you can coerce it back to factor form. You could also do this with the integer form, but then you risk odd results (for example, you could do matrix multiplication on an integer matrix, which makes no sense for factors).
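A minimal sketch of that character round trip, assuming the level set is known up front. The names lv, cm, cm2, first_rows and f_back are illustrative and not part of the original answer:

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible
lv <- letters[1:5]
f  <- factor(sample(lv, 20, replace = TRUE), levels = lv)

# Work on a character matrix; as.character() keeps the labels
cm <- matrix(as.character(f), nrow = 4, ncol = 5)

cm2        <- cbind(cm, cm)  # ordinary matrix operations now behave
first_rows <- head(cm, 2)    # first two rows, not first two elements

# Restore a factor (a plain vector, dimensions are not kept)
# using the original level set
f_back <- factor(cm, levels = lv)
```

Note that factor() returns a dimensionless vector, so the matrix shape has to be tracked separately if it is needed again later.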

Note that if you add the class "matrix" to your factor, some (but not all) things start working:

    f <- factor(letters[1:9])
    dim(f) <- c(3, 3)
    class(f) <- c("factor", "matrix")
    head(f, 2)

It produces:

         [,1] [,2] [,3]
    [1,] a    d    g
    [2,] b    e    h
    Levels: a b c d e f g h i

This does not fix rbind etc.

BrodieG Feb 25 '15 at 16:52

Gavin Simpson · Accepted Answer · 2015-02-25 16:48

In this case, it may walk like a duck and even quack like a duck, but the f from:

    f <- factor(sample(letters[1:5], 20, rep=TRUE), letters[1:5])
    dim(f) <- c(4,5)

really isn't a matrix, even though is.matrix() claims that it strictly is one. To be a matrix as far as is.matrix() is concerned, f only needs to be a vector and have a dim attribute. Adding that attribute to f makes it pass the test. However, as you have seen, once you start using f as a matrix, it quickly loses the features that make it a factor (you end up working with the levels, or the dimensions get lost).

There are really only matrices and arrays for the atomic vector types:

logical,
integer,
real (double),
complex,
string (or character), and
raw

plus, as @hadley reminds me, you can also have list matrices and arrays (by setting the dim attribute on a list object; see, for example, the Matrices and arrays section of Hadley's book, Advanced R).

Anything outside of these types will be coerced to some lower type via as.vector(). This happens in matrix(f, nrow = 3) not because f is atomic according to is.atomic() (which returns TRUE for f because it is internally stored as an integer, and typeof(f) returns "integer"), but because it has a class attribute. This sets the OBJECT bit on the internal representation of f, and anything that has a class is supposed to be coerced to one of the atomic types via as.vector():

    matrix <- function(data = NA, nrow = 1, ncol = 1, byrow = FALSE,
                       dimnames = NULL) {
        if (is.object(data) || !is.atomic(data))
            data <- as.vector(data)
    ....
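The two conditions in that check can be inspected directly. This is a small sketch with an illustrative factor (the names f, v and m here are local to the example), showing why matrix() ends up with labels rather than codes:

```r
f <- factor(c("a", "b", "a", "c"))

is.atomic(f)   # TRUE: the underlying storage is an integer vector
is.object(f)   # TRUE: f carries a class attribute
typeof(f)      # "integer"

# as.vector() on a factor returns its labels as character,
# so matrix(f, ...) produces a character matrix
v <- as.vector(f)
m <- matrix(f, nrow = 2)
is.character(m)  # TRUE
```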

Adding dimensions via dim<-() is a quick way to create an array without duplicating the object, but it bypasses some of the checks and balances that R would do if you coerced f to a matrix via the other methods:

 matrix(f, nrow = 3) # or as.matrix(f)

This gets found out when you try to use basic functions that work on matrices or that use method dispatch. Note that even after assigning dimensions to f, f still has the class "factor":

    > class(f)
    [1] "factor"

which explains the behavior of head(); you are not getting the head.matrix behavior because f is not a matrix, at least as far as the S3 mechanism is concerned:

    > debug(head.matrix)
    > head(f) # we don't enter the debugger
    [1] d c a d b d
    Levels: a b c d e
    > undebug(head.matrix)

and the head.default method calls [, for which there is a factor method, hence the observed behavior:

    > debugonce(`[.factor`)
    > head(f)
    debugging in: `[.factor`(x, seq_len(n))
    debug: {
        y <- NextMethod("[")
        attr(y, "contrasts") <- attr(x, "contrasts")
        attr(y, "levels") <- attr(x, "levels")
        class(y) <- oldClass(x)
        lev <- levels(x)
        if (drop)
            factor(y, exclude = if (anyNA(levels(x))) NULL else NA)
        else y
    }
    ....

The behavior of cbind() can be explained from the documented behavior (from ?cbind):

The functions cbind and rbind are S3 generic, ...
....
In the default method, all the vectors/matrices must be atomic (see vector) or lists. Expressions are not allowed. Language objects (such as formulae and calls) and pairlists will be coerced to lists: other objects (such as names and external pointers) will be included as elements in a list result. Any classes the inputs might have are discarded (in particular, factors are replaced by their internal codes).

Again, the fact that f has the class "factor" defeats you, because the default cbind method will be called, and it will strip the levels information and return the internal integer codes, as you observed.

In many respects, you should ignore, or at least not fully trust, what the is.foo functions tell you, because they just use simple tests to say whether something is or is not a foo object. From a certain point of view, is.matrix() and is.atomic() are clearly wrong when it comes to f (with dimensions). They are also right in terms of their implementation, or at least their behavior can be understood from the implementation. I think that is.atomic(f) is not correct, but if by "is of an atomic type" R Core means that "type" is the thing returned by typeof(f), then is.atomic() is right. A stricter test is is.vector(), which f fails:

    > is.vector(f)
    [1] FALSE

because it has attributes beyond a names attribute:

    > attributes(f)
    $levels
    [1] "a" "b" "c" "d" "e"

    $class
    [1] "factor"

    $dim
    [1] 4 5
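The disagreement between these tests can be reproduced in a few lines. This is a small sketch using an illustrative 2 x 2 factor (not the f sampled above):

```r
f <- factor(c("a", "b", "c", "d"))
dim(f) <- c(2, 2)

is.matrix(f)           # TRUE: a dim attribute of length 2 is enough
is.vector(f)           # FALSE: attributes other than names are present
inherits(f, "matrix")  # FALSE: the class attribute is just "factor"

cb <- cbind(f, f)      # default method: class dropped, integer codes kept
is.factor(cb)          # FALSE
```

Checking inherits() or class() is closer to what S3 dispatch actually sees than is.matrix() is.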

As for how you should get a factor matrix, well, you can't really, at least not if you want it to retain the factor information (the labels for the levels). One solution would be to use a character matrix, which retains the labels:

    > fl <- levels(f)
    > fm <- matrix(f, ncol = 5)
    > fm
         [,1] [,2] [,3] [,4] [,5]
    [1,] "c"  "a"  "a"  "c"  "b"
    [2,] "d"  "b"  "d"  "b"  "a"
    [3,] "e"  "e"  "e"  "c"  "e"
    [4,] "a"  "b"  "b"  "a"  "e"

and we store the levels of f for future use, in case we lose some elements of the matrix along the way.

Or work with an internal integer representation:

    > (fm2 <- matrix(unclass(f), ncol = 5))
         [,1] [,2] [,3] [,4] [,5]
    [1,]    3    1    1    3    2
    [2,]    4    2    4    2    1
    [3,]    5    5    5    3    5
    [4,]    1    2    2    1    5

and you can always get back to the levels/labels again via:

    > fm2[] <- fl[fm2]
    > fm2
         [,1] [,2] [,3] [,4] [,5]
    [1,] "c"  "a"  "a"  "c"  "b"
    [2,] "d"  "b"  "d"  "b"  "a"
    [3,] "e"  "e"  "e"  "c"  "e"
    [4,] "a"  "b"  "b"  "a"  "e"
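The same lookup also works after the code matrix has been modified, which is the point of storing the levels separately. A minimal sketch with made-up data (codes, sub, labs and f_sub are illustrative names):

```r
fl <- c("a", "b", "c")                    # level labels (illustrative)
f  <- factor(c("b", "a", "c", "b", "a", "c"), levels = fl)

codes <- matrix(as.integer(f), ncol = 3)  # integer codes, 2 x 3
sub   <- codes[, 1:2, drop = FALSE]       # any ordinary matrix operation

labs   <- sub
labs[] <- fl[sub]                         # codes -> labels, dims preserved
f_sub  <- factor(fl[sub], levels = fl)    # or back to a plain factor vector
```

Because fl is a plain vector, indexing it with the integer matrix sub treats sub as a vector of positions, and the empty-bracket assignment keeps the dimensions of labs.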

Using a data frame does not seem ideal for this, since each component of the data frame will be considered as a separate factor, while you seem to want to consider the array as one factor with one set of levels.

If you really want to do what you describe, which is to have a factor matrix, you will most likely need to create your own S3 class for it, plus all the methods to go with it. For example, you might store the factor matrix as a character matrix, but with the class "factorMatrix", where you store the levels alongside the matrix as an extra attribute. Then you would need to write [.factorMatrix, which would grab the levels, then use the default [ method on the matrix, and then add the levels attribute back on again. You could write cbind and head methods as well. The list of required methods would grow quickly, but a simple implementation may suit, and if you give your objects the class c("factorMatrix", "matrix") (that is, inherit from the "matrix" class), you will pick up all the properties/methods of the "matrix" class (which will drop the levels and other attributes), so you can at least work with the objects and see where you need to add new methods to fill out the behavior of the class.
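A rough sketch of what such a class could look like. Everything here is hypothetical: the constructor name factorMatrix(), its arguments, and the choice to return an ordinary factor when a subset drops to a vector are assumptions made for illustration, and cbind, head and print methods are left out:

```r
# Hypothetical constructor: a character matrix plus a "levels" attribute
factorMatrix <- function(x, nrow, ncol,
                         levels = sort(unique(as.character(x)))) {
  m <- matrix(as.character(x), nrow = nrow, ncol = ncol)
  m[!(m %in% levels)] <- NA  # values outside the level set become NA
  structure(m, levels = levels, class = c("factorMatrix", "matrix"))
}

# Subsetting: grab the levels, subset the bare matrix, put the levels back
`[.factorMatrix` <- function(x, i, j, ..., drop = FALSE) {
  lv  <- attr(x, "levels")
  out <- unclass(x)[i, j, drop = drop]
  if (is.matrix(out)) {
    structure(out, levels = lv, class = c("factorMatrix", "matrix"))
  } else {
    factor(out, levels = lv)  # a dropped row/column becomes a plain factor
  }
}

set.seed(42)  # assumption: fixed seed for reproducibility
lv  <- letters[1:5]
fm  <- factorMatrix(sample(lv, 20, replace = TRUE), nrow = 4, ncol = 5,
                    levels = lv)
sub <- fm[1:2, ]               # still a factorMatrix with all five levels
col <- fm[, 1, drop = TRUE]    # an ordinary factor of length 4
```

This version only supports two-index subsetting (x[i, j]); single-index and assignment forms would need their own handling.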
