Why does str() show stale factor levels after subsetting a data frame in R?

I have the following data frame in R with 274569 rows and 15 columns:

 > str(x2)
 'data.frame': 274569 obs. of 15 variables:
  $ ykod : int 99 99 99 99 99 99 99 99 99 99 ...
  $ yad  : Factor w/ 43 levels "BAKUGAN","BARBIE",..: 2 2 2 2 2 2 2 2 2 2 ...
  $ per  : Factor w/ 3 levels "2 AYLIK","3 AYLIK",..: 3 3 3 3 3 3 3 3 3 3 ...
  $ donem: int 201106 201106 201106 201106 201106 201106 201106 201106 201106 201106 ...
  $ sayi : int 201106 201106 201106 201106 201106 201106 201106 201106 201106 201106 ...
  $ mkod : int 359 361 362 363 366 847 849 850 1505 1506 ...
  $ mad  : Factor w/ 11045 levels " Hilal Gida ",..: 5163 3833 10840 8284 10839 2633 10758 10293 6986 6984 ...
  $ mtip : Factor w/ 30 levels "Abone Bürosu ",..: 20 20 20 20 20 2 2 2 11 11 ...
  $ kanal: Factor w/ 2 levels "OB","SS": 2 2 2 2 2 2 2 2 1 1 ...
  $ bkod : int 110006 110006 110006 110006 110006 110006 110006 110006 110006 110006 ...
  $ bad  : Factor w/ 213 levels "4. Levent","500 Evler",..: 25 25 25 25 25 25 25 25 25 25 ...
  $ bolge: Factor w/ 12 levels "Adana Şehiriçi",..: 7 7 7 7 7 7 7 7 7 7 ...
  $ sevk : int 5 2 2 2 10 0 4 3 13 32 ...
  $ iade : int 0 2 1 2 4 0 3 2 0 8 ...
  $ satis: int 5 0 1 0 6 0 1 1 13 24 ...

I create a subset of the rows and show its structure:

 > msub <- x2[x2$ykod == 99,]
 > str(msub)
 'data.frame': 14367 obs. of 15 variables:
  $ ykod : int 99 99 99 99 99 99 99 99 99 99 ...
  $ yad  : Factor w/ 43 levels "BAKUGAN","BARBIE",..: 2 2 2 2 2 2 2 2 2 2 ...
  $ per  : Factor w/ 3 levels "2 AYLIK","3 AYLIK",..: 3 3 3 3 3 3 3 3 3 3 ...
  $ donem: int 201106 201106 201106 201106 201106 201106 201106 201106 201106 201106 ...
  $ sayi : int 201106 201106 201106 201106 201106 201106 201106 201106 201106 201106 ...
  $ mkod : int 359 361 362 363 366 847 849 850 1505 1506 ...
  $ mad  : Factor w/ 11045 levels " Hilal Gida ",..: 5163 3833 10840 8284 10839 2633 10758 10293 6986 6984 ...
  $ mtip : Factor w/ 30 levels "Abone Bürosu ",..: 20 20 20 20 20 2 2 2 11 11 ...
  $ kanal: Factor w/ 2 levels "OB","SS": 2 2 2 2 2 2 2 2 1 1 ...
  $ bkod : int 110006 110006 110006 110006 110006 110006 110006 110006 110006 110006 ...
  $ bad  : Factor w/ 213 levels "4. Levent","500 Evler",..: 25 25 25 25 25 25 25 25 25 25 ...
  $ bolge: Factor w/ 12 levels "Adana Şehiriçi",..: 7 7 7 7 7 7 7 7 7 7 ...
  $ sevk : int 5 2 2 2 10 0 4 3 13 32 ...
  $ iade : int 0 2 1 2 4 0 3 2 0 8 ...
  $ satis: int 5 0 1 0 6 0 1 1 13 24 ...

Now I have a subset with 14367 rows and 15 columns, but the original factor levels are all still listed. I expected them to be reduced. For example, yad should have only one level in msub.

How can I easily get str() to show the correct information for factor levels, so that str(msub) reports only the levels that actually occur?

+4
4 answers

This is the expected behavior. Factor levels that no longer occur in your subset are not dropped until you ask for it explicitly. You can use droplevels().
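As a minimal sketch of that suggestion: the data frame below is a small hypothetical stand-in for x2 (the values, including the "CARS" level, are made up for illustration), and droplevels() is applied to the whole subset so that every factor column loses its unused levels.

```r
# Hypothetical stand-in for x2: a few rows, two factor columns
df <- data.frame(
  ykod  = c(99L, 99L, 12L, 47L),
  yad   = factor(c("BARBIE", "BARBIE", "BAKUGAN", "CARS")),
  kanal = factor(c("SS", "OB", "SS", "SS"))
)

sub <- df[df$ykod == 99, ]
nlevels(sub$yad)        # still 3: subsetting keeps unused levels

sub <- droplevels(sub)  # drops unused levels from every factor column
nlevels(sub$yad)        # 1
levels(sub$kanal)       # "OB" "SS" (both still occur in the subset)
```

After this, str(sub) should report only the levels that actually occur in the subset.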

+13

In fact, str does show you the correct structural information: the factor can still hold all of those levels, even if some are unused. Imagine you wanted to combine two of your subsets, one holding some levels and the other holding a different set: that would be hard to do if the levels had been dropped! This is just how factors work in R.

If you want to know which levels are actually present in your data, one option is to use table to count the occurrences.

If you want your factor reduced so that it contains only the levels that are actually present, you can refactor it:
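A small sketch of the table approach, using a made-up factor f (not from the question) that has one unused level. The names counts and present are illustrative only.

```r
# Made-up factor with an unused level "c"
f <- factor(c("a", "b", "a"), levels = c("a", "b", "c"))

counts <- table(f)   # counts per level, including zero counts
counts

# Keep only the names of levels that occur at least once
present <- names(counts)[counts > 0]
present              # "a" "b"
```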

 myfact <- factor(rep(1:2, 5), levels = 1:3, labels = letters[1:3])
 myfact
 # [1] a b a b a b a b a b
 # Levels: a b c
 factor(myfact)
 # [1] a b a b a b a b a b
 # Levels: a b

You can simply apply this to all the factor columns of your data.frame to get what you say you want.
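One way this could look, sketched on a small hypothetical data frame (df, sub, and is_fac are illustrative names, not from the question): find the factor columns, then rebuild each one with factor().

```r
# Hypothetical data frame; column g has an unused level "c"
df <- data.frame(
  id = 1:4,
  g  = factor(c("a", "b", "a", "b"), levels = c("a", "b", "c")),
  h  = factor(c("x", "x", "y", "y"))
)
sub <- df[df$g == "a", ]

# Identify the factor columns and re-create each with factor()
is_fac <- vapply(sub, is.factor, logical(1))
sub[is_fac] <- lapply(sub[is_fac], factor)

levels(sub$g)  # "a"
levels(sub$h)  # "x" "y"
```

Non-factor columns such as id are left untouched because only the columns selected by is_fac are reassigned.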

+5

Factor levels are an attribute of the column and are independent of the values actually present:

 > x <- factor(LETTERS[1:10])
 > x
  [1] A B C D E F G H I J
 Levels: A B C D E F G H I J
 > y <- x[1]
 > y
 [1] A
 Levels: A B C D E F G H I J
 > factor(y)
 [1] A
 Levels: A

I am sure there is another way, but this should work.

+1
 x <- factor(LETTERS[1:10])
 y <- x[1, drop = TRUE]  # drop = TRUE discards unused levels while subsetting
 y
+1