Averaging columns in a data frame in R

I would like to average the columns in a data frame in R, which contains integer values, and sometimes NA.

The data frame is called CD6 (Climate Division 6). It is initialized with NA values and is meant to store the averages of all the data for Climate Division 6. The rows represent dates, and the columns represent hours from 0 to 23. The data frame looks like this:

    > CD6
    Date       H0 H1 H2 H3 H4 H5 H6 H7 H8 H9 H10 ... H23
    1948-07-01 NA NA NA NA NA NA NA NA NA NA NA  ... NA
    1948-07-02 NA NA NA NA NA NA NA NA NA NA NA  ... NA
    1948-07-03 NA NA NA NA NA NA NA NA NA NA NA  ... NA

A data frame called CA holds the actual values for all climate divisions, 1 to 7. It looks something like this:

    > CA
    Climate_Division Date       H0 H1 H2 H3 H4 H5 H6 H7 H8 H9 H10 ... H23
    6                1948-07-01 NA NA NA NA NA NA NA NA NA NA NA  ... NA
    5                1948-07-01 0  1  1  3  0  0  0  0  0  0  0   ... 2
    6                1948-07-01 0  1  1  3  0  0  0  0  0  0  0   ... 2
    6                1948-07-01 1  0  0  5  7  0  1  1  1  0  0   ... 0
    6                1948-07-02 0  2  1  2  1  1  NA 0  1  0  1   ... 2
    6                1948-07-03 NA NA NA NA NA NA NA NA NA NA NA  ... NA

I have written a loop that goes through the CA data frame row by row and fills in the correct data frame for each climate division (in this example, CD6 for climate division 6). The problem is that I do not know in advance how many rows there are for each climate division, so I do not know what to divide by to get the average.

Looking only at CD6, I would like the average for each date at each hour, ignoring NA when actual values are present, with the final answer rounded up to an integer (ceiling). If every value for a given date and hour is NA, I would like to keep it as NA so that it can be distinguished from 0. The end result for CD6 should look like this:

    > CD6
    Date       H0 H1 H2 H3 H4 H5 H6 H7 H8 H9 H10 ... H23
    1948-07-01 1  1  1  4  4  0  1  1  1  0  0   ... 1
    1948-07-02 0  2  1  2  1  1  NA 0  1  0  1   ... 2
    1948-07-03 NA NA NA NA NA NA NA NA NA NA NA  ... NA

I do not know exactly how to code this with my current skills, so any suggestions would be appreciated. Thank you for your time.

2 answers

What you are looking for is aggregation: grouping CA by two columns, Climate_Division and Date. You can use the built-in aggregate function for this.

    > t <- 'Climate_Division Date H0 H1 H2 H3 H4 H5 H6 H7 H8 H9 H10
    + 6 1948-07-01 NA NA NA NA NA NA NA NA NA NA NA
    + 5 1948-07-01 0 1 1 3 0 0 0 0 0 0 0
    + 6 1948-07-01 0 1 1 3 0 0 0 0 0 0 0
    + 6 1948-07-01 1 0 0 5 7 0 1 1 1 0 0
    + 6 1948-07-02 0 2 1 2 1 1 NA 0 1 0 1
    + 6 1948-07-03 NA NA NA NA NA NA NA NA NA NA NA'
    > CA <- read.table(textConnection(t), header=T)
    > CA
      Climate_Division       Date H0 H1 H2 H3 H4 H5 H6 H7 H8 H9 H10
    1                6 1948-07-01 NA NA NA NA NA NA NA NA NA NA  NA
    2                5 1948-07-01  0  1  1  3  0  0  0  0  0  0   0
    3                6 1948-07-01  0  1  1  3  0  0  0  0  0  0   0
    4                6 1948-07-01  1  0  0  5  7  0  1  1  1  0   0
    5                6 1948-07-02  0  2  1  2  1  1 NA  0  1  0   1
    6                6 1948-07-03 NA NA NA NA NA NA NA NA NA NA  NA
    > # Now that we have our data, aggregate it and calculate the mean of each group
    > CAMeans <- aggregate(CA[,3:13], by =list(CA[,1], CA[,2]), FUN = mean, na.rm = TRUE)
    > CAMeans
      Group.1    Group.2  H0  H1  H2  H3  H4  H5  H6  H7  H8  H9 H10
    1       5 1948-07-01 0.0 1.0 1.0   3 0.0   0 0.0 0.0 0.0   0   0
    2       6 1948-07-01 0.5 0.5 0.5   4 3.5   0 0.5 0.5 0.5   0   0
    3       6 1948-07-02 0.0 2.0 1.0   2 1.0   1 NaN 0.0 1.0   0   1
    4       6 1948-07-03 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
    > # Change the names of the grouping columns back to what they were before
    > names(CAMeans)[1:2] <- c('Climate_Division', 'Date')
    > CAMeans
      Climate_Division       Date  H0  H1  H2  H3  H4  H5  H6  H7  H8  H9 H10
    1                5 1948-07-01 0.0 1.0 1.0   3 0.0   0 0.0 0.0 0.0   0   0
    2                6 1948-07-01 0.5 0.5 0.5   4 3.5   0 0.5 0.5 0.5   0   0
    3                6 1948-07-02 0.0 2.0 1.0   2 1.0   1 NaN 0.0 1.0   0   1
    4                6 1948-07-03 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
    > # Now you can subset CAMeans to get the content for CD6
    > CD6 <- CAMeans[CAMeans$Climate_Division == 6, 2:ncol(CAMeans)]
    > CD6
            Date  H0  H1  H2  H3  H4  H5  H6  H7  H8  H9 H10
    2 1948-07-01 0.5 0.5 0.5   4 3.5   0 0.5 0.5 0.5   0   0
    3 1948-07-02 0.0 2.0 1.0   2 1.0   1 NaN 0.0 1.0   0   1
    4 1948-07-03 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
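The aggregated CD6 still holds fractional means, and NaN where every value in a group was NA, while the question asks for integer ceilings and NA. The following is a minimal sketch of one possible post-processing step, not part of the original answer. It uses a shortened copy of the sample data (H0 to H6 only), and the name `hour_cols` is introduced here purely for illustration.

```r
# Shortened copy of the sample data (only H0..H6 are kept for brevity).
CA <- read.table(header = TRUE, text = "
Climate_Division Date H0 H1 H2 H3 H4 H5 H6
6 1948-07-01 NA NA NA NA NA NA NA
5 1948-07-01 0 1 1 3 0 0 0
6 1948-07-01 0 1 1 3 0 0 0
6 1948-07-01 1 0 0 5 7 0 1
6 1948-07-02 0 2 1 2 1 1 NA
6 1948-07-03 NA NA NA NA NA NA NA
")

# Illustrative name: every column that looks like an hour column (H0, H1, ...).
hour_cols <- grep("^H[0-9]+$", names(CA), value = TRUE)

# Same aggregation as above, with the grouping columns named up front.
CAMeans <- aggregate(CA[hour_cols],
                     by = list(Climate_Division = CA$Climate_Division,
                               Date = CA$Date),
                     FUN = mean, na.rm = TRUE)

CD6 <- CAMeans[CAMeans$Climate_Division == 6, c("Date", hour_cols)]

# Round each mean up, turn NaN (all values missing) into NA, store as integer.
CD6[hour_cols] <- lapply(CD6[hour_cols], function(x) {
  x <- ceiling(x)
  x[is.nan(x)] <- NA
  as.integer(x)
})

CD6
```

Naming the elements of the `by` list also avoids having to rename `Group.1` and `Group.2` afterwards.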

I am guessing at what you want here, so I have given two options: rowMeans() and colMeans().

    CA <- read.table(header=TRUE, text='Climate_Division Date H0 H1 H2 H3 H4 H5 H6 H7 H8 H9 H10 H23
    6 1948-07-01 NA NA NA NA NA NA NA NA NA NA NA NA
    5 1948-07-01 0 1 1 3 0 0 0 0 0 0 0 2
    6 1948-07-01 0 1 1 3 0 0 0 0 0 0 0 2
    6 1948-07-01 1 0 0 5 7 0 1 1 1 0 0 0
    6 1948-07-02 0 2 1 2 1 1 NA 0 1 0 1 2
    6 1948-07-03 NA NA NA NA NA NA NA NA NA NA NA NA')

    CD6 <- CA[CA$Climate_Division==6, ]  # Populating your data does not require a loop.

    # Note: CD6[, -2] drops only the Date column, so Climate_Division is included in the means below.
    (CD6rmeans <- rowMeans(CD6[, -2], na.rm=TRUE))
    #     1     3     4     5     6
    # 6.000 1.000 1.692 1.417 6.000

    t(CD6cmeans <- colMeans(CD6[ ,-2], na.rm=TRUE))
    #      Climate_Division     H0 H1     H2    H3    H4     H5  H6     H7     H8 H9    H10   H23
    # [1,]                6 0.3333  1 0.6667 3.333 2.667 0.3333 0.5 0.3333 0.6667  0 0.3333 1.333
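rowMeans() averages across the hours of one row and colMeans() averages across all dates, so neither gives one value per date and hour. The sketch below shows one loop-free way to get that for every division at once, assuming the per-date ceiling described in the question is what is wanted. It is not the original answerer's method; `ceil_mean`, `hour_cols`, `by_division` and `division_means` are hypothetical names, and the sample data is cut down to H0 to H6.

```r
# Shortened copy of the sample data (only H0..H6 are kept for brevity).
CA <- read.table(header = TRUE, text = "
Climate_Division Date H0 H1 H2 H3 H4 H5 H6
6 1948-07-01 NA NA NA NA NA NA NA
5 1948-07-01 0 1 1 3 0 0 0
6 1948-07-01 0 1 1 3 0 0 0
6 1948-07-01 1 0 0 5 7 0 1
6 1948-07-02 0 2 1 2 1 1 NA
6 1948-07-03 NA NA NA NA NA NA NA
")

# Hypothetical helper: ceiling of the mean of the non-NA values,
# or NA when every value is NA.
ceil_mean <- function(x) {
  if (all(is.na(x))) return(NA_integer_)
  as.integer(ceiling(mean(x, na.rm = TRUE)))
}

hour_cols <- grep("^H[0-9]+$", names(CA), value = TRUE)

# One data frame per climate division; the list names are "5", "6", ...
by_division <- split(CA, CA$Climate_Division)

# Within each division, summarise every hour column by date.
division_means <- lapply(by_division, function(d) {
  aggregate(d[hour_cols], by = list(Date = d$Date), FUN = ceil_mean)
})

division_means[["6"]]
```

Keeping the results in a named list means a division is looked up by its number as a string, rather than through seven separate variables such as CD6.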