Data
I often work with microarray-like data: a numeric matrix plus annotations for its rows and columns. You can think of the columns as individuals, the rows as genes, and the matrix as some measurement for each gene-individual pair. Here is a smaller simulated version (real data can have millions of rows):
p <- 500000
N <- 50
mat <- matrix(rnorm(p*N), ncol=N)
colData <- replicate(10, sample(letters[sample(26, 4)], N, replace=TRUE))
colnames(colData) <- toupper(letters[1:10])
rowData <- data.frame("chromosome"=rep(c("chr1","chr2"), rep(p/2,2)),
                      "coordinates"=rep(1:(p/2), 2),
                      "someScore"=round(runif(p, max=10))
                      )
Thus, each row of rowData annotates a gene (a row of mat), and each row of colData annotates an individual (a column of mat); think gender, age, etc.
My usual approach is to combine all the information into one list:
theData <- list(mat=mat, rowInfo=rowData, colInfo=colData)
rm(mat, rowData, colData)
I then use custom functions to manage this data.
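To make "custom functions" concrete, here is a minimal sketch of the kind of helper I mean. The name subsetData and its arguments are hypothetical, invented for illustration; it only assumes the list layout above (mat, rowInfo, colInfo) and rebuilds a much smaller copy of the data so that it runs on its own.

```r
# Hypothetical helper (not from any package): subset the matrix and keep
# both annotation tables in sync with it.
subsetData <- function(x, rows = NULL, cols = NULL) {
  # NULL means "keep everything" along that dimension
  if (is.null(rows)) rows <- seq_len(nrow(x$mat))
  if (is.null(cols)) cols <- seq_len(ncol(x$mat))
  list(
    mat     = x$mat[rows, cols, drop = FALSE],
    rowInfo = x$rowInfo[rows, , drop = FALSE],  # one row per row of mat
    colInfo = x$colInfo[cols, , drop = FALSE]   # one row per column of mat
  )
}

# Small stand-in for theData (assumed sizes, much smaller than the real thing)
set.seed(1)
p <- 1000
N <- 50
small <- list(
  mat     = matrix(rnorm(p * N), ncol = N),
  rowInfo = data.frame(chromosome  = rep(c("chr1", "chr2"), each = p / 2),
                       coordinates = rep(seq_len(p / 2), 2),
                       someScore   = round(runif(p, max = 10))),
  colInfo = replicate(10, sample(letters[sample(26, 4)], N, replace = TRUE))
)
colnames(small$colInfo) <- toupper(letters[1:10])

# Keep only the genes on chr1; the annotations follow the matrix
chr1 <- subsetData(small, rows = small$rowInfo$chromosome == "chr1")
dim(chr1$mat)       # 500 50
nrow(chr1$rowInfo)  # 500
```

The drop = FALSE arguments keep a single selected row or column from collapsing into a plain vector, so the result always has the same three-part shape as the input.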
Use cases
:
- ( / ) (.. ).
- ( ) .
- / ( rowMeans /colMeans )
, : :
- , "colData".
- cols , "rowData".
. , . .
, :
Bioconductor. - , bioconductor . .
tidyr/plyr. - "" . - - . .
dplyr. - , , , . .
data.table. - . , , .
. - (apply, tapply ..) - , . , .
split-apply-comb .
, , , .
, .