Change the contents of a data frame in R

Question

I would like to reshape the contents of a data frame. I have a data frame like the one shown below:

          bins      pval
    1 2L:1:150   0.9224217
    2 2L:151:300 0.9478824
    3 2L:301:450 0.9671139
    4 2L:451:600 0.9280847
    5 2L:601:750 0.9698584
    6 2L:751:900 0.9725379

I would like to convert it into a different data frame, like the one below, where the contents of the "bins" column are split so that the first row becomes 150 rows containing the same pval, and so on for the second row.

        chr pos pval
    1   2L  1   0.9224217
    2   2L  2   0.9224217
    3   2L  3   0.9224217
    4   2L  4   0.9224217
    5   2L  5   0.9224217
    ...
    150 2L  150 0.9224217
    151 2L  151 0.9478824
    152 2L  152 0.9478824
    153 2L  153 0.9478824
    etc...

Any help is greatly appreciated

Ben

+4

r dataframe transform

Benoit B. Nov 30 '10 at 18:32

3 answers

Here is an attempt at a more general answer that may be more useful. I could not find an easy way to convert from factor to numeric while keeping the level labels as the values of the new numeric column. Regardless, this should work, and it supports different values in the "chr" column and a different number of rows:

    library(plyr)

    df <- read.table(textConnection("
    bins pval
    1 2L:1:150 0.9224217
    2 2L:151:300 0.9478824
    3 2L:301:450 0.9671139
    4 2L:451:600 0.9280847
    5 2L:601:750 0.9698584
    6 2L:751:900 0.9725379
    "), header = TRUE)

    # Split bins
    df.split <- data.frame(matrix(unlist(strsplit(as.character(df$bins), ":")),
                                  ncol = 3, byrow = TRUE))
    colnames(df.split) <- c("chr", "low", "high")
    df.split$low <- as.numeric(as.character(df.split$low))
    df.split$high <- as.numeric(as.character(df.split$high))

    # Attach the pval from original df
    df.split$pval <- df[, 2]

    df.new <- adply(df.split, 1, summarise, pos = (low - 1) + seq(low:high))
    df.new <- df.new[, c(1, 5, 4)]
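On the factor-to-numeric point: calling as.numeric() directly on a factor returns the internal level codes rather than the numbers the labels spell out, which is presumably why the code above goes through as.character() first. A small illustration, using a made-up factor `f` that is not part of the question's data:

```r
# Hypothetical factor for illustration only.
f <- factor(c("151", "1", "301"))

# as.numeric() on a factor returns the internal level codes.
codes <- as.numeric(f)

# Converting to character first recovers the values written in the labels.
values <- as.numeric(as.character(f))

# An equivalent idiom: convert the levels once, then index by the factor.
values_via_levels <- as.numeric(levels(f))[f]
```

In R 4.0.0 and later, read.table() and data.frame() no longer create factors from strings by default, so the extra as.character() step is harmless but may not be needed there.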

0

Chase Nov 30 '10 at 19:53

First, import with stringsAsFactors = FALSE so that you do not get factors (or use Chase's approach of converting to character):

    df <- read.table(textConnection("
    bins pval
    1 2L:1:150 0.9224217
    2 2L:151:300 0.9478824
    3 2L:301:450 0.9671139
    4 2L:451:600 0.9280847
    5 2L:601:750 0.9698584
    6 2L:751:900 0.9725379
    "), header = TRUE, stringsAsFactors = FALSE)

Now, the rest:

    split <- strsplit(df$bins, ":")
    df$chr <- sapply(split, "[[", 1)
    reps <- sapply(split, function(el) diff(as.numeric(el[2:3])) + 1)
    df[rep(1:nrow(df), reps), c("chr", "pval")]

         chr pval
    1    2L  0.9224217
    1.1  2L  0.9224217
    1.2  2L  0.9224217
    1.3  2L  0.9224217
    1.4  2L  0.9224217
    1.5  2L  0.9224217
    1.6  2L  0.9224217
    1.7  2L  0.9224217
    1.8  2L  0.9224217
    1.9  2L  0.9224217
    1.10 2L  0.9224217
    ...
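The result above repeats chr and pval but leaves out the pos column the question asks for. Here is a minimal sketch of one way to add it, assuming every bin has the form chr:start:end with start <= end. The names `parts`, `starts`, `ends` and `out` are introduced here for illustration, and only the first three rows of the sample data are used:

```r
# First three rows of the question's data, read as character (not factor).
df <- read.table(text = "
bins pval
1 2L:1:150 0.9224217
2 2L:151:300 0.9478824
3 2L:301:450 0.9671139
", header = TRUE, stringsAsFactors = FALSE)

parts  <- strsplit(df$bins, ":", fixed = TRUE)
starts <- as.numeric(sapply(parts, `[[`, 2))
ends   <- as.numeric(sapply(parts, `[[`, 3))
reps   <- ends - starts + 1  # number of positions covered by each bin

out <- data.frame(
  chr  = rep(sapply(parts, `[[`, 1), times = reps),
  # Build start..end for every bin and concatenate the pieces.
  pos  = unlist(Map(seq, starts, ends)),
  pval = rep(df$pval, times = reps),
  stringsAsFactors = FALSE
)
```

Because the positions come from each bin's own start and end, this should also cope with bins of unequal width.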

0

VitoshKa Nov 30 '10 at 21:23

42- · Accepted Answer · 2010-11-30T19:09:04+0000

A quick answer, which I am afraid may be too specific and may need generalizing. Suppose the first data frame is called "df1":

    data.frame(chr = "2L", pos = 1:(150 * NROW(df1)), pval = rep(df1$pval, each = 150))

Argument recycling should make "chr" long enough without needing the rep function.
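To illustrate: data.frame() repeats a length-one argument to match the longest column. A small sketch with shortened, made-up input so the result is easy to inspect (3 positions per bin instead of 150; `df1` and `width` below are illustrative, not the question's data):

```r
# Illustrative stand-in for df1: two bins, 3 positions each.
df1 <- data.frame(pval = c(0.92, 0.95))
width <- 3

res <- data.frame(
  chr  = "2L",                         # length 1: repeated on every row
  pos  = 1:(width * NROW(df1)),        # 1..6
  pval = rep(df1$pval, each = width)   # each pval repeated 3 times
)
```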

Edit in response to the comment: if the repetition length is always 150, the fix is easy:

    data.frame(chr = rep(substr(df1$bins, 1, 2), each = 150),
               pos = 1:(150 * NROW(df1)),
               pval = rep(df1$pval, each = 150))
