R: merging a data set with itself (long to wide)

Is it possible to combine data such as

name,#797,"Stachy, Poland"
at_rank,#797,1
to_center,#797,4.70
predicted,#797,4.70

into a single row per ID (the second column), with the column names taken from the first column?

        name              at_rank  to_center  predicted
  #797  "Stachy, Poland"  1        4.70       4.70

As requested, the entire data set: http://sprunge.us/cYSJ

3 answers

The first step, reading the data in, should not be a problem as long as the fields that contain commas are quoted (which they appear to be). Using read.csv with the argument header=FALSE does the trick with the data you provide. (Of course, if the data file did have a header line, you would drop this argument.)
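As a minimal sketch of that reading step, the four sample rows from the question can be parsed from an inline string instead of the URL. The names csv_text and sample_df are only illustrative and do not appear in the original answer.

```r
# Inline copy of the four sample rows from the question (no header line).
csv_text <- 'name,#797,"Stachy, Poland"
at_rank,#797,1
to_center,#797,4.70
predicted,#797,4.70'

# header = FALSE: the first line is data, so the columns are named V1, V2, V3.
# The default quote character keeps "Stachy, Poland" together as one field.
sample_df <- read.csv(text = csv_text, header = FALSE,
                      stringsAsFactors = FALSE)

str(sample_df)
```

Because the third column mixes place names and numbers, it is read as character; any numeric conversion has to happen after reshaping.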

From there you have several options. Here are two.

  • reshape (base R) is great for this:

     myDF <- read.csv("http://sprunge.us/cYSJ", header=FALSE)
     myDF2 <- reshape(myDF, direction="wide", idvar="V2", timevar="V1")
     head(myDF2)
     #    V2                 V3.name V3.at_rank V3.to_center V3.predicted
     # 1  #1                 Kitoman          1         2.41         2.41
     # 5  #2                 Hosaena          2         4.23         9.25
     # 9  #3  Vinzelles, Puy-de-Dôme          1         5.20         5.20
     # 13 #4      Whitelee Wind Farm          6         3.29         8.07
     # 17 #5     Steveville, Alberta          1         9.59         9.59
     # 21 #6         Rocher, Ardèche          1         0.13         0.13

  • The reshape2 package is also useful in these cases. It has a simpler syntax, and the output is also a bit cleaner (at least in terms of variable names).

     library(reshape2)
     myDFw_2 <- dcast(myDF, V2 ~ V1)
     # Using V3 as value column: use value.var to override.
     head(myDFw_2)
     #       V2 at_rank                                       name predicted to_center
     # 1     #1       1                                    Kitoman      2.41      2.41
     # 2    #10       4                            Icaraí de Minas      6.07      8.19
     # 3   #100       2       Scranton High School (Pennsylvania)      5.78      7.63
     # 4  #1000       1                  Bat & Ball Inn, Clanfield      2.17      2.17
     # 5 #10000       3                                     Tăuteu      1.87      5.87
     # 6 #10001       1 Oak Grove, Northumberland County, Virginia      5.84      5.84
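If you stay with base reshape, the "V3." prefixes can be removed afterwards. The following is only a sketch on a hand-built copy of the sample rows; long_df, wide_df and num_cols are illustrative names, and the numeric conversion is an extra step that the answer above does not mention.

```r
# Hand-built long data for one ID, mirroring the sample in the question.
long_df <- data.frame(
  V1 = c("name", "at_rank", "to_center", "predicted"),
  V2 = "#797",
  V3 = c("Stachy, Poland", "1", "4.70", "4.70"),
  stringsAsFactors = FALSE
)

# Long to wide: one row per V2, one column per distinct value of V1.
wide_df <- reshape(long_df, direction = "wide", idvar = "V2", timevar = "V1")

# Strip the "V3." prefix that reshape() puts in front of each new column.
names(wide_df) <- sub("^V3\\.", "", names(wide_df))

# V3 mixed text and numbers, so everything is character at this point;
# convert the columns that are really numeric.
num_cols <- c("at_rank", "to_center", "predicted")
wide_df[num_cols] <- lapply(wide_df[num_cols], as.numeric)

str(wide_df)
```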

Check out Hadley's reshape package. If I understand correctly, you simply want to cast your data from long to wide format.


I think that in this case all you really need to do is transpose, convert the result to a data.frame, set the column names from the first row, and then delete the first row. You might be able to skip the last step with some combination of arguments to data.frame, but I don't know offhand what they are.
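A rough sketch of those transpose steps, assuming the data holds the rows for a single ID only (as in the four-line sample). The names long_df, m and wide_one are illustrative. With many IDs in one file a plain transpose would put everything in one very wide row, so the steps would have to be repeated per ID.

```r
# Rows for one ID, mirroring the sample in the question.
long_df <- data.frame(
  V1 = c("name", "at_rank", "to_center", "predicted"),
  V2 = "#797",
  V3 = c("Stachy, Poland", "1", "4.70", "4.70"),
  stringsAsFactors = FALSE
)

# Transpose the key and value columns: the result is a 2 x 4 character matrix
# whose first row holds the future column names.
m <- t(long_df[c("V1", "V3")])

# Convert to a data.frame, take the column names from the first row,
# then delete that row.
wide_one <- as.data.frame(m, stringsAsFactors = FALSE)
names(wide_one) <- as.character(unlist(wide_one[1, ]))
wide_one <- wide_one[-1, , drop = FALSE]

wide_one
```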
