Convert string to numeric

I imported a test file and tried to make a histogram:

 pichman <- read.csv(file="picman.txt", header=TRUE, sep="/t")
 hist <- as.numeric(pichman$WS)

However, I get values that differ from the values in my dataset. Initially, I thought it was because the column contained text, so I deleted the text:

 table(pichman$WS)
 ws <- pichman$WS[pichman$WS!="Down" & pichman$WS!="NoData"]

However, I still get very high numbers. Does anyone have an idea?

+75
string r
Feb 08 2018-11-11T00:
2 answers

I suspect you are having a problem with factors. For example:

 > x = factor(4:8)
 > x
 [1] 4 5 6 7 8
 Levels: 4 5 6 7 8
 > as.numeric(x)
 [1] 1 2 3 4 5
 > as.numeric(as.character(x))
 [1] 4 5 6 7 8

Some comments:

  • You mention that your vector contains the strings "Down" and "NoData". What do you expect or want as.numeric to do with these values?
  • In read.csv, try using the argument stringsAsFactors=FALSE.
  • Are you sure it is sep="/t" and not sep="\t"?
  • Use head(pichman) to check the first few rows of your data.
  • It is also very difficult to guess what your problem is when you don't provide data. A minimal working example is always preferable. For example, I cannot run the command pichman <- read.csv(file="picman.txt", header=TRUE, sep="/t"), since I do not have access to the data set.
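The points above can be sketched as follows. This is only an illustration under stated assumptions, not the asker's actual data: the temporary file, the ID column, and the sample values are made up for the example. It reads a tab-separated file with sep="\t" and stringsAsFactors=FALSE, so WS arrives as a character vector, and as.numeric then turns the non-numeric entries into NA.

```r
# Hypothetical stand-in for picman.txt: a tab-separated file whose WS column
# mixes numbers with the strings "Down" and "NoData".
tmp <- tempfile(fileext = ".txt")
writeLines(c("ID\tWS", "1\t3.5", "2\tDown", "3\t12.1", "4\tNoData", "5\t7"), tmp)

# sep = "\t" (backslash) is the tab character; stringsAsFactors = FALSE keeps
# text columns as character vectors instead of factors.
dat <- read.csv(file = tmp, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
head(dat)

# Strings that are not numbers become NA; suppressWarnings() hides the
# coercion warning that as.numeric() would otherwise print.
ws <- suppressWarnings(as.numeric(dat$WS))

# Keep only the entries that really were numbers.
ws_clean <- ws[!is.na(ws)]
print(ws_clean)
```

With the values stored as real numbers, hist(ws_clean) would plot the measurements themselves rather than factor codes.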
+95
Feb 08 2018-11-11T00:

As csgillespie said, stringsAsFactors defaults to TRUE, which converts any text to a factor. Therefore, even after deleting the text, you still have a factor in your data frame.
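A small sketch of why removing the text entries does not help, using made-up values in place of the real WS column (the vector below is an assumption for illustration). Note that from R 4.0.0 onward read.csv defaults to stringsAsFactors = FALSE, so this mainly applies to older versions or to columns that were made factors explicitly.

```r
# Made-up values standing in for a WS column that was read in as a factor.
ws_factor <- factor(c("30", "Down", "5", "NoData", "12"))

# Dropping the text entries still leaves a factor, and the unused levels
# "Down" and "NoData" are kept.
ws <- ws_factor[ws_factor != "Down" & ws_factor != "NoData"]
print(is.factor(ws))
print(levels(ws))

# as.numeric() on a factor returns the internal integer codes (positions in
# the sorted levels), not the numbers shown when the factor is printed.
codes <- as.numeric(ws)
print(codes)

# Converting to character first recovers the original numbers.
values <- as.numeric(as.character(ws))
print(values)
```

With many distinct values in the column, the codes can be large and unrelated to the data, which would match the unexpectedly high numbers described in the question.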

As for the conversion, there is a more efficient way to do it, so I put it here for reference:

 > x <- factor(sample(4:8,10,replace=T))
 > x
  [1] 6 4 8 6 7 6 8 5 8 4
 Levels: 4 5 6 7 8
 > as.numeric(levels(x))[x]
  [1] 6 4 8 6 7 6 8 5 8 4

To show that it works.

Timings:

 > x <- factor(sample(4:8,500000,replace=T))
 > system.time(as.numeric(as.character(x)))
    user  system elapsed
    0.11    0.00    0.11
 > system.time(as.numeric(levels(x))[x])
    user  system elapsed
       0       0       0

This is a big improvement, although the conversion is not always the bottleneck. It does matter, however, if you have a large data frame with many columns to convert.
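For the many-columns case, one possible sketch is to wrap the levels-based conversion in a small helper and apply it to every factor column. The data frame and the helper name factor_to_numeric are hypothetical, introduced only for this example, and the sketch assumes that every factor column holds numeric text.

```r
# Hypothetical data frame: two numeric columns stored as factors, plus one
# ordinary character column that should be left alone.
df <- data.frame(
  a = factor(c("4", "8", "6")),
  b = factor(c("1.5", "2.5", "1.5")),
  label = c("x", "y", "z"),
  stringsAsFactors = FALSE
)

# Convert the (few) levels to numeric once, then index them by the factor's
# integer codes instead of converting every element through character.
factor_to_numeric <- function(f) as.numeric(levels(f))[f]

# Find the factor columns and convert only those.
is_fac <- vapply(df, is.factor, logical(1))
df[is_fac] <- lapply(df[is_fac], factor_to_numeric)

str(df)
```

If a factor column contains non-numeric levels such as "Down", those entries come out as NA with a warning, so it may be worth checking levels() on each column first.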

+10
Feb 08 '11 at 10:23