Plotting the density of a variable on a logarithmic scale in R

I want to plot the density of a variable with the following summary:

    Min.   :-1214813.0
    1st Qu.:       1.0
    Median :      40.0
    Mean   :     303.2
    3rd Qu.:     166.0
    Max.   : 1623990.0

On a linear scale, the density plot shows a tall spike in the range [0, 1000], with two very long tails stretching toward positive and negative infinity. I would therefore like to transform the variable to a log scale so that I can see what happens around the mean. For example, I am thinking of something like:

 log_values = c( -log10(-values[values<0]), log10(values[values>0])) 

that leads to:

      Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    -6.085   0.699   1.708   1.286   2.272   6.211 

The main problem is that this drops every value equal to 0. Of course, I could shift the non-negative values away from 0 using values[values>=0]+1, but this would distort the data somewhat.
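To make the zero problem concrete, here is the transform from the question applied to a small vector. The vector is made up purely for illustration; zeros match neither `values < 0` nor `values > 0`, so they silently disappear.

```r
# Made-up example vector containing three zeros (illustrative only)
values <- c(-1000, -10, 0, 0, 0, 1, 40, 166, 1e6)

# Transform from the question: negative and positive parts handled separately
log_values <- c(-log10(-values[values < 0]), log10(values[values > 0]))

length(values)      # 9
length(log_values)  # 6: the three zeros were dropped
```

Note also that the value 1 maps to 0, so even if zeros were kept they would collide with the ones.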

What would be an accepted and scientifically sound way of transforming this variable to a log scale?

3 answers

Besides transforming the data, you can manipulate the histogram itself to get an idea of your data. The advantage is that the plots themselves remain readable and you get an immediate picture of the distribution in the center. Say we simulate the following data:

    > Data <- c(rnorm(1000, 5, 10), sample(-10000:10000, 10))
    > summary(Data)
         Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
    -9669.000    -2.119     5.332    85.430    12.460  9870.000 

From here there are several approaches. The simplest way to see what happens in the center of your data is to plot only the center. Say I'm interested in what happens around the first and third quartiles; I can plot:

    hist(Data,
         xlim = c(-30, 30),
         breaks = c(min(Data), seq(-30, 30, by = 5), max(Data)),
         main = "Center of Data")


If you also want the tails to be counted, you can transform your data to collapse the tails and adjust the axis to reflect this, as follows:

  • you assign every value outside the range of interest a single value just outside that range
  • you build a histogram that puts all the extreme values into one bin on each side
  • you draw the x axis with the correct labels
  • you use axis.break() from the plotrix package to add breaks on the x axis, indicating that the axis is discontinuous

For this you can use something like the following code:

    require(plotrix)

    # rearrange data: every value outside [-30, 30] is moved to -35 or 35
    plotdata <- Data
    id <- plotdata < -30 | plotdata > 30
    plotdata[id] <- sign(plotdata[id]) * 35

    # plot histogram
    hist(plotdata,
         xlim = c(-40, 40),
         breaks = c(-40, seq(-30, 30, by = 5), 40),
         main = "Untailed Data",
         xaxt = 'n'    # leave out the default x axis
    )

    # construct the x axis, labelling the outer ticks with the real extremes
    axis(1, at = c(-40, seq(-30, 30, by = 10), 40),
         labels = c(min(Data), seq(-30, 30, by = 10), max(Data)))

    # add axis breaks
    axis.break(axis = 1, breakpos = -35)
    axis.break(axis = 1, breakpos = 35)

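This gives you a histogram in which every value outside [-30, 30] is counted in one of the two outer bins, with the outer ticks labelled by the actual minimum and maximum and the axis visibly broken at -35 and 35.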

Note that because the bins have unequal widths, hist() plots densities by default; add freq=TRUE to the hist() call to get raw frequencies instead.
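To check how many observations ended up in the two collapsed tail bins, you can ask hist() for the bin counts without drawing anything (`plot = FALSE`). This sketch uses a small made-up vector, `toy`, in place of `Data`, with the same collapsing rule as above.

```r
# Small made-up stand-in for Data (illustrative only)
toy <- c(-9669, -500, -12, -3, 0, 4, 5, 7, 25, 31, 9870)

# Same collapsing rule as above: values outside [-30, 30] go to -35 or 35
toy_plot <- toy
id <- toy_plot < -30 | toy_plot > 30
toy_plot[id] <- sign(toy_plot[id]) * 35

# Compute the histogram without plotting it
h <- hist(toy_plot, breaks = c(-40, seq(-30, 30, by = 5), 40), plot = FALSE)

h$counts[1]                 # 2: observations in the lower tail bin
h$counts[length(h$counts)]  # 2: observations in the upper tail bin
```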


What you have is what @James suggests. It is problematic for values in (-1, 1), especially close to the origin, where values of small magnitude are mapped to large values of the opposite sign:

    x <- seq(-2, 2, by = .01)
    plot(x, sign(x) * log10(abs(x)), pch = '.')

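A few hand-picked points show the problem numerically. `signed_log` is a hypothetical helper name for the sign(x) * log10(abs(x)) transform; it is not from any package.

```r
# sign(x) * log10(|x|), wrapped in a hypothetical helper for convenience
signed_log <- function(x) sign(x) * log10(abs(x))

# -10 -> -1, -0.1 -> 1, 0.1 -> -1, 10 -> 1:
# -0.1 now lands above 0.1, so the ordering is broken inside (-1, 1)
signed_log(c(-10, -0.1, 0.1, 10))

# At 0 the result is NaN, because 0 * -Inf is undefined
signed_log(0)
```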

Something like this might help:

    y <- c(-log10(-x[x < (-1)]) - 1,
           x[x >= -1 & x <= 1],
           log10(x[x > 1]) + 1)
    plot(x, y, pch = '.')


It is continuous. You can also make it C^1 (continuously differentiable) by using the interval (-1/log(10), 1/log(10)) instead; the endpoint comes from solving d/dx log10(x) = 1, i.e. 1/(x log(10)) = 1:

    z <- c(-log10(-x[x < (-1/log(10))]) - 1/log(10) + log10(1/log(10)),
           x[x >= -1/log(10) & x <= 1/log(10)],
           log10(x[x > 1/log(10)]) + 1/log(10) - log10(1/log(10)))
    plot(x, z, pch = '.')
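For reference, the offsets in z come from matching both the value and the slope of the pieces at the switch point a. In R, log() is the natural logarithm, so 1/log(10) is 1/ln 10; the constant c below is simply a name for the offset used in the code.

```latex
\frac{d}{dx}\log_{10} x = \frac{1}{x \ln 10} = 1
\;\Longrightarrow\; a = \frac{1}{\ln 10} \approx 0.434

\log_{10} a + c = a
\;\Longrightarrow\; c = a - \log_{10} a

z(x) =
\begin{cases}
-\log_{10}(-x) - c, & x < -a \\
x, & -a \le x \le a \\
\log_{10} x + c, & x > a
\end{cases}
```

The negative branch is the positive one reflected through the origin, so value and slope also match at -a.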

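One caveat: building y or z with c() groups the values by sign, so the result lines up with x only because x is already sorted. For unsorted data, such as the variable in the question, an element-wise version keeps each transformed value next to its original. `symlog_c1` is a hypothetical helper name for this sketch, not a function from any package.

```r
# Element-wise version of z: identity on [-a, a], shifted log10 outside,
# with a = 1/log(10) so that value and slope match at +/-a.
symlog_c1 <- function(x) {
  a <- 1 / log(10)
  out <- x                                  # identity part on [-a, a]
  big <- !is.na(x) & abs(x) > a             # NA inputs are left as NA
  out[big] <- sign(x[big]) * (log10(abs(x[big])) + a - log10(a))
  out
}

# Agrees with z for the sorted x used above
x <- seq(-2, 2, by = .01)
z <- c(-log10(-x[x < (-1/log(10))]) - 1/log(10) + log10(1/log(10)),
       x[x >= -1/log(10) & x <= 1/log(10)],
       log10(x[x > 1/log(10)]) + 1/log(10) - log10(1/log(10)))
all.equal(symlog_c1(x), z)   # TRUE

# Order is preserved for unsorted input, and the tails mirror each other
symlog_c1(c(5, -0.2, 0, -5))
```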


I'm adding this as a separate answer because, although the idea is similar, the mapping is fundamentally different.

When small values (< 1) have to be shown on a log-scaled plot, the typical choice is to plot log(1 + .) rather than log(.).

Mirror it through the origin for negative values, and we get something useful:

    x <- seq(-2, 2, by = .01)
    w <- c(-log10(1 - x[x < 0]),
           x[x == 0],
           log10(1 + x[x > 0]))
    plot(x, w, pch = '.')

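The function is continuously differentiable at 0: it is odd, so its derivative is even, and both one-sided derivatives at 0 equal 1/log(10). (Strictly, it is C^1 rather than smooth, since the second derivative jumps from +1/log(10) to -1/log(10) at 0.)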
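The same mapping can be written element-wise as sign(x) * log10(1 + abs(x)), which keeps the original order for unsorted data, and it has a simple inverse that is handy for turning positions on the transformed axis back into original units. `symlog1p` and `inv_symlog1p` are hypothetical helper names for this sketch.

```r
# Element-wise form of w: sign(x) * log10(1 + |x|)
symlog1p <- function(x) sign(x) * log10(1 + abs(x))

# Inverse: maps transformed values back to the original scale
inv_symlog1p <- function(w) sign(w) * (10^abs(w) - 1)

x <- seq(-2, 2, by = .01)
w <- c(-log10(1 - x[x < 0]), x[x == 0], log10(1 + x[x > 0]))

all.equal(symlog1p(x), w)                 # TRUE: same values as w
all.equal(inv_symlog1p(symlog1p(x)), x)   # TRUE: round trip recovers x
```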

With much larger values of x:

    x <- seq(-10000, 10000, by = .01)
    w <- c(-log10(1 - x[x < 0]),
           x[x == 0],
           log10(1 + x[x > 0]))
    plot(x, w, pch = '.')

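Applied to the original question, this lets you plot the density of the transformed variable while labelling the x axis in original units. A sketch, assuming the data sit in a numeric vector called `values`; here a heavy-tailed stand-in is generated only for illustration, so replace it with your own data. Tick positions are chosen on the original scale and pushed through the same mapping.

```r
# Made-up heavy-tailed stand-in for the variable in the question (illustrative only)
set.seed(1)
values <- c(rep(0, 200),                        # explicit zeros
            round(rexp(5000, rate = 1 / 300)),  # bulk of small positive values
            round(rnorm(50, sd = 1e5)))         # a few very large values of either sign

# Same mapping as w above, element-wise, so zeros are kept (they map to 0)
w_values <- sign(values) * log10(1 + abs(values))

# Density on the transformed scale
d <- density(w_values)

# Ticks at round original-scale values, transformed with the same mapping
ticks <- c(-1e6, -1e4, -100, -1, 0, 1, 100, 1e4, 1e6)
tick_pos <- sign(ticks) * log10(1 + abs(ticks))

plot(d, xaxt = "n", main = "Density on a signed log10(1 + |x|) scale")
axis(1, at = tick_pos,
     labels = format(ticks, big.mark = ",", scientific = FALSE, trim = TRUE))
```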

