Variable Density Conversion on a Logarithmic Scale with R

Question

I want to build the density of a variable whose range is as follows:

       Min.    1st Qu.     Median       Mean    3rd Qu.       Max.
 -1214813.0        1.0       40.0      303.2      166.0  1623990.0

A density plot on a linear scale shows one tall spike in the range [0, 1000], with two very long tails toward positive and negative infinity. I would therefore like to transform the variable to a log scale so that I can see what is going on around the mean. For example, I am thinking of something like:

 log_values = c( -log10(-values[values<0]), log10(values[values>0]))

that leads to:

    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  -6.085   0.699   1.708   1.286   2.272   6.211

The main problem is that it does not include the value 0. Of course, I can shift all values away from 0 using values[values>=0]+1, but this would introduce some distortion in the data.

What would be an accepted and scientifically sound way of transforming this variable to a log scale?

+6

r logarithm scale

Mulone Dec 23 '12 at 10:46

3 answers

What you have is what @James suggests. It is problematic for values in (-1, 1), especially those close to the origin:

 x <- seq(-2, 2, by=.01)
 plot(x, sign(x)*log10(abs(x)), pch='.')
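To see the problem concretely, here is a small check with a few values chosen only for illustration. It suggests that the mapping is not monotone: inputs inside (-1, 1) land on the opposite side of zero from their sign, so the ordering of the data is scrambled.

```r
# Illustrative values only, already sorted in increasing order
x <- c(-100, -0.01, 0.01, 100)
y <- sign(x) * log10(abs(x))

round(y, 10)     # -2  2 -2  2
is.unsorted(y)   # TRUE: the transformed values are no longer in increasing order
```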

Something like this might help:

 y <- c(-log10(-x[x<(-1)])-1, x[x >= -1 & x <= 1], log10(x[x>1])+1)
 plot(x, y, pch='.')

This is continuous. It can be made C^1 by using the interval (-1/log(10), 1/log(10)) instead, where the endpoint is found by solving d/dx log10(x) = 1, that is, 1/(x*log(10)) = 1:

 z <- c(
   -log10(-x[x<(-1/log(10))]) - 1/log(10)+log10(1/log(10)),
   x[x >= -1/log(10) & x <= 1/log(10)],
   log10(x[x>1/log(10)]) + 1/log(10)-log10(1/log(10))
 )
 plot(x, z, pch='.')
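The same piecewise mapping can be wrapped in a vectorised function, which also works when the input is not sorted (the c(...) construction above relies on x being in increasing order). This is only a sketch: the name symlog_c1 is made up for illustration, and the numeric checks at the end are a rough sanity test of continuity and slope at the breakpoint, not a proof.

```r
# Hypothetical helper (name is an assumption, not from the answer):
# identity on [-a, a], shifted log10 outside, with a = 1/log(10).
symlog_c1 <- function(x) {
  a <- 1 / log(10)          # point where d/dx log10(x) equals 1
  shift <- a - log10(a)     # offset so the log branch meets the line at x = a
  ifelse(abs(x) > a,
         sign(x) * (log10(abs(x)) + shift),  # logarithmic branch
         x)                                  # linear branch near the origin
}

a <- 1 / log(10)
h <- 1e-6

# Values just inside and just outside the breakpoint should nearly coincide
symlog_c1(c(a - h, a + h))

# The one-sided slope just outside the breakpoint should be close to 1
(symlog_c1(a + 2 * h) - symlog_c1(a + h)) / h
```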

+4

Matthew lundberg Dec 23 '12 at 17:17

I add this as another answer, because although the idea is similar, the mapping is fundamentally different.

When small values (<1) need to be included in a plot with logarithmic scaling, it is typical to plot log(1 + .) rather than log(.).

Reflect this about the origin, and we get something useful:

 x <- seq(-2, 2, by=.01)
 w <- c( -log10(1-x[x<0]), x[x==0], log10(1+x[x>0]))
 plot(x, w, pch='.')

It should be clear that the function is smooth at the origin: reflecting through the origin preserves the slope, so the derivatives on both sides of 0 agree.

With much larger values in x:

 x <- seq(-10000, 10000, by=.01)
 w <- c( -log10(1-x[x<0]), x[x==0], log10(1+x[x>0]))
 plot(x, w, pch='.')
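For real data, which is usually not sorted, the same mapping can be written as a single vectorised expression. The sketch below makes several assumptions: the function names are invented for illustration, and the data are simulated (only the two extreme values are borrowed from the summary in the question). It plots a density of the transformed values and labels the axis in the original units using the inverse mapping. Note that this is the density of the transformed variable, not the original density drawn on a rescaled axis.

```r
# Hypothetical helper names; not part of the original answer.
signed_log10 <- function(x) sign(x) * log10(1 + abs(x))
# Inverse mapping, used here to label ticks in the original units
signed_log10_inv <- function(y) sign(y) * (10^abs(y) - 1)

set.seed(1)  # simulated data, for illustration only
values <- c(rnorm(1000, 40, 100), 0, -1214813, 1623990)
t_values <- signed_log10(values)  # element order is preserved, zeros map to 0

plot(density(t_values), xaxt = "n",
     main = "Density on a signed log10(1 + |x|) scale")
ticks <- -6:6
axis(1, at = ticks, labels = signif(signed_log10_inv(ticks), 2))
```

Because the expression is applied elementwise, zeros need no special case and the result lines up with the input vector.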

+1

Matthew lundberg Dec 25 '12 at 2:12

Joris meys · Accepted Answer · 2012-12-24T11:47:12+0000

Apart from transforming the variable, you can manipulate the histogram itself to get an idea of your data. The advantage is that the plot stays readable and you immediately get an impression of the distribution in the center. Say we simulate the following data:

 > Data <- c(rnorm(1000,5,10),sample(-10000:10000,10))
 > summary(Data)
      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
 -9669.000    -2.119     5.332    85.430    12.460  9870.000

Then you have several different approaches. The easiest way to see what happens in the center of your data is simply to plot only the center. In this case, say I am interested in what happens between the first and third quartiles; I can plot:

 hist(Data,
      xlim=c(-30,30),
      breaks=c(min(Data),seq(-30,30,by=5),max(Data)),
      main="Center of Data"
 )

If you also want to include the tails, you can transform your data to collapse the tails and change the axis to reflect this, as follows:

you assign to all values outside the range of interest a value that lies just outside that range
you build the histogram so that all extreme values fall into a single bin on each side
you construct the x axis with the correct labels
you use axis.break() from the plotrix package to add breaks on the x axis, indicating that the axis is discontinuous

For this you can use something like the following code:

 require(plotrix)

 # rearrange data
 plotdata <- Data
 id <- plotdata < -30 | plotdata > 30
 plotdata[id] <- sign(plotdata[id])*35

 # plot histogram
 hist(plotdata,
      xlim=c(-40,40),
      breaks=c(-40,seq(-30,30,by=5),40),
      main="Untailed Data",
      xaxt='n' # leave the X axis away
 )

 # Construct the X axis
 axis(1,
      at=c(-40,seq(-30,30,by=10),40),
      labels=c(min(Data),seq(-30,30,by=10),max(Data))
 )

 # add axis breaks
 axis.break(axis=1,breakpos=-35)
 axis.break(axis=1,breakpos=35)

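This gives you a histogram titled "Untailed Data", with the tails collapsed into the two outermost bins and break marks on the x axis (figure not reproduced here).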

Note that you get raw frequencies by adding freq=TRUE to the hist() function.
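If only the counts are of interest, one option is to ask hist() for the bin counts without drawing anything, via plot=FALSE. The sketch below re-creates the collapsed data under the same assumptions as above (simulated data, a central range of -30 to 30; the seed is arbitrary). Keep in mind that with unequal bin widths, as here, R warns that the areas are misleading when freq=TRUE is used for drawing.

```r
set.seed(42)  # arbitrary seed, simulated data for illustration only
Data <- c(rnorm(1000, 5, 10), sample(-10000:10000, 10))

# Collapse everything outside [-30, 30] onto +/- 35, as in the code above
plotdata <- Data
id <- plotdata < -30 | plotdata > 30
plotdata[id] <- sign(plotdata[id]) * 35

# Compute the histogram without drawing it
h <- hist(plotdata, breaks = c(-40, seq(-30, 30, by = 5), 40), plot = FALSE)

h$counts                        # raw counts; first and last bins hold the tails
sum(h$counts) == length(Data)   # every observation falls into some bin
```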
