Search for the average standard normal distribution in a given interval

Question

Search for the average standard normal distribution in a given interval

I want to find the average standard normal distribution in a given interval.

For example, if I divide the standard normal distribution into two ([-Inf: 0] [0: Inf]), I want to get the average value for each half.

The following code does almost what I want:

divide <- 2 boundaries <- qnorm(seq(0,1,length.out=divide+1)) t <- sort(rnorm(100000)) means.1 <- rep(NA,divide) for (i in 1:divide) { means.1[i] <- mean(t[(t>boundaries[i])&(t<boundaries[i+1])]) }

But I need a more accurate (and elegant) method for calculating these numbers (means 1).

I tried the following code, but it did not work (possibly due to the lack of my knowledge of probability).

 divide <- 2 boundaries <- qnorm(seq(0,1,length.out=divide+1)) means.2 <- rep(NA,divide) f <- function(x) {x*dnorm(x)} for (i in 1:divide) { means.2[i] <- integrate(f,lower=boundaries[i],upper=boundaries[i+1])$value }

Any ideas? Thank you in advance.

+4

r normal-distribution mean

Hbat Apr 12 '13 at 17:21

source share

5 answers

You can use a combination of fitdistr and vector indexing.

Here is an example of how to get the average and std of only positive values:

 library("MASS") x = rnorm(10000) fitdistr(x[x > 0], densfun="normal")

or just the values in the interval (0.2):

 fitdistr(x[x > 0 & x < 2], densfun="normal")

+2

kith Apr 12 '13 at 17:56

source share

Say your cut points are -1, 0, 1, and 2, and you are interested in the average of the sections that mimic the standard Normal.

  samp <- rnorm(1e5) (res <- tapply(samp, findInterval(samp, c( -1, 0, 1, 2)), mean) ) # 0 1 2 3 4 #-1.5164151 -0.4585519 0.4608587 1.3836470 2.3824633

Please note that labeling may be improved. One of the improvements could be:

 names(res) <- paste("[", c(-Inf, -1, 0, 1, 2, Inf)[-6], " , ", c(-Inf, -1, 0, 1, 2, Inf)[-1], ")", sep="") > res [-Inf , -1) [-1 , 0) [0 , 1) [1 , 2) [2 , Inf) -1.5278185 -0.4623743 0.4621885 1.3834442 2.3835116

+2

42- Apr 12 '13 at 18:00

source share

Thanks for answering my question.

I collected all the answers, as I understand it:

  divide <- 5 boundaries <- qnorm(seq(0,1,length.out=divide+1)) # My original thinking t <- sort(rnorm(1e6)) means.1 <- rep(NA,divide) for (i in 1:divide) { means.1[i] <- mean(t[((t>boundaries[i])&(t<boundaries[i+1]))]) } # Based on @DWin t <- sort(rnorm(1e6)) means.2 <- tapply(t, findInterval(t, boundaries), mean) # Based on @Rcoster means.3 <- rep(NA,divide) f <- function(x, ...) x * dnorm(x, ...) for (i in 1:divide) { means.3[i] <- integrate(f, boundaries[i], boundaries[i+1])$value / (pnorm(boundaries[i+1]) - pnorm(boundaries[i])) } # Based on @Kith t <- sort(rnorm(1e6)) means.4 <- rep(NA,divide) for (i in 1:divide) { means.4[i] <- fitdistr(t[t > boundaries[i] & t < boundaries[i+1]], densfun="normal")$estimate[1] }

results

 > means.1 [1] -1.4004895486 -0.5323784986 -0.0002590746 0.5313539906 1.3978177100 > means.2 [1] -1.3993590768 -0.5329465789 -0.0002875593 0.5321381745 1.3990997391 > means.3 [1] -1.399810e+00 -5.319031e-01 1.389222e-16 5.319031e-01 1.399810e+00 > means.4 [1] -1.399057073 -0.531946615 -0.000250952 0.531615180 1.400086731

I believe that @Rcoster is the one I wanted. Rest is an innovative approach compared to mine, but still approximate. Thanks.

+2

Hbat Apr 12 '13 at 20:29

source share

Using distribEx and :

 library(distrEx) E(Truncate(Norm(mean=0, sd=1), lower=0, upper=Inf)) # [1] 0.797884

(see vignette(distr) in the shareDoc package for an excellent overview of the set > and related packages.)

Or, using only the R base, here is an alternative that builds a discrete approximation of the expectation in the interval between lb and ub . The bases of the approximating rectangles are adjusted so that they all have equal areas (i.e., the probability that the point falling in each of them will be the same).

 intervalMean <- function(lb, ub, n=1e5, ...) { ## Get x-values at n evenly-spaced quantiles between lower and upper bounds xx <- qnorm(seq(pnorm(lb, ...), pnorm(ub, ...), length = n), ...) ## Calculate expectation mean(xx[is.finite(xx)]) } ## Your example intervalMean(lb=0, ub=1) # [1] 0.4598626 ## The mean of the complete normal distribution intervalMean(-Inf, Inf) ## [1] -6.141351e-17 ## Right half of standard normal distribution intervalMean(lb=0, ub=Inf) # [1] 0.7978606 ## Right half of normal distribution with mean 0 and standard deviation 100 intervalMean(lb=0, ub=Inf, mean=0, sd=100) # [1] 79.78606

+1

Josh o'brien Apr 12 '13 at 18:05

source share

Rcoster · Accepted Answer · 2013-04-12T18:03:33+0000

The problem is that the integral from dnorm (x) in the interval (-Inf to 0) is not 1, so you got the wrong answer. To fix, you should divide the result by 0.5 (integral result). How:

 func <- function(x, ...) x * dnorm(x, ...) integrate(func, -Inf, 0, mean=0, sd=1)$value / (pnorm(0, mean=0, sd=1) - pnorm(-Inf, mean=0, sd=1))

Adapting it to different intervals should be easy.

Search for the average standard normal distribution in a given interval

results

More articles: