Search for the average standard normal distribution in a given interval

I want to find the average standard normal distribution in a given interval.

For example, if I divide the standard normal distribution into two ([-Inf: 0] [0: Inf]), I want to get the average value for each half.

The following code does almost what I want:

divide <- 2 boundaries <- qnorm(seq(0,1,length.out=divide+1)) t <- sort(rnorm(100000)) means.1 <- rep(NA,divide) for (i in 1:divide) { means.1[i] <- mean(t[(t>boundaries[i])&(t<boundaries[i+1])]) } 

But I need a more accurate (and elegant) method for calculating these numbers (means 1).

I tried the following code, but it did not work (possibly due to the lack of my knowledge of probability).

 divide <- 2 boundaries <- qnorm(seq(0,1,length.out=divide+1)) means.2 <- rep(NA,divide) f <- function(x) {x*dnorm(x)} for (i in 1:divide) { means.2[i] <- integrate(f,lower=boundaries[i],upper=boundaries[i+1])$value } 

Any ideas? Thank you in advance.

+4
source share
5 answers

The problem is that the integral from dnorm (x) in the interval (-Inf to 0) is not 1, so you got the wrong answer. To fix, you should divide the result by 0.5 (integral result). How:

 func <- function(x, ...) x * dnorm(x, ...) integrate(func, -Inf, 0, mean=0, sd=1)$value / (pnorm(0, mean=0, sd=1) - pnorm(-Inf, mean=0, sd=1)) 

Adapting it to different intervals should be easy.

+3
source

You can use a combination of fitdistr and vector indexing.

Here is an example of how to get the average and std of only positive values:

 library("MASS") x = rnorm(10000) fitdistr(x[x > 0], densfun="normal") 

or just the values ​​in the interval (0.2):

 fitdistr(x[x > 0 & x < 2], densfun="normal") 
+2
source

Say your cut points are -1, 0, 1, and 2, and you are interested in the average of the sections that mimic the standard Normal.

  samp <- rnorm(1e5) (res <- tapply(samp, findInterval(samp, c( -1, 0, 1, 2)), mean) ) # 0 1 2 3 4 #-1.5164151 -0.4585519 0.4608587 1.3836470 2.3824633 

Please note that labeling may be improved. One of the improvements could be:

 names(res) <- paste("[", c(-Inf, -1, 0, 1, 2, Inf)[-6], " , ", c(-Inf, -1, 0, 1, 2, Inf)[-1], ")", sep="") > res [-Inf , -1) [-1 , 0) [0 , 1) [1 , 2) [2 , Inf) -1.5278185 -0.4623743 0.4621885 1.3834442 2.3835116 
+2
source

Thanks for answering my question.

I collected all the answers, as I understand it:

  divide <- 5 boundaries <- qnorm(seq(0,1,length.out=divide+1)) # My original thinking t <- sort(rnorm(1e6)) means.1 <- rep(NA,divide) for (i in 1:divide) { means.1[i] <- mean(t[((t>boundaries[i])&(t<boundaries[i+1]))]) } # Based on @DWin t <- sort(rnorm(1e6)) means.2 <- tapply(t, findInterval(t, boundaries), mean) # Based on @Rcoster means.3 <- rep(NA,divide) f <- function(x, ...) x * dnorm(x, ...) for (i in 1:divide) { means.3[i] <- integrate(f, boundaries[i], boundaries[i+1])$value / (pnorm(boundaries[i+1]) - pnorm(boundaries[i])) } # Based on @Kith t <- sort(rnorm(1e6)) means.4 <- rep(NA,divide) for (i in 1:divide) { means.4[i] <- fitdistr(t[t > boundaries[i] & t < boundaries[i+1]], densfun="normal")$estimate[1] } 

results

 > means.1 [1] -1.4004895486 -0.5323784986 -0.0002590746 0.5313539906 1.3978177100 > means.2 [1] -1.3993590768 -0.5329465789 -0.0002875593 0.5321381745 1.3990997391 > means.3 [1] -1.399810e+00 -5.319031e-01 1.389222e-16 5.319031e-01 1.399810e+00 > means.4 [1] -1.399057073 -0.531946615 -0.000250952 0.531615180 1.400086731 

I believe that @Rcoster is the one I wanted. Rest is an innovative approach compared to mine, but still approximate. Thanks.

+2
source

Using distribEx and :

 library(distrEx) E(Truncate(Norm(mean=0, sd=1), lower=0, upper=Inf)) # [1] 0.797884 

(see vignette(distr) in the shareDoc package for an excellent overview of the set > and related packages.)


Or, using only the R base, here is an alternative that builds a discrete approximation of the expectation in the interval between lb and ub . The bases of the approximating rectangles are adjusted so that they all have equal areas (i.e., the probability that the point falling in each of them will be the same).

 intervalMean <- function(lb, ub, n=1e5, ...) { ## Get x-values at n evenly-spaced quantiles between lower and upper bounds xx <- qnorm(seq(pnorm(lb, ...), pnorm(ub, ...), length = n), ...) ## Calculate expectation mean(xx[is.finite(xx)]) } ## Your example intervalMean(lb=0, ub=1) # [1] 0.4598626 ## The mean of the complete normal distribution intervalMean(-Inf, Inf) ## [1] -6.141351e-17 ## Right half of standard normal distribution intervalMean(lb=0, ub=Inf) # [1] 0.7978606 ## Right half of normal distribution with mean 0 and standard deviation 100 intervalMean(lb=0, ub=Inf, mean=0, sd=100) # [1] 79.78606 
+1
source

All Articles