Solved – Area under peaks considering mixture of normal distributions

I have a question regarding fitting mixture of normal distribution.
I have the following data:
enter image description here

I identified three peaks and using mixtools package, I fitted mixture of normal distributions to my data as follow:

mixture <- mixtools::normalmixEM(data, k = 3) 

Consequently, I got 3 $mu$ , 3 $sigma$, and 3 $lambda$ values.

My question is now how should I calculate the area under the peaks corresponding to the estimated normal distribution.

i.e., I need the area under the new curves which are in blue, red and green.
But my question is that, how from the calculate $mu$ and $sigma$, I reach to these three curves that I can consequently calculate the area under them.


Recall that mixture distribution $f$ is a weighted sum of it's component distributions $f_1,dots,f_k$:

$$ f(x) = lambda_1 f_1(x;mu_1, sigma_1) + lambda_2 f_2(x;mu_2, sigma_2) + dots + lambda_k f_k(x;mu_k, sigma_k) $$

where $sum_{i=1}^k lambda_i = 1$. Since $f$ integrates to unity and each of $f_i$ integrates to unity, you need to take $lambda_i$ fraction of each of them. So height of each of $f_i$ densities gets multiplied by $lambda_i$.

Below you can find simulated example that illustrates it.

set.seed(123)  N <- 1e5 mu <- c(2,4,6) sigma <- rep(0.5, 3) lambda <- (1:3)/6  dat <- c(   rnorm(N*lambda[1], mu[1], sigma[1]),   rnorm(N*lambda[2], mu[2], sigma[2]),   rnorm(N*lambda[3], mu[3], sigma[3]) )  xx <- seq(0, 8, by = 0.01) hist(dat, 100, freq = FALSE, col = "lightgray", border = 1, xlab = "", main = "") lines(xx, lambda[1]*dnorm(xx, mu[1], sigma[1]), col = "red", lwd = 2) lines(xx, lambda[2]*dnorm(xx, mu[2], sigma[2]), col = "blue", lwd = 2) lines(xx, lambda[3]*dnorm(xx, mu[3], sigma[3]), col = "orange", lwd = 2) 

Individual densities in the mixture distribution

normalmixEM function correctly identifies the parameters:

summary(mixtools::normalmixEM(dat, k = 3)) ## summary of normalmixEM object: ##          comp 1   comp 2   comp 3 ## lambda 0.166202 0.334826 0.498973 ## mu     1.998120 3.999630 6.004135 ## sigma  0.498775 0.504762 0.497422 ## loglik at estimate:  -165967.4  

As about your second question in the comments, if your data is only fy points (density estimates) and corresponding y values, then the easiest way to estimate underlying model would be to create simulated dataset by sampling y values with probabilities proportional to fy and then use it for estimating your model as usual. Below you can see plot showing the kind of data that you describes and code example illustrating how to obtain the results.

enter image description here

yy <- sample(y, 5000, prob = fy, replace = TRUE) summary(mixtools::normalmixEM(yy, k = 3)) ## number of iterations= 106  ## summary of normalmixEM object: ##          comp 1   comp 2   comp 3 ## lambda 0.168503 0.324642 0.506855 ## mu     1.973119 3.980462 5.976660 ## sigma  0.471415 0.496197 0.527898 ## loglik at estimate:  -8360.809  

As you can see, with sample of just 25 pairs of y,fy values we were able to find the parameter values (notice that they are presented in different order then in the first example).

Similar Posts:

Rate this post

Leave a Comment