Solved – What’s the distribution of these data

I have some data, plotted its distribution, and used the qqnorm function, but it doesn't seem to follow a normal distribution. Which distribution should I use to describe the data?

[Figure: empirical cumulative distribution function]

I suggest you give heavy-tail Lambert W x F or skewed Lambert W x F distributions a try (disclaimer: I am the author). In R they are implemented in the LambertW package.

They arise from a parametric, non-linear transformation of a random variable (RV) $X \sim F$ to a heavy-tailed (skewed) version $Y \sim \text{Lambert W} \times F$. For $F$ being Gaussian, heavy-tail Lambert W x F reduces to Tukey's $h$ distribution. (I outline the heavy-tail version here; the skewed one is analogous.)

They have one parameter $\delta \geq 0$ ($\gamma \in \mathbb{R}$ for skewed Lambert W x F) that regulates the degree of tail heaviness (skewness). Optionally, you can choose different left and right tail parameters to get heavy tails and asymmetry at the same time. The heavy-tail version transforms a standard Normal $U \sim \mathcal{N}(0,1)$ to a Lambert W $\times$ Gaussian $Z$ by

$$ Z = U \exp\left(\frac{\delta}{2} U^2\right) $$

If $\delta > 0$, $Z$ has heavier tails than $U$; for $\delta = 0$, $Z \equiv U$.
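To see the effect of $\delta$ without any package, the transformation can be applied by hand. This is only a minimal base-R sketch: the helper names `tukey_h` and `excess_kurtosis` are made up for illustration and are not part of LambertW.

```r
set.seed(1)

# Hypothetical helper: heavy-tail transform Z = U * exp(delta / 2 * U^2)
tukey_h <- function(u, delta) u * exp(delta / 2 * u^2)

# Hypothetical helper: sample excess kurtosis (roughly 0 for Gaussian data)
excess_kurtosis <- function(x) {
  x <- x - mean(x)
  mean(x^4) / mean(x^2)^2 - 3
}

u  <- rnorm(1e4)
z0 <- tukey_h(u, delta = 0)    # delta = 0 leaves the data unchanged
z2 <- tukey_h(u, delta = 0.2)  # delta > 0 stretches the tails

identical(z0, u)               # TRUE
c(gaussian = excess_kurtosis(u), heavy = excess_kurtosis(z2))
```

The second value should come out far larger than the first, which is the heavier tail showing up in the sample kurtosis.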

If you don't want to use the Gaussian as your baseline, you can create other Lambert W versions of your favorite distribution, e.g., t, uniform, gamma, exponential, beta, … However, for your dataset a double heavy-tail Lambert W x Gaussian (or a skew Lambert W x t) distribution seems to be a good starting point.

    library(LambertW)
    set.seed(10)

    ### Set parameters ####
    # skew Lambert W x t distribution with
    # (location, scale, df) = (0, 1, 3) and positive skew parameter gamma = 0.1
    theta.st <- list(beta = c(0, 1, 3), gamma = 0.1)
    # double heavy-tail Lambert W x Gaussian
    # with (mu, sigma) = (0, 1) and left delta = 0.2; right delta = 0.4 (-> heavier on the right)
    theta.hh <- list(beta = c(0, 1), delta = c(0.2, 0.4))

    ### Draw random sample ####
    # skewed Lambert W x t
    yy <- rLambertW(n = 1000, distname = "t", theta = theta.st)

    # double heavy-tail Lambert W x Gaussian (= Tukey's hh)
    zz <- rLambertW(n = 1000, distname = "normal", theta = theta.hh)

    ### Plot ecdf and qq-plot ####
    op <- par(no.readonly = TRUE)  # save graphics settings so they can be restored
    par(mfrow = c(2, 2), mar = c(3, 3, 2, 1))
    plot(ecdf(yy))
    qqnorm(yy); qqline(yy)

    plot(ecdf(zz))
    qqnorm(zz); qqline(zz)
    par(op)  # restore graphics settings

[Figure: ECDF and Q-Q plots of skewed/heavy-tailed Lambert W x F distributions]

In practice, of course, you have to estimate $\theta = (\beta, \delta)$, where $\beta$ is the parameter of your input distribution (e.g., $\beta = (\mu, \sigma)$ for a Gaussian, or $\beta = (c, s, \nu)$ for a $t$ distribution; see paper for details):

    ### Parameter estimation ####
    mod.Lst <- MLE_LambertW(yy, distname = "t", type = "s")
    mod.Lhh <- MLE_LambertW(zz, distname = "normal", type = "hh")

    layout(matrix(1:2, ncol = 2))
    plot(mod.Lst)
    plot(mod.Lhh)

Since this heavy-tail generation is based on a bijective transformation of RVs/data, you can remove the heavy tails from the data and check whether the result is well behaved, i.e., whether it is Gaussian (and test this with normality tests).
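For the Gaussian heavy-tail case the inverse can be written down explicitly. From $Z = U \exp\left(\frac{\delta}{2} U^2\right)$ it follows that $\delta Z^2 = (\delta U^2) \exp(\delta U^2)$, so $\delta U^2 = W(\delta Z^2)$ and $U = \operatorname{sgn}(Z) \sqrt{W(\delta Z^2) / \delta}$, where $W$ is the principal branch of the Lambert W function. The sketch below checks this round trip in base R with a known $\delta$. The names `lambert_w0` and `remove_heavy_tail` are hypothetical, and the Newton solver is a stand-in written for this example, not the package's implementation.

```r
set.seed(1)

# Hypothetical helper: principal branch of Lambert W for x >= 0,
# solving w * exp(w) = x by Newton's method
lambert_w0 <- function(x, tol = 1e-12, max_iter = 100) {
  w <- log1p(x)  # starting value at or above the root for x >= 0
  for (i in seq_len(max_iter)) {
    step <- (w * exp(w) - x) / (exp(w) * (w + 1))
    w <- w - step
    if (max(abs(step)) < tol) break
  }
  w
}

# Hypothetical helper: invert Z = U * exp(delta / 2 * U^2) for delta > 0
remove_heavy_tail <- function(z, delta) {
  sign(z) * sqrt(lambert_w0(delta * z^2) / delta)
}

delta  <- 0.3
u      <- rnorm(1000)
z      <- u * exp(delta / 2 * u^2)     # add heavy tails
u_back <- remove_heavy_tail(z, delta)  # remove them again

max(abs(u_back - u))                   # round-trip error, close to zero
shapiro.test(z)$p.value                # heavy-tailed sample
shapiro.test(u_back)$p.value           # back-transformed sample
```

With real data $\delta$ is unknown, so it has to be estimated before the tails can be removed; the sketch only illustrates why the back-transformation is well defined.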

    ### Test goodness of fit ####
    ## test if 'symmetrized' data follows a Gaussian
    xx <- get_input(mod.Lhh)
    normfit(xx)

This worked pretty well for the simulated dataset. I suggest you give it a try and see if you can also Gaussianize() your data.

However, as @whuber pointed out, bimodality can be an issue here. So you may want to look at the transformed data (without the heavy tails) to see what is going on with this bimodality, which could give you insight into how to model your (original) data.
