How to specify a distribution for left-skewed data

I am doing a Bayesian analysis. Exploratory analysis shows that the parameter might have a left-skewed distribution. What kind of distribution should I use as the prior for this parameter? Is there a transformation that would give the parameter an approximately normal shape? (Please note that the parameter takes negative values.)


The question is simple: I plotted my data and it looks like the plot below. What kind of distribution should I assume the data come from?

[Figure: left-skewed distribution]

You can use the brms package with a skew normal distribution to model either right- or left-skewed data. This distribution has three parameters, for location, scale, and skewness respectively (mu, sigma, and alpha in brms). The sign of the skewness parameter alpha gives the direction of the skew: when alpha < 0 the distribution is left-skewed, when alpha > 0 it is right-skewed, and alpha = 0 recovers the ordinary normal distribution.
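To make the role of alpha concrete, here is a minimal base-R sketch that does not need brms. It uses the classical location/scale/shape form of the skew normal (xi, omega, alpha), which is my choice for illustration; as far as I know, brms reparameterizes the family so that mu is the mean and sigma the standard deviation, so the parameter values are not directly interchangeable. The helper names dsn, rsn, and skewness are made up for this example.

```r
# Classical skew normal with location xi, scale omega, and shape alpha.
# Density: (2 / omega) * dnorm(z) * pnorm(alpha * z), with z = (x - xi) / omega
dsn <- function(x, xi = 0, omega = 1, alpha = 0) {
  z <- (x - xi) / omega
  2 / omega * dnorm(z) * pnorm(alpha * z)
}

# Sampler via the standard representation
# Z = delta * |U0| + sqrt(1 - delta^2) * U1, with U0, U1 independent N(0, 1)
rsn <- function(n, xi = 0, omega = 1, alpha = 0) {
  delta <- alpha / sqrt(1 + alpha^2)
  z <- delta * abs(rnorm(n)) + sqrt(1 - delta^2) * rnorm(n)
  xi + omega * z
}

# Sample skewness (third standardized moment)
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

set.seed(1)
left  <- rsn(1e5, alpha = -5)
right <- rsn(1e5, alpha =  5)

skewness(left)   # negative: long tail to the left
skewness(right)  # positive: long tail to the right

# Sanity check: the density integrates to 1
integrate(dsn, -Inf, Inf, alpha = -5)$value
```

Flipping the sign of alpha mirrors the distribution, which is why a single family covers both left- and right-skewed data.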

Here is a simple example of how to fit this kind of model with brms, along with a comparison against a model using a Gaussian likelihood.

    library(patchwork)  # lets `+` place two ggplots side by side
    library(tidyverse)
    library(brms)

    set.seed(666)

    # generate some skewed data (wrapped in a data frame, since brm() expects one)
    data <- data.frame(data = rskew_normal(1e4, mu = 0, sigma = 1, alpha = -5))

    # fitting a brms model with a Gaussian likelihood
    model_normal <- brm(data ~ 1, family = gaussian(), data = data)

    # fitting a brms model with a skew normal likelihood
    model_skew <- brm(data ~ 1, family = skew_normal(), data = data)

    # posterior predictive checking
    # (recent brms versions name this argument `ndraws` instead of `nsamples`)
    pp_check(model_normal, nsamples = 1e2) + pp_check(model_skew, nsamples = 1e2)

The last command should return the following picture.

The left panel shows the raw data along with data simulated from the posterior predictive distribution of the Gaussian model. As expected, the Gaussian model systematically misrepresents the skewness of the raw data. The right panel shows the match between the raw data and data simulated from the skew-normal model.
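The logic behind this check can be sketched in base R without fitting anything. This is a simplified stand-in, not what pp_check does internally: instead of drawing the mean and standard deviation from a posterior, it plugs in the sample estimates, and it compares a single statistic (the sample skewness) rather than whole densities. The variable names and the smaller sample size are assumptions made for the example.

```r
set.seed(42)

# Left-skewed stand-in data from a standard skew normal representation
# (alpha = -5, so delta = alpha / sqrt(1 + alpha^2))
n <- 2000
delta <- -5 / sqrt(1 + 5^2)
y <- delta * abs(rnorm(n)) + sqrt(1 - delta^2) * rnorm(n)

# Sample skewness (third standardized moment)
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

# Simplification: plug in the sample mean and sd rather than posterior draws,
# then simulate 100 replicated datasets from the fitted Gaussian
skew_rep <- replicate(100, skewness(rnorm(n, mean(y), sd(y))))

skewness(y)                    # clearly negative for the observed data
range(skew_rep)                # replicated values cluster around zero
mean(skew_rep <= skewness(y))  # share of replicates at least as left-skewed
```

If almost none of the Gaussian replicates are as skewed as the observed data, the Gaussian likelihood is missing a feature of the data, which is what the left panel shows graphically.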

The summary of the model will give you the mean and 95% quantile intervals of the posterior distribution for each parameter.

    summary(model_skew)

     Family: skew_normal
      Links: mu = identity; sigma = identity; alpha = identity
    Formula: data ~ 1
       Data: data (Number of observations: 10000)
    Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
             total post-warmup samples = 4000

    Population-Level Effects:
              Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
    Intercept    -0.01      0.01    -0.03     0.01       2676 1.00

    Family Specific Parameters:
          Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
    sigma     1.01      0.01     1.00     1.03       2389 1.00
    alpha    -5.12      0.20    -5.53    -4.74       2256 1.00

    Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
    is a crude measure of effective sample size, and Rhat is the potential
    scale reduction factor on split chains (at convergence, Rhat = 1).
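For readers who want to see where the columns come from, here is a small base-R sketch of how such a row can be computed from posterior draws. The vector alpha_draws is a hypothetical stand-in generated from a normal distribution (with values borrowed from the alpha row above only to get a similar scale); with a real fit you would use the draws extracted from the model object instead. As I understand the brms output, Estimate is the posterior mean and Est.Error the posterior standard deviation.

```r
# Hypothetical stand-in for 4000 post-warmup draws of alpha;
# a real analysis would extract these from the fitted model.
set.seed(1)
alpha_draws <- rnorm(4000, mean = -5.12, sd = 0.20)

post_summary <- c(
  Estimate  = mean(alpha_draws),   # posterior mean
  Est.Error = sd(alpha_draws),     # posterior standard deviation
  quantile(alpha_draws, probs = c(0.025, 0.975))  # 95% quantile interval
)
round(post_summary, 2)

# Posterior probability that the distribution is left-skewed (alpha < 0)
mean(alpha_draws < 0)
```

An interval for alpha that lies entirely below zero, as in the summary above, is what supports calling the data left-skewed.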

Hope this helps.
