[On recent questions I was looking into generating random vectors in R, and I wanted to share that "research" as an independent Q&A on a specific point.]
Generating random data with correlation can be done using the Cholesky decomposition of the correlation matrix $C = LL^{T}$ here , as reflected on prior posts here and here.
The question that I want to address is how to use the Uniform distribution to generate correlated random numbers from different marginal distributions in R.
Best Answer
Since the question is
"how to use the Uniform distribution to generate correlated random numbers from different marginal distributions in $mathbb{R}$"
and not only normal random variates, the above answer does not produce simulations with the intended correlation for an arbitrary pair of marginal distributions in $mathbb{R}$.
The reason is that, for most cdfs $G_X$ and $G_Y$,$$text{cor}(X,Y)netext{cor}(G_X^{-1}(Phi(X),G_Y^{-1}(Phi(Y)),$$when$$(X,Y)simtext{N}_2(0,Sigma),$$where $Phi$ denotes the standard normal cdf.
To wit, here is a counter-example with an Exp(1) and a Gamma(.2,1) as my pair of marginal distributions in $mathbb{R}$.
library(mvtnorm) #correlated normals with correlation 0.7 x=rmvnorm(1e4,mean=c(0,0),sigma=matrix(c(1,.7,.7,1),ncol=2),meth="chol") cor(x[,1],x[,2]) [1] 0.704503 y=pnorm(x) #correlated uniforms cor(y[,1],y[,2]) [1] 0.6860069 #correlated Exp(1) and Ga(.2,1) cor(-log(1-y[,1]),qgamma(y[,2],shape=.2)) [1] 0.5840085
Another obvious counter-example is when $G_X$ is the Cauchy cdf, in which case the correlation is not defined.
To give a broader picture, here is an R code where both $G_X$ and $G_Y$ are arbitrary:
etacor=function(rho=0,nsim=1e4,fx=qnorm,fy=qnorm){ #generate a bivariate correlated normal sample x1=rnorm(nsim);x2=rnorm(nsim) if (length(rho)==1){ y=pnorm(cbind(x1,rho*x1+sqrt((1-rho^2))*x2)) return(cor(fx(y[,1]),fy(y[,2]))) } coeur=rho rho2=sqrt(1-rho^2) for (t in 1:length(rho)){ y=pnorm(cbind(x1,rho[t]*x1+rho2[t]*x2)) coeur[t]=cor(fx(y[,1]),fy(y[,2]))} return(coeur) }
Playing around with different cdfs led me to single out this special case of a $chi^2_3$ distribution for $G_X$ and a log-Normal distribution for $G_Y$:
rhos=seq(-1,1,by=.01) trancor=etacor(rho=rhos,fx=function(x){qchisq(x,df=3)},fy=qlnorm) plot(rhos,trancor,ty="l",ylim=c(-1,1)) abline(a=0,b=1,lty=2)
which shows how far from the diagonal the correlation can be.
A final warning Given two arbitrary distributions $G_X$ and $G_Y$, the range of possible values of $text{cor}(X,Y)$ is not necessarily $(-1,1)$. The problem may thus have no solution.
Similar Posts:
- Solved – How to simulate two correlated AR(p) time series
- Solved – Generate random numbers for Monte Carlo simulation
- Solved – Given a multivariate normal distribution, how can we simulate uniform random variables that hold on to the correlation structure
- Solved – Generating correlated uniform random variable with R
- Solved – Does joint normality imply marginal normality