I would like to know the difference between mediation analysis and Mendelian randomization (MR) analysis.
As far as I know, the instrumental variable (IV) in mediation analysis is almost the same as the IV (= genetic variant) in MR analysis.
Moreover, the mediator in mediation analysis can be considered the exposure in MR analysis.
Although I have thought that the principle of these two methods is largely the same, there are some differences between them.
Please let me know what they are.
Thank you in advance.
Best Answer
IV in mediation analysis stands for independent variable, not instrumental variable. There is no concept of instruments in mediation analysis.
In mediation analysis, the variable of interest is the independent variable and our question is whether or not it affects the outcome through the mediator.
With instrumental variables/MR, the variable of interest is the exposure, whose relationship with the outcome is confounded by omitted variables, so that the exposure is correlated with the error term. The instrument (the gene) is simply there to allow us to obtain a causal effect for this variable. You can swap the IV if you find a better one; the IV is of no substantive interest beyond questions of its validity.
When drawn as diagrams, the approaches may look similar, but the models actually estimated are very different.
IV/MR:
$$y = \beta x + \epsilon$$
The problem is that $x$, the exposure variable, is correlated with $\epsilon$, so when the equation above is estimated using OLS, $\hat\beta$ will not be an estimate of the unconfounded relation between $y$ and $x$.
So you find an instrument, $z$, the random genetic variable, that is uncorrelated with $\epsilon$ but causes $x$, and estimate the following using OLS:
$$x = \gamma z + u$$
So you return to your original problem of estimating the impact of $x$ on $y$ using:
$$y = \zeta (\gamma z) + \epsilon$$
Now, $\gamma z$ is unrelated to $\epsilon$, since $\gamma z$ represents the part of the exposure that is determined purely by genes. So $\hat\zeta$ estimated using OLS captures the unconfounded relation between the exposure and the outcome variable.
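As a minimal sketch, assuming a simulated linear model with one binary instrument (the coefficients 3 and 2, the confounder `conf` and the sample size are illustrative choices, not part of the method), the two-step procedure above gives exactly the same point estimate as the single-instrument ratio $\widehat{\operatorname{Cov}}(z, y) / \widehat{\operatorname{Cov}}(z, x)$, usually called the Wald ratio in MR. Note that the standard errors printed by a manual second-stage `lm()` are not valid, because they ignore that $\hat\gamma$ was estimated; dedicated IV routines (for example `ivreg()` in the AER package) correct for this.

```r
# Sketch: manual two-stage least squares vs. the Wald ratio.
# Simulated data; all coefficients are illustrative assumptions.
set.seed(1)
n <- 10000
z <- rbinom(n, 1, 0.5)            # instrument (genetic variant)
conf <- rnorm(n)                  # unobserved confounder
x <- 3 * z + conf + rnorm(n)      # exposure, confounded by conf
y <- 2 * x + 3 * conf + rnorm(n)  # outcome; true effect of x is 2

# First stage: regress the exposure on the instrument, keep fitted values
x.hat <- fitted(lm(x ~ z))

# Second stage: regress the outcome on the fitted exposure
zeta.hat <- unname(coef(lm(y ~ x.hat))["x.hat"])

# Wald ratio: identical point estimate when there is a single instrument
wald <- cov(z, y) / cov(z, x)

# Naive OLS for comparison (biased upwards by the confounder)
naive <- unname(coef(lm(y ~ x))["x"])

c(two.stage = zeta.hat, wald = wald, naive.ols = naive)
```

The equality is algebraic: the fitted values are a linear function of $z$, so the second-stage slope reduces to the ratio of covariances.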
On the other hand, with mediation, we think $x$ (the gene) causes $y$, but we ask how this happens. We may propose $m$ (the exposure) as the variable that transmits this effect from genetics to the outcome. The path through which this effect is transmitted is of primary interest:
$$\begin{aligned} m &= \zeta x + u \\ y &= \beta_1 m + \beta_2 x + \epsilon \end{aligned}$$
So $\zeta \times \beta_1$ is the quantity of primary interest. It is the effect of genes on exposure multiplied by the effect of exposure on outcome controlling for genes. It is often called the indirect effect.
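A minimal sketch of the product-of-coefficients calculation, assuming simple linear models fitted by OLS and simulated data with arbitrary path coefficients (1.5, 0.8 and a direct effect of 0.5). In this linear OLS setting, the estimated indirect effect $\hat\zeta \hat\beta_1$ also equals exactly the total effect from regressing $y$ on $x$ alone minus the direct effect $\hat\beta_2$, which makes a handy check.

```r
# Sketch: indirect effect as a product of two OLS coefficients.
# Simulated data; the path coefficients are illustrative assumptions.
set.seed(2)
n <- 5000
x <- rbinom(n, 1, 0.5)             # gene (independent variable)
m <- 1.5 * x + rnorm(n)            # mediator / exposure
y <- 0.8 * m + 0.5 * x + rnorm(n)  # true indirect 1.5 * 0.8 = 1.2, direct 0.5

zeta.hat  <- unname(coef(lm(m ~ x))["x"])  # path x -> m
fit.y     <- lm(y ~ m + x)
beta1.hat <- unname(coef(fit.y)["m"])      # path m -> y, controlling for x
beta2.hat <- unname(coef(fit.y)["x"])      # direct effect of x on y

indirect <- zeta.hat * beta1.hat
total    <- unname(coef(lm(y ~ x))["x"])   # total effect of x on y

c(indirect = indirect, direct = beta2.hat, total = total)
```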
At this point, it should be obvious that the questions and the math are different. The similarity is that both are three-variable approaches with one variable stuck in the middle of the other two. IV/MR is a technical approach to mitigating omitted variable bias. Conventional mediation analysis ignores such problems and is all about causal pathways.
Here is a simple example in R; first we make up fake data:
```r
set.seed(12345)
x <- rbinom(5000, 1, .5)           # gene presence, 50%
c <- rnorm(5000)                   # confounder
me <- 3 * x + c + rnorm(5000)      # exposure as function of gene, confounder and randomness
y <- 2 * me + 3 * c + rnorm(5000)  # outcome is function of exposure, confounder and noise
```
Note that in IV/MR we assume $x$ affects $y$ only through $me$, so $x$ is absent from the equation for $y$. Now that we have our fake data, we can test the following. Assume we have no knowledge of the confounder:
```r
coef(lm(y ~ me))
# (Intercept)          me
#   -1.136293    2.712097
```
The effect of $me$ is overstated. With IV/MR, we do:
```r
me.hat <- predict(lm(me ~ x))  # predict me from x
coef(lm(y ~ me.hat))
# (Intercept)      me.hat
#  -0.1490232   2.0653464
```
The effect of $me$ is much closer to the value of 2 we specified in the made-up data.
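For this particular simulation, both estimates can also be worked out by hand. Write $v$ and $e$ for the two noise terms in the equations for `me` and `y`. Since $x$ is Bernoulli(0.5), $\operatorname{Var}(x) = 0.25$, and $c$, $v$, $e$ are independent standard normals, we get $\operatorname{Var}(me) = 3^2 \times 0.25 + 1 + 1 = 4.25$ and $\operatorname{Cov}(me, c) = 1$. Using the single-instrument IV slope $\operatorname{Cov}(x, y)/\operatorname{Cov}(x, me)$, the probability limits are

$$\operatorname{plim} \hat\beta_{\text{OLS}} = 2 + 3\,\frac{\operatorname{Cov}(me, c)}{\operatorname{Var}(me)} = 2 + \frac{3}{4.25} \approx 2.71, \qquad \operatorname{plim} \hat\beta_{\text{IV}} = \frac{\operatorname{Cov}(x, y)}{\operatorname{Cov}(x, me)} = \frac{2 \times 3 \operatorname{Var}(x)}{3 \operatorname{Var}(x)} = 2$$

The sample estimates of 2.71 and 2.07 above agree with these up to sampling variation.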
If we were doing mediation analysis, we would be testing:
```r
(eff.1 <- coef(lm(me ~ x))["x"])        # path from gene to exposure
#        x
# 3.086415
(eff.2 <- coef(lm(y ~ me + x))["me"])   # path from exposure to outcome controlling for gene
#       me
# 3.499809
```
As expected, the estimated effect of exposure is overstated because we haven't controlled for the confounder. The indirect effect, our goal, is then:
```r
eff.1 * eff.2
#        x
# 10.80186
```
Our indirect effect is a gross overstatement of the true value, which is $3 \times 2 = 6$ in the fake data. This is the mediation approach to obtaining the indirect effect of $x$ on $y$. Ideally, we want to control for the confounder (again, in reality we probably don't observe it):
```r
(eff.3 <- coef(lm(y ~ me + x + c))["me"])  # path from exposure to outcome controlling for gene and confounder
#       me
# 1.981281
```
We again have a good estimate of the effect of $me$. So we do:
```r
eff.1 * eff.3
#        x
# 6.115055
```
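The mediation estimates can be checked by hand for this design in the same way. Writing $v$ and $e$ for the noise terms in the equations for `me` and `y`, the variation in $me$ left after controlling for $x$ is $c + v$ (variance 2), and the corresponding part of $y = 6x + 5c + 2v + e$ is $5c + 2v + e$. Hence, in large samples,

$$\operatorname{plim}\,\text{eff.1} = 3, \qquad \operatorname{plim}\,\text{eff.2} = \frac{\operatorname{Cov}(c + v,\ 5c + 2v + e)}{\operatorname{Var}(c + v)} = \frac{7}{2} = 3.5, \qquad \operatorname{plim}\,\text{eff.3} = 2$$

so the naive product `eff.1 * eff.2` tends to $3 \times 3.5 = 10.5$, whereas `eff.1 * eff.3` tends to the product of the true path coefficients. The simulated values (10.80 and 6.12) are consistent with this up to sampling variation.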
One may ask why we don't use the IV/MR estimate of the effect of $me$ on $y$ as the second estimate in mediation analysis. The reason is that, in most contexts where we use mediation analysis, it would be impossible to argue that the exposure is the only mechanism by which the IV affects the outcome, and this is a key requirement for the IV to be valid. That is why the IV is absent from the equation for $y$ when we created the fake data. If the IV (the $x$ variable) were present in that equation, IV/MR would not recover the true value of 2. You can test this as an exercise. This also illustrates one of the challenges of demonstrating that an IV is valid.
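A minimal sketch of that exercise, reusing the structure of the simulation but adding a hypothetical direct effect `d` of the gene on the outcome (the value 1.5 and the larger sample size are arbitrary choices). Because the exclusion restriction now fails, $\operatorname{Cov}(x, y) = (6 + d)\operatorname{Var}(x)$ while $\operatorname{Cov}(x, me) = 3\operatorname{Var}(x)$, so the IV/MR estimate converges to $2 + d/3$ rather than 2.

```r
# Sketch: IV/MR when the instrument also affects the outcome directly.
# d is a hypothetical direct gene effect chosen for illustration.
set.seed(12345)
n <- 1e5
d <- 1.5
x  <- rbinom(n, 1, .5)                   # gene presence, 50%
c  <- rnorm(n)                           # confounder
me <- 3 * x + c + rnorm(n)               # exposure
y  <- 2 * me + d * x + 3 * c + rnorm(n)  # outcome now depends on x directly

me.hat <- predict(lm(me ~ x))            # first stage, as in the example above
iv.est <- unname(coef(lm(y ~ me.hat))["me.hat"])

iv.est  # should be near 2 + d / 3 = 2.5, not the true 2
```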