We understand an SLR model as

$$y_i = \alpha + \beta x_i + \varepsilon_i$$

with $\varepsilon_i$ i.i.d. with equal variance.

Suppose I have two instruments measuring a common entity, say, density of samples of different liquids, and assume that we do not know the exact density of each liquid (nor is there any way to accurately measure it).

My objective is to construct a model that will summarise the difference between the two instruments.

Here is how I go about it (for simplicity's sake, all models are SLR models):

Let the readings from instrument 1 and instrument 2 when measuring the $i$-th liquid be $y_i$ and $w_i$ respectively, and the actual density of the $i$-th liquid be $x_i$. Then we have:

$$y_i=\alpha_1+\beta_1 x_i + \varepsilon_{1i} \\
w_i=\alpha_2+\beta_2 x_i + \varepsilon_{2i}$$

where $\varepsilon_{1i}$ and $\varepsilon_{2i}$ are i.i.d. with equal variance across $i$.

Rearranging from the second equation we have:

$$x_i = \frac{w_i - \alpha_2 - \varepsilon_{2i}}{\beta_2}$$

Plugging into the first equation,

$$\begin{align*} y_i &= \alpha_1 + \beta_1 \frac{w_i - \alpha_2 - \varepsilon_{2i}}{\beta_2} + \varepsilon_{1i} \\
&= \alpha_1 - \frac{\beta_1}{\beta_2}\alpha_2 + \frac{\beta_1}{\beta_2} w_i + \varepsilon_{1i} - \frac{\beta_1}{\beta_2}\varepsilon_{2i} \\
&= \alpha + \beta w_i + \varepsilon_i \end{align*}$$

where $\beta = \frac{\beta_1}{\beta_2}$, $\alpha = \alpha_1 - \beta \alpha_2$, and $\varepsilon_i = \varepsilon_{1i} - \beta \varepsilon_{2i}$; the $\varepsilon_i$ are consequently i.i.d. with equal variance across $i$.

I would like to ask:

1. Is this a valid SLR model for my objective? Do I just treat the readings from instrument 2 as an 'independent variable' and proceed like a normal SLR analysis?

2. If so, can I interpret the slope ($\beta$) as the ratio of the individual slopes of the two instruments?

3. Can I then test the hypothesis that the two instruments are similar by testing $\alpha = 0$ and $\beta = 1$?

4. Suppose there are $N$ total liquids we are interested in. All $N$ liquids have been measured by instrument 1, but we only have time/money to measure $n$ ($n < N$) of them using instrument 2. How should I select the $n$ liquids from the $N$ so as to minimise the variance of $\hat\beta$ and $\hat\alpha$?
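For reference on the design question: under a textbook SLR model, the variance of the slope estimate is $\sigma^2 / \sum_i (w_i - \bar w)^2$, so it is the spread of the chosen predictor values that matters. A small sketch of this (my own illustration with made-up designs, and it presupposes that the derived model were a valid SLR in the first place):

```r
# Slope variance under SLR is sigma^2 / Sxx, where Sxx = sum((w - mean(w))^2),
# so a design that spreads the predictor widely beats a clustered one.
# (Made-up designs, for illustration only.)
sigma2    <- 1
spread    <- rep(c(-1, 1), each = 5)          # two extreme groups of 5
clustered <- seq(-0.2, 0.2, length.out = 10)  # 10 points bunched near zero

sxx <- function(x) sum((x - mean(x))^2)

var_spread    <- sigma2 / sxx(spread)     # 0.1
var_clustered <- sigma2 / sxx(clustered)  # roughly 6.1
```

In practice one would still want some interior points to be able to check linearity, rather than putting everything at the two extremes.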


#### Best Answer

The answer to question 1 is no, which makes the answers to all the others not applicable.

Let me start with your last equation:

$$y_i = \alpha + \beta w_i + \epsilon_i$$

Now, let's assume that your earlier equations for $y$ and $w$ are valid classical linear regression models, so that $\operatorname{Cov}(x,\epsilon_1)=0$ and $\operatorname{Cov}(x,\epsilon_2)=0$. (I'm not sure what SLR stands for; simple linear regression?)

Anyway, now let's calculate $\operatorname{Cov}(w,\epsilon)$ in order to verify whether your new equation is part of a valid classical linear regression model (recall that we need this to be zero):

$$\begin{align*} \operatorname{Cov}(w,\epsilon) &= \operatorname{Cov}\!\left(w,\,\epsilon_1-\frac{\beta_1}{\beta_2}\epsilon_2\right) \\
&= \operatorname{Cov}(w,\epsilon_1) - \frac{\beta_1}{\beta_2}\operatorname{Cov}(w,\epsilon_2) \\
&= \operatorname{Cov}(\epsilon_2,\epsilon_1) - \frac{\beta_1}{\beta_2}V(\epsilon_2) \end{align*}$$

The second term is not zero unless $\beta_1=0$, and that would make the example pretty silly. Even the first term is not likely to be zero in most physical applications: for it to vanish, you would have to make the additional assumption that the errors made by the two instruments are completely uncorrelated. You could get wildly lucky (in a stopped-clock-is-right-twice-a-day sense) and the two terms could cancel out, but there is no systematic tendency for them to do so.
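The covariance calculation above is easy to check numerically. A quick sketch (my own addition, with arbitrarily chosen parameter values, not part of the original answer):

```r
# Check numerically that Cov(w, eps) = Cov(e2, e1) - (b1/b2)*Var(e2),
# using arbitrary illustrative parameter values.
set.seed(1)
n  <- 1e5
b1 <- 2
b2 <- 0.5
x  <- rnorm(n)
e1 <- rnorm(n)
e2 <- 0.5*e1 + sqrt(1 - 0.5^2)*rnorm(n)  # instrument errors correlated at 0.5

w   <- 1 + b2*x + e2        # instrument 2 reading
eps <- e1 - (b1/b2)*e2      # combined error from the substitution

empirical   <- cov(w, eps)
theoretical <- cov(e2, e1) - (b1/b2)*var(e2)
# empirical and theoretical agree up to sampling noise, and both are far
# from the zero that a valid classical regression model would require
```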

The bias in estimating $\beta$ will be:

$$\frac{\operatorname{Cov}(\epsilon_2,\epsilon_1) - \frac{\beta_1}{\beta_2}V(\epsilon_2)}{V(w)}$$

Below, I attach a bit of `R` code which runs a toy Monte Carlo to demonstrate the effect. The theoretical bias in the Monte Carlo is $-0.25$, and the slope estimate comes out too low by about $0.23$, so it demonstrates the point pretty well.

In general, even if you can't see how to evaluate the bias in an example like this, you can always run a little Monte Carlo to see what is going on. This is one of the great things about statistical software languages. Monte Carlo simulations are amazingly powerful tools for getting feedback on whether your ideas are really good or really not.

```r
# This program written in response to a Cross Validated question
# http://stats.stackexchange.com/questions/74527/simple-linear-regression-with-a-random-predictor
# The program is a toy monte carlo.
# It generates a "true" but unobservable-to-the-analyst physical state x.
# Then it generates two measurements of that state from different instruments.
# Then it regresses one measurement on the other.

set.seed(12344321)

# True state, 1000 runs of the experiment
x <- rnorm(1000)

# Set the various parameters of the monte carlo
# Play with these for fun and profit:
alpha_1   <- 0
alpha_2   <- 0
beta_1    <- 1
beta_2    <- 1
stddev_e1 <- 1
stddev_e2 <- 1
corr_e1e2 <- 0.5

# Fallible measurements
e_1 <- stddev_e1*rnorm(1000)
e_2 <- stddev_e2*(corr_e1e2*e_1 + sqrt(1 - corr_e1e2^2)*rnorm(1000))
y <- alpha_1 + beta_1*x + e_1
w <- alpha_2 + beta_2*x + e_2

var(data.frame(e_1, e_2))
var(data.frame(x, w, y))

lm(y ~ x)
lm(w ~ x)

# By the bias formula in the answer, this regression should have a bias of
# -0.25 = (0.5 - 1*1)/2.  That is, the coefficient should not be close to 1,
# the correct value of beta_1/beta_2.  Instead, it should be close
# to 0.75 = 1 - 0.25
lm(y ~ w)
```
