Consider the situation where the $$x$$-variable is observed with measurement error, which is rather common for complex macroeconomic variables like national income.

Let $$x^*$$ be the true, unobserved economic variable, and let the data generating process (DGP) be given by

$y_i = \alpha + \beta x_i^* + \epsilon_i^*,$

where $$x_i^*$$ and $$\epsilon_i^*$$ are uncorrelated.

The observed $$x$$-values are $$x_i = x_i^* + v_i$$, with measurement errors $$v_i$$ that are uncorrelated with $$x_i^*$$ and $$\epsilon_i^*$$. The signal-to-noise ratio is defined as $$SN = \frac{\sigma_{x^*}^2}{\sigma_v^2}$$, where $$\sigma_{x^*}^2$$ is the variance of $$x^*$$ and $$\sigma_v^2$$ that of $$v$$.

The estimated regression model is $$y_i = \alpha + \beta x_i + \epsilon_i$$, and we consider the least squares estimator $$b$$ of $$\beta$$.

(a) Do you think that the value of $$b$$ depends on the variance of the measurement errors? Why?

(b) Show that

$b = \beta + \dfrac{\sum_{i=1}^n (x_i-\overline{x})(\epsilon_i-\overline{\epsilon})}{\sum_{i=1}^n (x_i-\overline{x})^2}.$

(c) Show that $$\epsilon_i = \epsilon_i^*-\beta v_i$$.

(d) Show that the covariance between $$x_i$$ and $$\epsilon_i$$ is equal to $$-\beta \sigma_v^2$$.

(e) Show that for large sample size $$n$$ we get $$b-\beta \approx \dfrac{-\beta \sigma_v^2}{\sigma_{x^*}^2+\sigma_v^2}$$

(f) Compute the approximate bias $$b-\beta$$ for $$\beta=1$$ in the cases $$SN=1$$, $$SN=3$$, and $$SN=10$$.

# (a)

(a) Do you think that the value of $$b$$ depends on the variance of the measurement errors? Why?

## Unbiased sample variance

$s_x^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \overline{x})^2, \qquad s_y^2 = \frac{1}{n-1} \sum_{i=1}^n (y_i - \overline{y})^2$

## Sample covariance

The sample covariance of $$x_i$$ and $$y_i$$ is

$\mathrm{cov}(x,y) = \frac{1}{n-1} \sum_{i=1}^n (x_i-\overline{x})(y_i-\overline{y})$

## Sample correlation coefficient

Let $$A=\frac{1}{n-1}$$.

The sample correlation coefficient of $$x_i$$ and $$y_i$$ is

\begin{align*} \rho_{x,y} &= \dfrac{\mathrm{cov}(x,y)}{s_x s_y}\\ &=\dfrac{A \sum_{i=1}^n (x_i-\overline{x})(y_i-\overline{y})}{\sqrt{A \sum_{i=1}^n (x_i - \overline{x})^2}\sqrt{A \sum_{i=1}^n (y_i - \overline{y})^2}}\\ &=\dfrac{\sum_{i=1}^n (x_i-\overline{x})(y_i-\overline{y})}{\sqrt{\sum_{i=1}^n (x_i - \overline{x})^2} \sqrt{\sum_{i=1}^n (y_i - \overline{y})^2}} \end{align*}

## Estimator $$b$$ of $$\beta$$

\begin{align*} b &= \dfrac{\sum_{i=1}^n (x_i-\overline{x})(y_i-\overline{y})}{\sum_{i=1}^n (x_i-\overline{x})^2}\\ &= \dfrac{(n-1)\, \mathrm{cov}(x,y)}{(n-1)\, s_x^2}\\ &=\dfrac{\mathrm{cov}(x,y)}{s_x^2} \end{align*}

Since $$x_i = x_i^*+v_i$$ and the $$v_i$$ are uncorrelated with the $$x_i^*$$, it follows that

$s_x^2 = s_{x^*}^2 + 2\, \mathrm{cov}(x^*,v) + s_v^2 = s_{x^*}^2 + s_v^2$

Thus

$b = \dfrac{\mathrm{cov}(x,y)}{s_{x^*}^2+s_v^2}$

so $$b$$ does depend on $$s_v^2$$, the variance of the measurement errors: the larger the measurement-error variance, the larger the denominator, and the closer $$b$$ is pushed toward zero.
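This attenuation can be illustrated by a short Monte Carlo sketch using only the Python standard library (the function names and all parameter values here are my own, hypothetical choices, not part of the exercise):

```python
import random

def ols_slope(x, y):
    """Least squares slope b = cov(x, y) / var(x), in deviation-from-mean form."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    num = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    den = sum((xi - xbar) ** 2 for xi in x)
    return num / den

def simulate_b(n, beta, sigma_xstar, sigma_v, sigma_eps, seed=0):
    """Simulate y = alpha + beta*x* + eps*, observe x = x* + v, return the OLS slope."""
    rng = random.Random(seed)
    xstar = [rng.gauss(0.0, sigma_xstar) for _ in range(n)]
    x = [xs + rng.gauss(0.0, sigma_v) for xs in xstar]   # observed with error
    y = [1.0 + beta * xs + rng.gauss(0.0, sigma_eps) for xs in xstar]
    return ols_slope(x, y)

# b shrinks away from beta = 1 as the measurement-error variance grows.
for sigma_v in (0.0, 0.5, 1.0, 2.0):
    print(f"sigma_v = {sigma_v}: b = {simulate_b(100_000, 1.0, 1.0, sigma_v, 0.5):.3f}")
```

With no measurement error the slope estimate is close to $$\beta = 1$$; with $$\sigma_v = \sigma_{x^*} = 1$$ it drops to roughly half, as part (e) predicts.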

# (b)

(b) Show that

$b = \beta + \dfrac{\sum_{i=1}^n (x_i-\overline{x})(\epsilon_i-\overline{\epsilon})}{\sum_{i=1}^n (x_i-\overline{x})^2}.$

We have established so far that

$b=\dfrac{\mathrm{cov}(x,y)}{s_x^2}$

and

\begin{align*} \mathrm{cov}(x_i,y_i)&=\mathrm{cov}(x_i,\alpha + \beta x_i + \epsilon_i)\\ &=\mathrm{cov}(x_i,\alpha) + \mathrm{cov}(x_i,\beta x_i) + \mathrm{cov}(x_i,\epsilon_i)\\ &=0+\beta\, \mathrm{cov}(x_i,x_i) + \mathrm{cov}(x_i,\epsilon_i)\\ &=\beta s_x^2 + \mathrm{cov}(x_i,\epsilon_i) \end{align*}

that is,

$\mathrm{cov}(x_i,y_i) = \beta s_x^2 + \mathrm{cov}(x_i,\epsilon_i)$

Hence

$\dfrac{\mathrm{cov}(x_i,y_i)}{s_x^2} = \beta + \dfrac{\mathrm{cov}(x_i,\epsilon_i)}{s_x^2}$

Therefore

$b = \beta + \dfrac{\mathrm{cov}(x_i,\epsilon_i)}{s_x^2}$

This means

\begin{align*} b&= \beta + \dfrac{\frac{1}{n-1} \sum_{i=1}^n (x_i-\overline{x})(\epsilon_i-\overline{\epsilon})}{\frac{1}{n-1} \sum_{i=1}^n (x_i - \overline{x})^2}\\ &=\beta + \dfrac{\sum_{i=1}^n (x_i-\overline{x})(\epsilon_i-\overline{\epsilon})}{\sum_{i=1}^n (x_i-\overline{x})^2} \end{align*}

which is what we are asked to establish in (b).
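Since $$\epsilon_i$$ is defined as $$y_i - \alpha - \beta x_i$$, this identity is exact algebra, so it can be verified to machine precision on any simulated sample (a sketch with arbitrary parameter choices of my own):

```python
import random

rng = random.Random(1)
n = 500
alpha, beta = 2.0, 1.0

# DGP: y depends on the true x*, but we observe x = x* + v.
xstar = [rng.gauss(0.0, 1.0) for _ in range(n)]
v = [rng.gauss(0.0, 1.0) for _ in range(n)]
x = [xs + vi for xs, vi in zip(xstar, v)]
y = [alpha + beta * xs + rng.gauss(0.0, 0.5) for xs in xstar]
eps = [yi - alpha - beta * xi for yi, xi in zip(y, x)]  # error of the estimated model

xbar, ybar, ebar = sum(x) / n, sum(y) / n, sum(eps) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
rhs = beta + sum((xi - xbar) * (ei - ebar) for xi, ei in zip(x, eps)) / sxx
print(abs(b - rhs))  # zero up to floating-point rounding
```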

# (c)

(c) Show that $$\epsilon_i = \epsilon_i^*-\beta v_i$$.

\begin{align*} \epsilon_i &= y_i - \alpha - \beta x_i\\ &=\alpha + \beta x_i^* + \epsilon_i^* - \alpha - \beta x_i\\ &=\beta(x_i^* - x_i)+\epsilon_i^*\\ &=\beta(-v_i) + \epsilon_i^*\\ &=-\beta v_i + \epsilon_i^* \end{align*}

Thus

$\epsilon_i = \epsilon_i^* - \beta v_i$

# (d)

(d) Show that the covariance between $$x_i$$ and $$\epsilon_i$$ is equal to $$-\beta \sigma_v^2$$.

Treating the covariances as population covariances, so that the uncorrelatedness assumptions make the cross-terms exactly zero,

\begin{align*} \mathrm{cov}(x_i,\epsilon_i)&=\mathrm{cov}(x_i^* + v_i, -\beta v_i + \epsilon_i^*)\\ &=-\beta\, \mathrm{cov}(x_i^*, v_i) +\mathrm{cov}(x_i^*, \epsilon_i^*) - \beta\, \mathrm{cov}(v_i,v_i) + \mathrm{cov}(v_i,\epsilon_i^*)\\ &=-\beta (0) + 0 - \beta \sigma_v^2 + 0\\ &=-\beta \sigma_v^2 \end{align*}

Hence

$\mathrm{cov}(x_i,\epsilon_i)=-\beta \sigma_v^2$
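This population identity can be checked by Monte Carlo: in a large simulated sample the sample covariance of $$x_i$$ and $$\epsilon_i$$ should be close to $$-\beta \sigma_v^2$$ (a standard-library sketch; the parameter values are arbitrary choices of my own):

```python
import random

rng = random.Random(2)
n = 200_000
beta, sigma_v = 1.5, 0.8

xstar = [rng.gauss(0.0, 1.0) for _ in range(n)]
v = [rng.gauss(0.0, sigma_v) for _ in range(n)]
x = [xs + vi for xs, vi in zip(xstar, v)]
eps = [rng.gauss(0.0, 0.5) - beta * vi for vi in v]  # epsilon_i = epsilon_i* - beta v_i

xbar = sum(x) / n
ebar = sum(eps) / n
cov_xe = sum((xi - xbar) * (ei - ebar) for xi, ei in zip(x, eps)) / (n - 1)
print(cov_xe, -beta * sigma_v ** 2)  # the two values should be close
```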

# (e)

(e) Show that for large sample size $$n$$ we get $$b-\beta \approx \dfrac{-\beta \sigma_v^2}{\sigma_{x^*}^2+\sigma_v^2}$$

We have established that

$b = \beta + \dfrac{\mathrm{cov}(x_i,\epsilon_i)}{s_x^2}$

and, from (a), $$s_x^2 = s_{x^*}^2 + s_v^2$$, while (d) gives $$\mathrm{cov}(x_i,\epsilon_i)=-\beta \sigma_v^2$$ for the population moments.

For large $$n$$ the sample moments converge to the corresponding population moments: $$s_{x^*}^2 \approx \sigma_{x^*}^2$$, $$s_v^2 \approx \sigma_v^2$$, and the sample covariance of $$x_i$$ and $$\epsilon_i$$ is approximately $$-\beta \sigma_v^2$$. (The divisor-$$n$$ variance satisfies $$\frac{1}{n}\sum_{i=1}^n (x_i-\overline{x})^2 = \frac{n-1}{n}\, s_x^2$$, so it has the same limit as $$s_x^2$$, since $$\frac{n-1}{n} \to 1$$.)

Substituting these into the expression for $$b$$,

$b \approx \beta + \dfrac{-\beta \sigma_v^2}{\sigma_{x^*}^2 + \sigma_v^2},$

so

$b - \beta \approx \dfrac{-\beta \sigma_v^2}{\sigma_{x^*}^2 + \sigma_v^2}$
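As a numerical illustration of this large-$$n$$ approximation (with arbitrary, hypothetical parameter values of my own), the empirical $$b-\beta$$ can be compared with the formula:

```python
import random

rng = random.Random(3)
n = 200_000
beta = 1.0
sigma_xstar, sigma_v = 1.0, 0.7

xstar = [rng.gauss(0.0, sigma_xstar) for _ in range(n)]
x = [xs + rng.gauss(0.0, sigma_v) for xs in xstar]      # observed with error
y = [0.5 + beta * xs + rng.gauss(0.0, 0.4) for xs in xstar]

xbar, ybar = sum(x) / n, sum(y) / n
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sum((xi - xbar) ** 2 for xi in x)
predicted = -beta * sigma_v ** 2 / (sigma_xstar ** 2 + sigma_v ** 2)
print(b - beta, predicted)  # close for large n
```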

# (f)

(f) Compute the approximate bias $$b-\beta$$ for $$\beta=1$$ in the cases $$SN=1$$, $$SN=3$$, and $$SN=10$$.

The signal-to-noise ratio is defined as $$SN = \frac{\sigma_{x^*}^2}{\sigma_v^2}$$.

$$SN=1$$ implies $$\sigma_{x^*}^2=\sigma_v^2$$.

Then

$b-\beta = \dfrac{-\sigma_v^2}{\sigma_v^2+\sigma_v^2} = -\frac{1}{2}$

$$SN=3$$ implies $$\sigma_{x^*}^2 = 3\sigma_v^2$$.

Then

$b-\beta = \dfrac{- \sigma_v^2}{3\sigma_v^2+\sigma_v^2} = -\frac{1}{4}$

$$SN=10$$ implies $$\sigma_{x^*}^2=10\sigma_v^2$$.

Then

$b-\beta = \dfrac{-\sigma_v^2}{10\sigma_v^2+\sigma_v^2} = -\frac{1}{11}$
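Because $$\sigma_{x^*}^2 = SN\,\sigma_v^2$$, the bias reduces to $$-\beta/(SN+1)$$, and the three cases above can be checked with a few lines of Python using exact fractions:

```python
from fractions import Fraction

beta = 1
for sn in (1, 3, 10):
    # b - beta = -beta * sigma_v^2 / (SN*sigma_v^2 + sigma_v^2) = -beta / (SN + 1)
    bias = Fraction(-beta, sn + 1)
    print(f"SN = {sn:2d}: bias = {bias} = {float(bias):.4f}")
```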

## Notes

1. The zero cross-covariances used in the derivations (for example $$\mathrm{cov}(x_i^*, v_i)=0$$) hold exactly for the population moments but only approximately for the sample moments of a finite sample. By the law of large numbers the sample moments converge to the population moments as $$n \to \infty$$, which is why the result in (e) is an approximation rather than an equality.