Training Exercise 2.1

Econometrics: Methods and Applications, Erasmus University Rotterdam, https://www.coursera.org/learn/erasmus-econometrics/home/welcome

(a) Use dataset TrainExer21 to regress log-wage on a constant and the gender dummy ‘Female’, and check the result presented in Lecture 2.1 that

log(Wage) = 4.73 - 0.25Female + e.

(b) Let e be the series of residuals of the regression in part (a). Perform two regressions:

(i) e on a constant and education;

(ii) e on a constant and the part-time job dummy.

(c) Comment on the outcomes of regressions (i) and (ii) of part (b).

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import statsmodels.api as sm
In [2]:
TrainExer21 = pd.read_csv('TrainExer21.txt', sep='\t', header=0, index_col=0)
In [3]:
TrainExer21.head()
Out[3]:
         Wage  LogWage  Female  Age  Educ  Parttime
Observ.
1          66    4.190       0   49     1         1
2          34    3.526       1   42     1         1
3          70    4.248       1   42     1         1
4          47    3.850       0   38     1         0
5         107    4.673       1   54     1         1
In [4]:
# Part (a): regress LogWage on a constant and the Female dummy.
x = TrainExer21['Female'].to_numpy()
X = sm.add_constant(x)  # prepend a column of ones for the intercept
y = TrainExer21['LogWage'].to_numpy()
model = sm.OLS(y, X)
results = model.fit()
print(results.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.073
Model:                            OLS   Adj. R-squared:                  0.071
Method:                 Least Squares   F-statistic:                     39.00
Date:                Tue, 28 Feb 2023   Prob (F-statistic):           9.10e-10
Time:                        01:45:13   Log-Likelihood:                -289.65
No. Observations:                 500   AIC:                             583.3
Df Residuals:                     498   BIC:                             591.7
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          4.7336      0.024    194.453      0.000       4.686       4.781
x1            -0.2506      0.040     -6.245      0.000      -0.329      -0.172
==============================================================================
Omnibus:                        8.330   Durbin-Watson:                   1.384
Prob(Omnibus):                  0.016   Jarque-Bera (JB):                7.009
Skew:                           0.212   Prob(JB):                       0.0301
Kurtosis:                       2.603   Cond. No.                         2.42
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
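With a single dummy regressor, the OLS intercept equals the mean of the dependent variable in the group where the dummy is 0, and the slope equals the difference between the two group means. So 4.73 is the average log-wage of men and -0.25 is the average gap for women. The sketch below checks this identity with a hand-written simple regression. It uses only the five observations printed by `TrainExer21.head()`, so its numbers will not match the full-sample estimates above; `simple_ols` is a helper introduced here for illustration and is not part of the notebook.

```python
from statistics import mean

# First five observations as shown by TrainExer21.head(); illustration only.
female = [0, 1, 1, 0, 1]
log_wage = [4.190, 3.526, 4.248, 3.850, 4.673]


def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


intercept, slope = simple_ols(female, log_wage)

# Group means of log-wage for Female == 0 and Female == 1.
mean_male = mean(y for x, y in zip(female, log_wage) if x == 0)
mean_female = mean(y for x, y in zip(female, log_wage) if x == 1)

print(f"intercept = {intercept:.4f}, mean(LogWage | Female=0) = {mean_male:.4f}")
print(f"slope     = {slope:.4f}, difference in group means = {mean_female - mean_male:.4f}")
```

Five rows are far too few to say anything about the wage gap itself; the point is only that the two printed pairs coincide.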
In [5]:
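plt.scatter(x, y)
# Plot against x, not X: X also contains the constant column,
# which would draw a second, spurious line at x = 1.
plt.plot(x, results.predict(X))
plt.xlabel('Female')
plt.ylabel('LogWage')
plt.title('LogWage vs. Female')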
Out[5]:
Text(0.5, 1.0, 'LogWage vs. Female')
In [6]:
# Part (b): residuals of the regression in part (a); results.resid holds the same values.
e = y - results.predict(X)
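By construction, OLS residuals from a regression that includes a constant sum to zero and are uncorrelated with the included regressor. That is why regressing e on Female would give nothing, while regressing e on a variable left out of part (a) can reveal what the model misses. The following sketch verifies both properties with a closed-form simple regression. It again uses only the five rows printed by `TrainExer21.head()` and plain Python, so it is an illustration and not a replacement for the statsmodels fit.

```python
from statistics import mean

# First five observations as shown by TrainExer21.head(); illustration only.
female = [0, 1, 1, 0, 1]
log_wage = [4.190, 3.526, 4.248, 3.850, 4.673]

# Closed-form simple regression of log_wage on a constant and female.
x_bar, y_bar = mean(female), mean(log_wage)
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(female, log_wage))
         / sum((x - x_bar) ** 2 for x in female))
intercept = y_bar - slope * x_bar

# Residual = observed value minus fitted value.
residuals = [y - (intercept + slope * x) for x, y in zip(female, log_wage)]

# Both sums are zero up to floating-point rounding.
sum_resid = sum(residuals)
sum_resid_times_x = sum(r * x for r, x in zip(residuals, female))

print(f"sum of residuals:          {sum_resid:.2e}")
print(f"sum of residuals * Female: {sum_resid_times_x:.2e}")
```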
In [7]:
# Regression (i): residuals e on a constant and education.
x_i = TrainExer21['Educ'].to_numpy()
X_i = sm.add_constant(x_i)
model_i = sm.OLS(e, X_i)
results_i = model_i.fit()
print(results_i.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.284
Model:                            OLS   Adj. R-squared:                  0.282
Method:                 Least Squares   F-statistic:                     197.4
Date:                Tue, 28 Feb 2023   Prob (F-statistic):           5.23e-38
Time:                        01:45:13   Log-Likelihood:                -206.18
No. Observations:                 500   AIC:                             416.4
Df Residuals:                     498   BIC:                             424.8
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.4526      0.036    -12.524      0.000      -0.524      -0.382
x1             0.2178      0.016     14.050      0.000       0.187       0.248
==============================================================================
Omnibus:                        4.168   Durbin-Watson:                   1.930
Prob(Omnibus):                  0.124   Jarque-Bera (JB):                4.205
Skew:                           0.201   Prob(JB):                        0.122
Kurtosis:                       2.799   Cond. No.                         5.92
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
In [8]:
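plt.scatter(x_i, e)
# Plot against x_i, not X_i, so the constant column is not drawn as a second line.
plt.plot(x_i, results_i.predict(X_i))
plt.xlabel('Educ')
# The vertical axis shows the residuals e from part (a), not LogWage itself.
plt.ylabel('Residual')
plt.title('Residuals vs. Educ')
Out[8]:
Text(0.5, 1.0, 'Residuals vs. Educ')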
In [9]:
# Regression (ii): residuals e on a constant and the part-time job dummy.
x_ii = TrainExer21['Parttime'].to_numpy()
X_ii = sm.add_constant(x_ii)
model_ii = sm.OLS(e, X_ii)
results_ii = model_ii.fit()
print(results_ii.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.011
Model:                            OLS   Adj. R-squared:                  0.009
Method:                 Least Squares   F-statistic:                     5.394
Date:                Tue, 28 Feb 2023   Prob (F-statistic):             0.0206
Time:                        01:45:13   Log-Likelihood:                -286.96
No. Observations:                 500   AIC:                             577.9
Df Residuals:                     498   BIC:                             586.3
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.0284      0.023     -1.246      0.213      -0.073       0.016
x1             0.0987      0.043      2.323      0.021       0.015       0.182
==============================================================================
Omnibus:                        7.495   Durbin-Watson:                   1.376
Prob(Omnibus):                  0.024   Jarque-Bera (JB):                6.645
Skew:                           0.218   Prob(JB):                       0.0361
Kurtosis:                       2.640   Cond. No.                         2.43
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
In [10]:
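plt.scatter(x_ii, e)
# Plot against x_ii, not X_ii, so the constant column is not drawn as a second line.
plt.plot(x_ii, results_ii.predict(X_ii))
plt.xlabel('Parttime')
# The vertical axis shows the residuals e from part (a), not LogWage itself.
plt.ylabel('Residual')
plt.title('Residuals vs. Parttime')
Out[10]:
Text(0.5, 1.0, 'Residuals vs. Parttime')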
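For part (c), one way to read the two summary tables above is to compare the slope, its t-value, and the R-squared of each residual regression. The sketch below does that with the numbers copied from the printed output. The cutoff of 1.96 is the usual large-sample approximation to the two-sided 5% critical value and is an assumption added here; the dictionary layout is likewise only illustrative.

```python
# Values copied from the summary tables of regressions (i) and (ii).
reported = {
    "Educ":     {"slope": 0.2178, "t": 14.050, "r2": 0.284},
    "Parttime": {"slope": 0.0987, "t": 2.323,  "r2": 0.011},
}

# Approximate two-sided 5% critical value for a large sample (assumption).
CRITICAL_T = 1.96

# A slope counts as significant when |t| exceeds the critical value.
verdict = {name: abs(s["t"]) > CRITICAL_T for name, s in reported.items()}

for name, s in reported.items():
    label = "significant" if verdict[name] else "not significant"
    print(f"{name:<9} slope={s['slope']:+.4f}  t={s['t']:6.3f}  "
          f"R2={s['r2']:.3f}  -> {label} at 5%")
```

Both slopes pass the threshold, so the residuals of the gender-only regression still contain variation related to education and to part-time status, which suggests that the model in part (a) leaves out relevant factors. Education matters far more: it accounts for about 28% of the residual variation, against about 1% for the part-time dummy.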