Econometrics: Methods and Applications, Erasmus University Rotterdam
https://www.coursera.org/learn/erasmus-econometrics/home/welcome
(a) Use dataset TrainExer21 to regress log-wage on a constant and the gender dummy ‘Female’, and check the result presented in Lecture 2.1 that
log(Wage) = 4.73 - 0.25Female + e.
(b) Let e be the series of residuals of the regression in part (a). Perform two regressions:
(i) e on a constant and education;
(ii) e on a constant and the part-time job dummy.
(c) Comment on the outcomes of regressions (i) and (ii) of part (b).
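As a quick cross-check of part (a) before the array-based cells below, the same regression can also be fitted with the statsmodels formula interface; a minimal sketch, assuming the same tab-separated TrainExer21.txt file used in the next cell:

import pandas as pd
import statsmodels.formula.api as smf
# Load the wage data (tab-separated, observation number in the first column).
data = pd.read_csv('TrainExer21.txt', sep='\t', header=0, index_col=0)
# Part (a): regress log-wage on a constant and the Female dummy.
fit_a = smf.ols('LogWage ~ Female', data=data).fit()
print(fit_a.params)  # intercept should be about 4.73, Female slope about -0.25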
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import statsmodels.api as sm
# Load the wage dataset (tab-separated, observation number as the index).
TrainExer21 = pd.read_csv('TrainExer21.txt', sep='\t', header=0, index_col=0)
TrainExer21.head()
| Observ. | Wage | LogWage | Female | Age | Educ | Parttime |
|---|---|---|---|---|---|---|
| 1 | 66 | 4.190 | 0 | 49 | 1 | 1 |
| 2 | 34 | 3.526 | 1 | 42 | 1 | 1 |
| 3 | 70 | 4.248 | 1 | 42 | 1 | 1 |
| 4 | 47 | 3.850 | 0 | 38 | 1 | 0 |
| 5 | 107 | 4.673 | 1 | 54 | 1 | 1 |
# Part (a): regress log-wage on a constant and the Female dummy.
x = TrainExer21['Female'].to_numpy()
X = sm.add_constant(x)
y = TrainExer21['LogWage'].to_numpy()
model = sm.OLS(y, X)
results = model.fit()
print(results.summary())
OLS Regression Results
==============================================================================
Dep. Variable: y R-squared: 0.073
Model: OLS Adj. R-squared: 0.071
Method: Least Squares F-statistic: 39.00
Date: Tue, 28 Feb 2023 Prob (F-statistic): 9.10e-10
Time: 01:45:13 Log-Likelihood: -289.65
No. Observations: 500 AIC: 583.3
Df Residuals: 498 BIC: 591.7
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 4.7336 0.024 194.453 0.000 4.686 4.781
x1 -0.2506 0.040 -6.245 0.000 -0.329 -0.172
==============================================================================
Omnibus: 8.330 Durbin-Watson: 1.384
Prob(Omnibus): 0.016 Jarque-Bera (JB): 7.009
Skew: 0.212 Prob(JB): 0.0301
Kurtosis: 2.603 Cond. No. 2.42
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
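Because the only regressor in part (a) is a 0/1 dummy, the intercept equals the mean log-wage of the men and the slope equals the female-male difference in mean log-wage. A minimal check of the coefficients above, assuming TrainExer21 from the earlier cell is still in scope:

# Mean log-wage by gender: intercept = male mean, intercept + slope = female mean.
group_means = TrainExer21.groupby('Female')['LogWage'].mean()
print(group_means)
print('difference (female - male):', group_means.loc[1] - group_means.loc[0])  # about -0.25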
# Scatter of log-wage against the gender dummy, with the fitted regression line.
plt.scatter(x, y)
x_line = np.array([0, 1])
plt.plot(x_line, results.params[0] + results.params[1] * x_line)
plt.xlabel('Female')
plt.ylabel('LogWage')
plt.title('LogWage vs. Female')
[Figure: scatter of LogWage against Female with the fitted line]
# Residuals from the part (a) regression (equivalently, results.resid).
e = y - results.predict(X)
# Part (b)(i): regress the residuals on a constant and education.
x_i = TrainExer21['Educ'].to_numpy()
X_i = sm.add_constant(x_i)
model_i = sm.OLS(e, X_i)
results_i = model_i.fit()
print(results_i.summary())
OLS Regression Results
==============================================================================
Dep. Variable: y R-squared: 0.284
Model: OLS Adj. R-squared: 0.282
Method: Least Squares F-statistic: 197.4
Date: Tue, 28 Feb 2023 Prob (F-statistic): 5.23e-38
Time: 01:45:13 Log-Likelihood: -206.18
No. Observations: 500 AIC: 416.4
Df Residuals: 498 BIC: 424.8
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const -0.4526 0.036 -12.524 0.000 -0.524 -0.382
x1 0.2178 0.016 14.050 0.000 0.187 0.248
==============================================================================
Omnibus: 4.168 Durbin-Watson: 1.930
Prob(Omnibus): 0.124 Jarque-Bera (JB): 4.205
Skew: 0.201 Prob(JB): 0.122
Kurtosis: 2.799 Cond. No. 5.92
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
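In a simple regression with an intercept, R-squared equals the squared sample correlation between the dependent variable and the single regressor, so the R-squared of 0.284 above can be checked directly; a small sketch, assuming e and x_i from the previous cells are still in scope:

# Squared correlation between the residuals and education should match R-squared.
corr_e_educ = np.corrcoef(e, x_i)[0, 1]
print('correlation:', corr_e_educ)
print('squared correlation:', corr_e_educ ** 2)  # about 0.284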
# Scatter of the residuals against education, with the fitted regression line.
plt.scatter(x_i, e)
x_line_i = np.unique(x_i)
plt.plot(x_line_i, results_i.params[0] + results_i.params[1] * x_line_i)
plt.xlabel('Educ')
plt.ylabel('Residual e')
plt.title('Residuals vs. Educ')
[Figure: scatter of the part (a) residuals against Educ with the fitted line]
# Part (b)(ii): regress the residuals on a constant and the part-time dummy.
x_ii = TrainExer21['Parttime'].to_numpy()
X_ii = sm.add_constant(x_ii)
model_ii = sm.OLS(e, X_ii)
results_ii = model_ii.fit()
print(results_ii.summary())
OLS Regression Results
==============================================================================
Dep. Variable: y R-squared: 0.011
Model: OLS Adj. R-squared: 0.009
Method: Least Squares F-statistic: 5.394
Date: Tue, 28 Feb 2023 Prob (F-statistic): 0.0206
Time: 01:45:13 Log-Likelihood: -286.96
No. Observations: 500 AIC: 577.9
Df Residuals: 498 BIC: 586.3
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const -0.0284 0.023 -1.246 0.213 -0.073 0.016
x1 0.0987 0.043 2.323 0.021 0.015 0.182
==============================================================================
Omnibus: 7.495 Durbin-Watson: 1.376
Prob(Omnibus): 0.024 Jarque-Bera (JB): 6.645
Skew: 0.218 Prob(JB): 0.0361
Kurtosis: 2.640 Cond. No. 2.43
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
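As with the gender dummy in part (a), the slope of 0.0987 on the part-time dummy is simply the difference between the mean residual of part-time and full-time workers; a quick check, assuming e and TrainExer21 are still in scope:

# Mean residual by part-time status; the difference reproduces the slope above.
resid_by_parttime = pd.Series(e, index=TrainExer21.index).groupby(TrainExer21['Parttime']).mean()
print(resid_by_parttime)
print('difference (part-time - full-time):', resid_by_parttime.loc[1] - resid_by_parttime.loc[0])  # about 0.099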
# Scatter of the residuals against the part-time dummy, with the fitted regression line.
plt.scatter(x_ii, e)
x_line_ii = np.array([0, 1])
plt.plot(x_line_ii, results_ii.params[0] + results_ii.params[1] * x_line_ii)
plt.xlabel('Parttime')
plt.ylabel('Residual e')
plt.title('Residuals vs. Parttime')
[Figure: scatter of the part (a) residuals against Parttime with the fitted line]
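For part (c): the residuals of the gender-only regression are strongly related to education (slope 0.218, R-squared about 0.28) but only weakly to the part-time dummy (slope 0.099, R-squared about 0.01), which suggests that education in particular is a relevant wage determinant omitted from the simple model of part (a). A small sketch that collects these numbers side by side, reusing results_i and results_ii fitted above:

# Slopes and R-squared of the two residual regressions, side by side.
summary_c = pd.DataFrame(
    {'slope': [results_i.params[1], results_ii.params[1]],
     'R-squared': [results_i.rsquared, results_ii.rsquared]},
    index=['Educ', 'Parttime'])
print(summary_c)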