Econometrics: Methods and Applications Erasmus University Rotterdam https://www.coursera.org/learn/erasmus-econometrics/home/welcome
In Lecture 1.5, we applied simple regression to data on winning times in the Olympic 100-meter sprint (athletics). We computed the regression coefficients $a$ and $b$ for two trend models, one with a linear trend and one with a nonlinear trend. In a test question, you created forecasts of the winning times for both men and women in 2008 and 2012.
Of course, you can also forecast further ahead in the future. In fact, it is even possible to predict when men and women would run equally fast, if the current trends persist.
(a) Show that the linear trend model predicts equal winning times at around 2140.
(b) Show that the nonlinear trend model predicts equal winning times at around 2192.
(c) Show that the linear trend model predicts equal winning times of approximately 8.53 seconds.
(d) Comment on these outcomes and on the underlying regression models.
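As a sketch of the algebra behind (a) to (c): write the fitted trends with the calendar year $t$ as regressor, using subscripts $M$ and $W$ for men and women. The subscripts and $t^*$ are notation introduced here for illustration, not taken from the lecture. Setting the two fitted lines equal gives the crossing year, and substituting it back gives the common winning time. For a log-linear trend, which is the nonlinear form fitted in the code below, the same formula applies to the log-scale coefficients.

```latex
\begin{align*}
  a_M + b_M t^* &= a_W + b_W t^*
  &&\Longrightarrow&
  t^* &= \frac{a_W - a_M}{b_M - b_W}, \\
  \text{linear trend:}\quad W^* &= a_M + b_M t^*,
  &&&
  \text{log-linear trend:}\quad W^* &= \exp\!\left(a_M + b_M t^*\right).
\end{align*}
```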
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
TrainExer15 = pd.read_csv('TrainExer15.txt', sep='\t', header=0, index_col=0)
TrainExer15
Game | Year | Winmen | Winwomen |
---|---|---|---|
1 | 1948 | 10.30 | 11.90 |
2 | 1952 | 10.40 | 11.50 |
3 | 1956 | 10.50 | 11.50 |
4 | 1960 | 10.20 | 11.00 |
5 | 1964 | 10.00 | 11.40 |
6 | 1968 | 9.95 | 11.08 |
7 | 1972 | 10.14 | 11.07 |
8 | 1976 | 10.06 | 11.08 |
9 | 1980 | 10.25 | 11.06 |
10 | 1984 | 9.99 | 10.97 |
11 | 1988 | 9.92 | 10.54 |
12 | 1992 | 9.96 | 10.82 |
13 | 1996 | 9.84 | 10.94 |
14 | 2000 | 9.87 | 10.75 |
15 | 2004 | 9.85 | 10.93 |
# scikit-learn expects a 2-D feature matrix, hence the reshape to one column
X = TrainExer15['Year'].to_numpy().reshape((-1, 1))
y_M = TrainExer15['Winmen'].to_numpy()
y_W = TrainExer15['Winwomen'].to_numpy()
model_M = LinearRegression().fit(X, y_M)
model_M.intercept_
28.854000000000017
model_M.coef_
array([-0.0095])
model_W = LinearRegression().fit(X, y_W)
model_W.intercept_
42.18938095238097
model_W.coef_
array([-0.01573214])
model_M.predict(np.array([2140,]).reshape((-1,1)))
array([8.524])
model_W.predict(np.array([2140,]).reshape((-1,1)))
array([8.52259524])
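The two predictions at 2140 agree to within about 0.002 s, which supports (a) and (c). The crossing point can also be solved for directly instead of being checked at a guessed year. A minimal sketch, assuming the coefficients printed above are copied by hand (so the women's slope carries only its eight printed decimals); the variable names are illustrative:

```python
# Linear-trend coefficients copied from the notebook outputs above
# (model: W = a + b * Year).
a_M, b_M = 28.854000000000017, -0.0095      # men
a_W, b_W = 42.18938095238097, -0.01573214   # women

# Equal predicted times: a_M + b_M * t = a_W + b_W * t, solved for t.
year_equal = (a_W - a_M) / (b_M - b_W)

# Common predicted winning time at the crossing year.
time_equal = a_M + b_M * year_equal

print(f"crossing year: {year_equal:.1f}")
print(f"winning time at crossing: {time_equal:.2f} s")
```

Under these assumptions the crossing falls just before 2140, at a predicted time of roughly 8.53 seconds.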
plt.scatter(X, y_M, color='blue', label='Men')
plt.plot(X, model_M.predict(X), color='blue')
plt.scatter(X, y_W, color='red', label='Women')
plt.plot(X, model_W.predict(X), color='red')
plt.xlabel('Year')
plt.ylabel('Winning time W (s)')
plt.title('Winning times (W) of the Olympic 100-meter finals for men and women from 1948 to 2004')
plt.legend(loc='upper right')
y_M_log = np.log(y_M)
y_W_log = np.log(y_W)
model_M_log = LinearRegression().fit(X, y_M_log)
model_W_log = LinearRegression().fit(X, y_W_log)
model_M_log.coef_
array([-0.00093899])
model_M_log.intercept_
4.165995349021219
model_W_log.coef_
array([-0.00140322])
model_W_log.intercept_
5.179506516767613
np.exp(model_M_log.predict(np.array([2192,]).reshape((-1,1))))
array([8.22958057])
np.exp(model_W_log.predict(np.array([2192,]).reshape((-1,1))))
array([8.19603023])
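The predictions at 2192 differ by about 0.03 s, with women already slightly ahead, so it may be worth solving for the exact crossing of the two fitted log-linear curves. A minimal sketch, assuming the coefficients printed above are copied by hand (the slopes carry only their eight printed decimals); the variable names are illustrative:

```python
import math

# Log-linear trend coefficients copied from the notebook outputs above
# (model: log(W) = a + b * Year).
a_M_log, b_M_log = 4.165995349021219, -0.00093899   # men
a_W_log, b_W_log = 5.179506516767613, -0.00140322   # women

# Equal predicted log-times: a_M + b_M * t = a_W + b_W * t, solved for t.
year_equal_log = (a_W_log - a_M_log) / (b_M_log - b_W_log)

# Back-transform the common log-time to seconds.
time_equal_log = math.exp(a_M_log + b_M_log * year_equal_log)

print(f"crossing year: {year_equal_log:.1f}")
print(f"winning time at crossing: {time_equal_log:.2f} s")
```

With these unrounded values the curves cross somewhat before 2192. Because the two slopes are so close, small changes in the coefficients shift the crossing year by several years, so this is still compatible with "around 2192" in (b).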
plt.scatter(X, np.exp(y_M_log), color='blue', label='Men')
plt.plot(X, np.exp(model_M_log.predict(X)), color='blue')
plt.scatter(X, np.exp(y_W_log), color='red', label='Women')
plt.plot(X, np.exp(model_W_log.predict(X)), color='red')
plt.xlabel('Year')
plt.ylabel('Winning time W (s)')
plt.title('Winning times (W) of the Olympic 100-meter finals for men and women from 1948 to 2004')
plt.legend(loc='upper right')
# np.arange excludes the stop value, so use 2193 to include the year 2192
T = np.arange(1948, 2193, 4).reshape((-1, 1))
plt.plot(T, model_M.predict(T), color='blue', label='Men Linear Model')
plt.plot(T, np.exp(model_M_log.predict(T)), color='red', label='Men Log Linear Model')
plt.xlabel('Year')
plt.ylabel('Winning time W (s)')
plt.title('Winning times (W) of the Olympic 100-meter finals for men from 1948 to 2192')
plt.legend(loc='upper right')
plt.plot(T, model_W.predict(T), color='blue', label='Women Linear Model')
plt.plot(T, np.exp(model_W_log.predict(T)), color='red', label='Women Log Linear Model')
plt.xlabel('Year')
plt.ylabel('Winning time W (s)')
plt.title('Winning times (W) of the Olympic 100-meter finals for women from 1948 to 2192')
plt.legend(loc='upper right')
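For (d), one way to see the limits of extrapolating a linear trend is to ask when each fitted line would reach a winning time of zero seconds. The sketch below is an illustration only, assuming the linear coefficients printed earlier are copied by hand; the variable names are illustrative and the calculation is not part of the original exercise.

```python
# Linear-trend coefficients copied from the notebook outputs
# (model: W = a + b * Year).
a_M, b_M = 28.854000000000017, -0.0095      # men
a_W, b_W = 42.18938095238097, -0.01573214   # women

# Year at which each fitted line predicts a winning time of 0 seconds:
# a + b * t = 0  =>  t = -a / b.
zero_year_M = -a_M / b_M
zero_year_W = -a_W / b_W

print(f"men reach 0 s around {zero_year_M:.0f}")
print(f"women reach 0 s around {zero_year_W:.0f}")
```

Both lines eventually predict zero and then negative times, which is impossible. The log-linear model avoids this particular problem because the exponential of the fitted value is always positive, but it still assumes that a trend estimated on 15 Games persists for nearly two centuries.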