Econometrics: Methods and Applications Erasmus University Rotterdam https://www.coursera.org/learn/erasmus-econometrics/home/welcome

Training Exercise 1.5

In Lecture 1.5, we applied simple regression to data on winning times in the Olympic 100-meter sprint (athletics). We computed the regression coefficients $a$ and $b$ for two trend models, one with a linear trend and one with a nonlinear trend. In a test question, you created forecasts of the winning times for both men and women in 2008 and 2012.

Of course, you can also forecast further into the future. In fact, it is even possible to predict when men and women would run equally fast, if the current trends persist.

(a) Show that the linear trend model predicts equal winning times at around 2140.

(b) Show that the nonlinear trend model predicts equal winning times at around 2192.

(c) Show that the linear trend model predicts equal winning times of approximately 8.53 seconds.

(d) Comment on these outcomes and on the underlying regression models.

In [1]:
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
In [2]:
TrainExer15 = pd.read_csv('TrainExer15.txt', sep='\t', header=0, index_col=0)
In [3]:
TrainExer15
Out[3]:
      Year  Winmen  Winwomen
Game
1     1948   10.30     11.90
2     1952   10.40     11.50
3     1956   10.50     11.50
4     1960   10.20     11.00
5     1964   10.00     11.40
6     1968    9.95     11.08
7     1972   10.14     11.07
8     1976   10.06     11.08
9     1980   10.25     11.06
10    1984    9.99     10.97
11    1988    9.92     10.54
12    1992    9.96     10.82
13    1996    9.84     10.94
14    2000    9.87     10.75
15    2004    9.85     10.93
In [4]:
X = TrainExer15['Year'].to_numpy().reshape(-1, 1)  # 2-D feature matrix, as scikit-learn expects
In [5]:
y_M = TrainExer15['Winmen'].to_numpy()
In [6]:
y_W = TrainExer15['Winwomen'].to_numpy()
In [7]:
model_M = LinearRegression().fit(X, y_M)
In [8]:
model_M.intercept_
Out[8]:
28.854000000000017
In [9]:
model_M.coef_
Out[9]:
array([-0.0095])
In [10]:
model_W = LinearRegression().fit(X, y_W)
In [11]:
model_W.intercept_
Out[11]:
42.18938095238097
In [12]:
model_W.coef_
Out[12]:
array([-0.01573214])
In [13]:
model_M.predict(np.array([2140,]).reshape((-1,1)))
Out[13]:
array([8.524])
In [14]:
model_W.predict(np.array([2140,]).reshape((-1,1)))
Out[14]:
array([8.52259524])
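The two predictions above only check the candidate year 2140. The crossing point can also be solved for directly: setting $a_M + b_M t = a_W + b_W t$ gives $t = (a_W - a_M)/(b_M - b_W)$. The following is a minimal sketch that hard-codes the coefficients printed in Out[8] to Out[12], so it runs without the data file. The helper name `crossing_year` is my own and is not part of the notebook.

```python
# Coefficients copied from the notebook outputs (Out[8], Out[9], Out[11], Out[12]).
a_M, b_M = 28.854000000000017, -0.0095       # men:   W = a_M + b_M * year
a_W, b_W = 42.18938095238097, -0.01573214    # women: W = a_W + b_W * year


def crossing_year(a1, b1, a2, b2):
    """Year t at which a1 + b1*t equals a2 + b2*t (hypothetical helper)."""
    if b1 == b2:
        raise ValueError("parallel trend lines never cross")
    return (a2 - a1) / (b1 - b2)


t_linear = crossing_year(a_M, b_M, a_W, b_W)
w_linear = a_M + b_M * t_linear  # common predicted winning time at the crossing

print(f"Linear trends cross near year {t_linear:.1f}")
print(f"Predicted winning time there: {w_linear:.2f} s")
```

Under these coefficients the crossing should land close to 2140 with a common time of about 8.53 seconds, which matches parts (a) and (c).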
In [15]:
plt.scatter(X, y_M, color='blue', label='Men')
plt.plot(X, model_M.predict(X), color='blue')
plt.scatter(X, y_W, color='red', label='Women')
plt.plot(X, model_W.predict(X), color='red')
plt.xlabel('Year')
plt.ylabel('Winning time W (s)')
plt.title('Winning times (W) of the Olympic 100-meter finals for men and women from 1948 to 2004')
plt.legend(loc='upper right')
Out[15]:
<matplotlib.legend.Legend at 0x1797d9cf070>
In [16]:
y_M_log = np.log(y_M)
In [17]:
y_W_log = np.log(y_W)
In [18]:
model_M_log = LinearRegression().fit(X, y_M_log)
In [19]:
model_W_log = LinearRegression().fit(X, y_W_log)
In [20]:
model_M_log.coef_
Out[20]:
array([-0.00093899])
In [21]:
model_M_log.intercept_
Out[21]:
4.165995349021219
In [22]:
model_W_log.coef_
Out[22]:
array([-0.00140322])
In [23]:
model_W_log.intercept_
Out[23]:
5.179506516767613
In [24]:
np.exp(model_M_log.predict(np.array([2192,]).reshape((-1,1))))
Out[24]:
array([8.22958057])
In [25]:
np.exp(model_W_log.predict(np.array([2192,]).reshape((-1,1))))
Out[25]:
array([8.19603023])
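The same algebra applies to the nonlinear model, because equal winning times imply equal log winning times: $a_M + b_M t = a_W + b_W t$ with the log-model coefficients. The sketch below hard-codes the values printed in Out[20] to Out[23]; the variable names are my own.

```python
import math

# Coefficients copied from the notebook outputs (Out[20] to Out[23]).
a_M_log, b_M_log = 4.165995349021219, -0.00093899   # men:   log(W) = a + b * year
a_W_log, b_W_log = 5.179506516767613, -0.00140322   # women: log(W) = a + b * year

# Equal times <=> equal log times, so the linear crossing formula still applies.
t_log = (a_W_log - a_M_log) / (b_M_log - b_W_log)
w_log = math.exp(a_M_log + b_M_log * t_log)  # back-transform to seconds

print(f"Log-linear trends cross near year {t_log:.1f}")
print(f"Predicted winning time there: {w_log:.2f} s")
```

With these unrounded coefficients the crossing appears to fall somewhat before 2192. That is consistent with the two predictions above, where women are already slightly faster than men in 2192 (about 8.20 s against 8.23 s). The exercise's figure of 2192 presumably comes from working with rounded coefficients, though that is my reading and not something the source states.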
In [26]:
plt.scatter(X, y_M, color='blue', label='Men')  # np.exp(y_M_log) is just y_M
plt.plot(X, np.exp(model_M_log.predict(X)), color='blue')
plt.scatter(X, y_W, color='red', label='Women')  # np.exp(y_W_log) is just y_W
plt.plot(X, np.exp(model_W_log.predict(X)), color='red')
plt.xlabel('Year')
plt.ylabel('Winning time W (s)')
plt.title('Winning times (W) of the Olympic 100-meter finals for men and women from 1948 to 2004')
plt.legend(loc='upper right')
Out[26]:
<matplotlib.legend.Legend at 0x1797e61a310>
In [27]:
T = np.arange(1948, 2193, 4).reshape((-1,1))  # stop is exclusive, so 2193 keeps 2192 in the grid
In [28]:
plt.plot(T, model_M.predict(T), color='blue', label='Men Linear Model')
plt.plot(T, np.exp(model_M_log.predict(T)), color='red', label='Men Log Linear Model')
plt.xlabel('Year')
plt.ylabel('Winning time W (s)')
plt.title('Winning times (W) of the Olympic 100-meter finals for men from 1948 to 2192')
plt.legend(loc='upper right')
Out[28]:
<matplotlib.legend.Legend at 0x1797e1cd460>
In [29]:
plt.plot(T, model_W.predict(T), color='blue', label='Women Linear Model')
plt.plot(T, np.exp(model_W_log.predict(T)), color='red', label='Women Log Linear Model')
plt.xlabel('Year')
plt.ylabel('Winning time W (s)')
plt.title('Winning times (W) of the Olympic 100-meter finals for women from 1948 to 2192')
plt.legend(loc='upper right')
Out[29]:
<matplotlib.legend.Legend at 0x1797e241a90>
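For part (d), one way to see the limits of these models is to push the extrapolation further. A linear trend with a negative slope must eventually predict a winning time of zero and then negative times, which is impossible. The log-linear model cannot go negative, since the exponential is always positive, but it still drifts toward zero. The sketch below, using the linear coefficients from Out[8] to Out[12], computes where each linear trend hits zero and compares the forecast horizon with the length of the sample. All variable names are my own.

```python
# Linear-trend coefficients copied from the notebook outputs (Out[8] to Out[12]).
a_M, b_M = 28.854000000000017, -0.0095
a_W, b_W = 42.18938095238097, -0.01573214

# Year at which each fitted line reaches a winning time of zero seconds.
zero_year_M = -a_M / b_M
zero_year_W = -a_W / b_W

# The sample covers 1948 to 2004; the linear crossing is claimed near 2140.
sample_span = 2004 - 1948   # years of data
horizon = 2140 - 2004       # years extrapolated beyond the last observation

print(f"Men's linear trend reaches 0 s around year {zero_year_M:.0f}")
print(f"Women's linear trend reaches 0 s around year {zero_year_W:.0f}")
print(f"Extrapolating {horizon} years from {sample_span} years of data")
```

The forecast horizon is more than twice the span of the data, so the predicted crossing rests entirely on the assumption that the fitted trends continue unchanged far outside the observed range.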