Dates in timeseries models¶
[1]:
import statsmodels.api as sm
import numpy as np
import pandas as pd
Getting started¶
[2]:
data = sm.datasets.sunspots.load()
Currently, an annual date series must consist of datetimes that fall at the end of each year (December 31).
[3]:
from datetime import datetime
dates = sm.tsa.datetools.dates_from_range('1700', length=len(data.endog))
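As a rough illustration of the year-end convention, the sketch below builds such dates with the standard library only. The helper `year_end_dates` is a hypothetical name introduced here, not part of statsmodels; it is meant to show the kind of values the date series should contain, not how `dates_from_range` is implemented.

```python
from datetime import datetime


def year_end_dates(start_year, length):
    """Return `length` consecutive December 31 datetimes (hypothetical helper)."""
    return [datetime(start_year + i, 12, 31) for i in range(length)]


# Three annual observations starting in 1700, each dated at year end.
example_dates = year_end_dates(1700, 3)
print(example_dates)
```

Every element falls on December 31, which is the form an annual series is expected to take here.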
Using Pandas¶
Make a pandas Series or DataFrame indexed by these dates
[4]:
endog = pd.Series(data.endog, index=dates)
Instantiate and fit the model
[5]:
ar_model = sm.tsa.AR(endog, freq='A')
pandas_ar_res = ar_model.fit(maxlag=9, method='mle', disp=-1)
/usr/lib/python3/dist-packages/statsmodels/tsa/ar_model.py:691: FutureWarning:
statsmodels.tsa.AR has been deprecated in favor of statsmodels.tsa.AutoReg and
statsmodels.tsa.SARIMAX.
AutoReg adds the ability to specify exogenous variables, include time trends,
and add seasonal dummies. The AutoReg API differs from AR since the model is
treated as immutable, and so the entire specification including the lag
length must be specified when creating the model. This change is too
substantial to incorporate into the existing AR api. The function
ar_select_order performs lag length selection for AutoReg models.
AutoReg only estimates parameters using conditional MLE (OLS). Use SARIMAX to
estimate ARX and related models using full MLE via the Kalman Filter.
To silence this warning and continue using AR until it is removed, use:
import warnings
warnings.filterwarnings('ignore', 'statsmodels.tsa.ar_model.AR', FutureWarning)
warnings.warn(AR_DEPRECATION_WARN, FutureWarning)
Out-of-sample prediction
[6]:
pred = pandas_ar_res.predict(start='2005', end='2015')
print(pred)
2005-12-31 20.003294
2006-12-31 24.703989
2007-12-31 20.026127
2008-12-31 23.473653
2009-12-31 30.858579
2010-12-31 61.335464
2011-12-31 87.024704
2012-12-31 91.321262
2013-12-31 79.921631
2014-12-31 60.799524
2015-12-31 40.374881
Freq: A-DEC, dtype: float64
Using explicit dates¶
[7]:
ar_model = sm.tsa.AR(data.endog, dates=dates, freq='A')
ar_res = ar_model.fit(maxlag=9, method='mle', disp=-1)
pred = ar_res.predict(start='2005', end='2015')
print(pred)
[20.00329441 24.70398898 20.02612701 23.47365317 30.85857851 61.33546424
87.02470351 91.32126171 79.92163054 60.79952402 40.3748811 ]
This returns a plain NumPy array rather than a pandas Series. Because the model still has date information attached, the prediction dates can be retrieved from the results object in a roundabout way.
[8]:
print(ar_res.data.predict_dates)
DatetimeIndex(['2005-12-31', '2006-12-31', '2007-12-31', '2008-12-31',
'2009-12-31', '2010-12-31', '2011-12-31', '2012-12-31',
'2013-12-31', '2014-12-31', '2015-12-31'],
dtype='datetime64[ns]', freq='A-DEC')
Note: This attribute exists only after predict has been called, and it holds the dates associated with the most recent call to predict.
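One way to pair the returned array with its dates is to wrap both in a pandas Series. The sketch below is self-contained, so it copies the values and dates from the outputs shown above instead of refitting the model. The names `predict_dates` and `pred_series` are illustrative.

```python
import numpy as np
import pandas as pd

# Predicted values copied from the array output shown above.
pred = np.array([
    20.00329441, 24.70398898, 20.02612701, 23.47365317, 30.85857851,
    61.33546424, 87.02470351, 91.32126171, 79.92163054, 60.79952402,
    40.3748811,
])

# Year-end dates for 2005 through 2015, matching the DatetimeIndex shown above.
# With a fitted model this would be ar_res.data.predict_dates instead.
predict_dates = pd.to_datetime([f"{year}-12-31" for year in range(2005, 2016)])

# Attach the dates so that predictions can be looked up by date.
pred_series = pd.Series(pred, index=predict_dates)
print(pred_series.loc[pd.Timestamp("2010-12-31")])
```

The result has the same shape as the Series returned in the pandas-based example, so later code can treat both cases alike.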