Unit Root Test in the AR(1) Time Series with Monte Carlo Method
Unit root testing is crucial in determining the stationarity of time series data, especially in autoregressive processes like AR(1). In this note, we explore the effectiveness of the Monte Carlo method in unit root testing for AR(1) processes compared to traditional methods like the Augmented Dickey-Fuller (ADF) test.
Autoregressive Process
An AR(1) autoregressive process is governed by the equation:
\[y_t = \alpha y_{t-1} +\epsilon_t, \label{eqn:ar1}\]where $y_t$ represents the current value, $y_{t-1}$ is the previous value, $\alpha$ is the autoregressive parameter, and $\epsilon_t$ is a random variable with normal distribution:
\[\epsilon_t \sim \mathcal{N}(0,1) \notag\]Figure 1 illustrates time series data generated with different values of $\alpha$ according to Equation ($\ref{eqn:ar1}$). The behavior of these time series is governed by $\alpha$: for $\alpha<1$, the amplitude remains constrained within a narrow range, maintaining a relatively constant average; conversely, for $\alpha>1$, the amplitude exhibits exponential growth.
When $\vert \alpha\vert \geq 1$, the time series generated by Equation ($\ref{eqn:ar1}$) loses its stationarity, signifying that certain statistical properties such as mean or variance cease to remain constant over time. This phenomenon is vividly depicted in Figure 2, where multiple traces are plotted for various $\alpha$ values. For $\alpha<1$, the traces exhibit a consistent pattern with stable mean and variance throughout; however, as $\alpha$ surpasses 1, the variance becomes increasingly erratic, indicating non-stationarity. Thus, the time series maintains stationarity for $\alpha <1$, while it becomes non-stationary for $\alpha \geq 1$. The autoregressive process possesses a unit root when $\alpha=1$, and this AR(1) process with a unit root represents a random walk.
Unit Root Test
The presence of a unit root ($\alpha=1$) in an AR(1) process signifies non-stationarity. This can be assessed by comparing the estimated parameter $\alpha$ to 1. However, a more robust approach involves estimating the slope $\rho = \alpha - 1$ and assessing its statistical significance.
\[y_t - y_{t-1} = \alpha y_{t-1} - y_{t-1} +\epsilon_t \notag\] \[\Delta y_t = \rho y_{t-1} + \epsilon_t \label{eqn:rho}\]where $\Delta y_t = y_t - y_{t-1}$ and $\rho = \alpha -1$.
The distribution of $\rho$ for an autoregressive process with a unit root does not follow the $t$-distribution due to the lack of independence among predictor values $y_{t-1}$. As depicted in Figure 3, this distribution tends to exhibit skewness. Consequently, relying on a $t$-distribution for calculating p-values in the unit root test would yield inaccurate results. To address this limitation, the Monte Carlo method is employed to derive the distribution of $\rho$ for the process with a unit root, allowing for accurate p-value computation against this distribution.
Monte Carlo (MC) Method
The Monte Carlo method offers a potent alternative for unit root testing. By generating multiple samples of the time series and fitting linear regression models to each, we can approximate the distribution of $\rho$ under the null hypothesis of a unit root.
Below is the computational code for this process. For the autoregressive process with $\alpha=1$, numerous samples of the time series are created based on Equation ($\ref{eqn:ar1}$). Subsequently, linear regression is applied to each sample using Equation ($\ref{eqn:rho}$). The resulting distribution of the estimated $\rho$ from these samples serves as an approximation to the true distribution of $\rho$.
import numpy as np
import statsmodels.api as sm
n_mc_samples = 10000 # number of Monte Carlo samples
ts_length = 1000 # length of each time series
ts = (np.random.standard_normal(
size=(n_mc_samples, ts_length))
.cumsum(axis=1)
)
rho = [sm.OLS(np.diff(t),t[:-1]).fit().params[0] for t in ts] #OLS slope
The resulting distribution is visualized in Figure 3, providing a graphical representation of the estimated $\rho$ values obtained from the Monte Carlo simulation.
The $\rho$ coefficients for the time series corresponding to various $\alpha$ values in Figure 1 are computed using Ordinary Least Squares (OLS) regression. These estimated $\rho$ values are then contrasted with the distribution of $\rho$ for the AR(1) process with a unit root, as depicted in Figure 4.
As $\alpha$ moves away from $1$, the tail area delimited by the estimated $\rho$ diminishes. Defining the p-value entails considering the two-sided tail area, as computed by the code below.
def pval(unit_root_dist_list, val):
"""Caluclate two-sided p-value of the unit root test
Inputs:
unit_root_dist_list: distribution of the slope of the AR(1) process with unit root
val: slope of time series
Output:
two-sided p-value
"""
p = np.mean(unit_root_dist_list<=val)
p = np.min([p, 1-p])
pval = 2*p
return pval # two-sided p-value
Dickey-Fuller (DF) Test and Augmented Dickey-Fuller (ADF) Test
The Dickey-Fuller (DF) unit root test was originally devised for analyzing AR(1) time series data. In a stationary process, the time series tends to revert to a consistent mean, resulting in a negative OLS regression coefficient $\rho$ as per Equation ($\ref{eqn:rho}$). A more negative estimate of $\rho$ suggests a higher likelihood of the process lacking a unit root. The augmented Dickey-Fuller (ADF) test builds upon the DF test by incorporating higher-order autocorrelation terms. Critical values of the test statistics are precomputed and used to compute p-values for the unit root test.
MC Method vs. ADF Test
To compare the effectiveness of the Monte Carlo method and the ADF test in detecting unit roots, p-values were computed for time series data with varying $\alpha$ values (as shown in Figure 1) using both methods alongside the traditional t-test in OLS regression. The table below summarizes the results:
$\alpha$ | Estimated $\rho=\alpha-1$ | p-value T-Test | p-value MC | p-value ADF |
---|---|---|---|---|
$0.1$ | $-0.91$ | $0$ | $0$ | $0$ |
$0.25$ | $-0.72$ | $0$ | $0$ | $0$ |
$0.5$ | $-0.48$ | $0$ | $0$ | $0$ |
$0.9$ | $-0.11$ | $7.61\times 10^{-14}$ | $0$ | $2.9\times10^{-12}$ |
$0.95$ | $-0.042$ | $4.13\times 10^{-6}$ | $0$ | $5.4\times 10^{-6}$ |
$1$ | $-0.0011$ | $0.49$ | $0.46$ | $0.42$ |
$1.005$ | $0.0052$ | $0$ | $0$ | $1$ |
$1.01$ | $0.01$ | $0$ | $0$ | $1$ |
$1.02$ | $0.02$ | $0$ | $0$ | $1$ |
All methods correctly identify the presence of a unit root for the random walk case ($\alpha=1$), as indicated by the high p-values. However, the ADF test exhibits limitations—it only accommodates $\alpha < 1$ ($\rho < 0$) and fails to reject the null hypothesis of a unit root for $\alpha > 1$ (p-value = $1$). This highlights the superior flexibility of the Monte Carlo method. Remarkably, despite the estimated $\rho$ not adhering to the $t$-distribution, the t-test still performs reasonably well.
Further comparisons were made by generating $1000$ random walk samples ($\alpha=1$) and obtaining p-values using the Monte Carlo method, ADF test, and t-test of OLS. The distribution of these p-values is depicted in Figure 5.
The resulting p-values range between 0 and 1, with lower values indicating a higher likelihood of false negative decisions in the unit root test. Both the Monte Carlo method and ADF test exhibit similar performance, with the ADF test very slightly outperforming the Monte Carlo method. The t-test fares the worst, but not significantly so.
Lastly, Figure 6 illustrates the relationship between p-values and estimated $\rho$. While the ADF test demonstrates a monotonic trend, the Monte Carlo method and t-test exhibit non-monotonic tendencies.
Effects of Regression Model and Sample Size on Unit Root Test
In the preceding sections, we employed a regression model to estimate $\rho$, which encompasses only one parameter, as described in Equation ($\ref{eqn:rho}$). Each sample of the random walk generated in the Monte Carlo method consists of 1000 time steps.
Let’s explore how different linear regression models and the sample size of the time series affect the distribution of $\rho$.
Regression Model
The time series can be fit with various models.
\[\Delta y_t = y_t - y_{t-1} = c + \rho y_{t-1} + \delta t +\epsilon_t,\notag\]where $\rho = \alpha -1$.
Case | Parameters | Equation | |
---|---|---|---|
Case 1 | $c=0$, $\delta=0$ | No drift, no trend | $\Delta y_t = \rho y_{t-1} +\epsilon_t$ |
Case 2 | $c \neq 0$, $\delta=0$ | Drift, no trend | $\Delta y_t = c + \rho y_{t-1} +\epsilon_t$ |
case 3 | $c\neq 0$, $\delta\neq 0$ | Drift and trend | $\Delta y_t = c + \rho y_{t-1} + \delta t +\epsilon_t$ |
As Figure 7 shows, when incorporating drift and trend terms into the model, the distribution of the slope $\rho$ skews more towards the left.
Sample Size
The sample size of a time series refers to the number of time steps. We calculated the coefficient $\rho$ for 10000 samples of random walk with varying sample sizes $T$. As shown in Figure 8, the distribution becomes narrower as the sample size increases (plot a). However, the distribution of the product of sample size and the slope coefficient, $T \rho$, remains nearly independent of the sample size (plot b), with smaller sample sizes exhibiting a slightly smaller tail at low values (plot c).
Conclusion
The unit root test is a critical tool in determining the stationarity of a time series: the presence of a unit root signifies non-stationarity. Although the Augmented Dickey-Fuller (ADF) test is a commonly used approach for this purpose, its effectiveness diminishes when $\alpha>1$. In contrast, the Monte Carlo method provides a simpler conceptual framework and can effectively compute the distribution of $\alpha$ using generated time series samples. Notably, it performs comparably to the ADF test for random walks and can accommodate cases where $\alpha>1$.
References
Dickey-Fuller Test, Wikipedia.
Augmented Dickey-Fuller Test, Wikipedia.