My question comes from this paper. The picture below provides a summary of the equations.
Suppose the prices of two stocks satisfy the SDE (2.1). Then X(t) is expressed as (2.2) and can be modeled as an OU process (2.3). To estimate the OU coefficients we run an OLS regression (2.4), where R1 and R2 are the time series of the two stocks. Then we calculate the random walk series X(k) (2.5) and run a lagged regression on it (2.6), from which we estimate mu, sigma and theta.
The author says that after we obtain the OU parameters, "At this stage, we can use the standardized version of X(t), called Z-score as trading signal." OK, the Z-score is (X(t) - mu)/sigma.
What is X(t) in the Z-score? Is it the residual series from (2.5) or the residuals from the OLS in (2.4)? How do we fit the OU coefficients to model the spread as an OU process?
EDIT:
Since I searched the internet far and wide for an implementation of the equations above and found nothing useful for this task, I wrote a Python script. It's rough, but it does the job. Hopefully it will help someone avoid the troubles I encountered:

    import os

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    from sklearn import linear_model

    # Raw string so that "\t" is not read as a tab; adjust to your data directory
    path = r"C:UsersPCDesktopMagistrmodels\test_data"
    os.chdir(path)
    data = pd.read_csv("stocks.csv", index_col='Date')

    startDate = 0
    endDate = 60
    s_scores = []

    for i in range(0, 100):
        # s-score estimation on a rolling 60-observation window
        Y = data['007310 KS Equity'][startDate+i:endDate+i]  # cointegrated securities
        X = data['001680 KS Equity'][startDate+i:endDate+i]

        # simple returns P(t)/P(t-1) - 1 (the first value is NaN and is dropped)
        Y = (Y / Y.shift(1) - 1)[1:].to_numpy()
        X = (X / X.shift(1) - 1)[1:].to_numpy()

        # OLS regression of the returns (2.4)
        clf = linear_model.LinearRegression()
        clf.fit(X.reshape(-1, 1), Y)
        beta_hr = clf.coef_[0]
        alpha = clf.intercept_
        residuals = Y - beta_hr*X - alpha  # residuals of the OLS (2.4)

        Xk = np.cumsum(residuals)  # auxiliary values (2.5); the last one is ~0

        # create two time series for the AR(1) regression (2.6)
        x_k = Xk[:-1]  # x(t-1)
        y_k = Xk[1:]   # x(t)
        clf = linear_model.LinearRegression()
        clf.fit(x_k.reshape(-1, 1), y_k)
        ar1_beta = clf.coef_[0]
        ar1_alpha = clf.intercept_
        ar1_resids = y_k - ar1_beta*x_k - ar1_alpha  # residuals of AR(1)

        mu = ar1_alpha / (1 - ar1_beta)  # calculate mu
        sigma_eq = np.sqrt(np.var(ar1_resids) / (1 - ar1_beta**2))  # calculate sigma

        s = -mu / sigma_eq  # s-score
        s_scores.append(s)

Best Answer
The method discussed in the article you mention is directly inspired by the paper 'Statistical Arbitrage in the U.S. Equities Market' by Avellaneda & Lee (2008). Most of your questions are answered in the Appendix, p.44.
Suppose we are at the end of trading day $D$. In the Avellaneda & Lee paper the s-score is defined as $$ s = \frac{X - m}{\sigma_{eq}} $$ It is a measure of the distance to equilibrium of the current cointegration residual $X$, i.e. how far away $X$ is (in standard deviation units) from the theoretical equilibrium value $m$ predicted by a mean-reversion model yet to be estimated.
In practice, using the notation of your paper and considering a 60-day trailing estimation window, $X$ evaluates as $$ X := \sum_{j=t_1}^{t_{60}} \epsilon_j $$ with $\{\epsilon_j\}$ the regression residuals computed as $$ \epsilon_j = R_j^1 - \left(\hat{\beta}_0 + \hat{\beta} R_j^2\right), \quad \forall j = t_1, \dots, t_{60} $$ where $\{R_j^i\}$ denotes the 60 most recent close-to-close returns of the 2 securities of interest in the case of pairs trading.
If $\hat{\beta}_0$ and $\hat{\beta}$ are OLS estimators, then it is well known that the regression residuals $\{\epsilon_j\}$ have zero sample mean (the regression includes an intercept) and are uncorrelated with the regressor, so that $X = 0$ and the s-score becomes $$ s = -\frac{m}{\sigma_{eq}} $$
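A quick numerical check of the claim that $X = 0$ may help. The sketch below uses simulated returns (the series `r1`, `r2` and all numeric values are made up for illustration, not taken from the paper) and a plain least-squares fit with an intercept.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 60-day return series for the two securities (simulated data).
r2 = rng.normal(0.0, 0.01, size=60)
r1 = 0.0005 + 1.2 * r2 + rng.normal(0.0, 0.005, size=60)

# OLS with an intercept: r1 = beta0 + beta * r2 + eps
design = np.column_stack([np.ones_like(r2), r2])
(beta0, beta), *_ = np.linalg.lstsq(design, r1, rcond=None)
eps = r1 - (beta0 + beta * r2)

# Because the regression includes an intercept, the residuals sum to zero
# (up to floating-point error), hence X = sum(eps) = 0.
X = eps.sum()
print(abs(X) < 1e-12)
```

Without the intercept column the residuals would in general not sum to zero, and $X$ would have to be kept in the numerator of the s-score.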
where the parameters $m$ (stationary mean) and $\sigma_{eq}$ (square root of the stationary variance) of the Ornstein-Uhlenbeck (OU) process $(X_t)_{t \geq 0}$ are estimated as follows:
- Calculate the auxiliary values $\{X_k\}$ as follows $$ X_k = \sum_{j=t_1}^{k} \epsilon_j, \quad \forall k = t_1, \dots, t_{60} $$ where the $\epsilon_j$ are the factor model regression residuals discussed above (note that $X = X_{t_{60}}$ by construction).
- Calibrate an AR(1) model to the previous values $$ X_{k+1} = a + b X_k + \zeta_{k+1}, \quad \forall k = t_1, \dots, t_{59} $$ i.e. perform yet another linear (lagged) regression to determine $\hat{a}$, $\hat{b}$ (intercept and slope) along with the regression residuals $\{\zeta_k\}$.
- Comparing the Euler discretisation of the Ornstein-Uhlenbeck SDE (see equation 2.3 in your paper) to the AR(1) model above, infer the parameters of the OU process from $\hat{a}$, $\hat{b}$ and the auxiliary regression residuals $\{\zeta_k\}$, see original paper. You should end up with $$ m = \frac{\hat{a}}{1-\hat{b}} $$ $$ \sigma_{eq} = \sqrt{\frac{\text{var}(\{\zeta_k\})}{1-\hat{b}^2}} $$
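The three steps above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `fit_ou_from_residuals` is hypothetical, the guard on $\hat{b}$ is an addition of this sketch rather than something the answer prescribes, the variance uses NumPy's default `ddof=0`, and the check at the bottom runs on a simulated AR(1) path rather than on market data.

```python
import numpy as np

def fit_ou_from_residuals(eps):
    """Estimate (m, sigma_eq, b) from the factor-regression residuals eps."""
    eps = np.asarray(eps, dtype=float)
    # Step 1: auxiliary values X_k = cumulative sum of the residuals.
    x = np.cumsum(eps)
    # Step 2: lagged regression X_{k+1} = a + b * X_k + zeta_{k+1}.
    b, a = np.polyfit(x[:-1], x[1:], deg=1)  # polyfit returns the slope first
    zeta = x[1:] - (a + b * x[:-1])
    # Guard added in this sketch: the mapping below needs 0 < b < 1.
    if not 0.0 < b < 1.0:
        raise ValueError("no mean reversion detected (b outside (0, 1))")
    # Step 3: stationary mean and standard deviation of the OU process.
    m = a / (1.0 - b)
    sigma_eq = np.sqrt(np.var(zeta) / (1.0 - b**2))
    return m, sigma_eq, b

# Sanity check on a simulated AR(1) path with known parameters.
rng = np.random.default_rng(1)
a_true, b_true, sd = 0.02, 0.8, 0.1
x = np.empty(5000)
x[0] = a_true / (1.0 - b_true)
for k in range(1, x.size):
    x[k] = a_true + b_true * x[k - 1] + sd * rng.normal()

# The increments of x play the role of the residuals: cumsum(diff(x)) == x.
m_hat, sigma_eq_hat, b_hat = fit_ou_from_residuals(np.diff(x, prepend=0.0))
print(m_hat, sigma_eq_hat, b_hat)
```

With these simulated parameters the true values are $m = 0.02 / 0.2 = 0.1$ and $\sigma_{eq} = 0.1 / \sqrt{1 - 0.64} \approx 0.167$, so the estimates should land close to those.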
The idea is then that, depending on the strength of the mean-reversion signal (= the value of the $s$-score), we decide on day $D$ to buy/sell at tomorrow's open (or close an existing position), see original paper.
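As a rough illustration of such a rule, the sketch below maps the current $s$-score and the current position to a target position. The function name `trading_signal` and the threshold values `open_thr` and `close_thr` are placeholders chosen for this example; the thresholds actually used should be taken from the original paper.

```python
def trading_signal(s, position, open_thr=1.25, close_thr=0.5):
    """Return the target position in the spread: +1 long, -1 short, 0 flat.

    open_thr and close_thr are illustrative placeholder thresholds.
    """
    if position == 0:
        if s < -open_thr:
            return 1    # spread far below equilibrium: open a long position
        if s > open_thr:
            return -1   # spread far above equilibrium: open a short position
        return 0
    if position == 1 and s > -close_thr:
        return 0        # long position has reverted towards equilibrium: close
    if position == -1 and s < close_thr:
        return 0        # short position has reverted towards equilibrium: close
    return position     # otherwise keep the existing position

print(trading_signal(-2.0, 0), trading_signal(0.0, 1), trading_signal(1.0, -1))
```

Here "long the spread" means long the first security and short $\hat{\beta}$ units of the second, with the order executed at the next open as described above.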
Obviously, on day $D+1$, based on the new closing prices observed at the end of the day, you will be able to compute a new $s$-score by repeating the steps above (sticking to a 60-day trailing estimation window, which will now contain the most recent return, corresponding to day $D+1$, while the oldest one, corresponding to day $D-59$, will have dropped out). Repeating this on successive days allows you to plot a graph where the $s$-score evolves through time, as in Figure 10 of the original article.
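A compact way to picture the daily re-estimation is a loop over trailing windows. The sketch below is an assumption-laden illustration: the helper names `s_score` and `rolling_s_scores` are hypothetical, windows where the lagged slope falls outside $(0, 1)$ are returned as NaN (a choice made here, not in the answer), and the input is a simulated pair with a mean-reverting spread.

```python
import numpy as np

def s_score(r1, r2):
    """s-score for one window of returns, using X = 0 (OLS with intercept)."""
    beta, beta0 = np.polyfit(r2, r1, deg=1)      # slope first, then intercept
    eps = r1 - (beta0 + beta * r2)               # factor-regression residuals
    x = np.cumsum(eps)                           # auxiliary values X_k
    b, a = np.polyfit(x[:-1], x[1:], deg=1)      # AR(1) lagged regression
    if not 0.0 < b < 1.0:
        return np.nan                            # no usable mean reversion
    zeta = x[1:] - (a + b * x[:-1])
    m = a / (1.0 - b)
    sigma_eq = np.sqrt(np.var(zeta) / (1.0 - b**2))
    return -m / sigma_eq

def rolling_s_scores(r1, r2, window=60):
    """One s-score per day, each from the trailing `window` returns."""
    r1, r2 = np.asarray(r1, dtype=float), np.asarray(r2, dtype=float)
    return np.array([
        s_score(r1[end - window:end], r2[end - window:end])
        for end in range(window, len(r1) + 1)
    ])

# Simulated pair: r1 = 1.2 * r2 + increments of a mean-reverting spread.
rng = np.random.default_rng(2)
n = 200
spread = np.zeros(n + 1)
for k in range(1, n + 1):
    spread[k] = 0.8 * spread[k - 1] + 0.005 * rng.normal()
r2 = rng.normal(0.0, 0.01, size=n)
r1 = 1.2 * r2 + np.diff(spread)

scores = rolling_s_scores(r1, r2, window=60)
print(scores.shape)
```

Each entry of `scores` corresponds to one end-of-day estimate, so plotting the array against the window end dates gives the kind of time series described above.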