I have two time series and I want to find the lag that results in maximum correlation between the two time series. The basic problem we’re considering is the description and modeling of the relationship between these two time series.
In signal processing, cross-correlation is a measure of similarity of two series as a function of the lag of one relative to the other. This is also known as a sliding dot product or sliding inner-product.
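For discrete functions f and g, the cross-correlation is defined as:

$$(f \star g)[n] = \sum_{m=-\infty}^{\infty} \overline{f[m]}\, g[m+n]$$

where the overline denotes the complex conjugate (it can be dropped for real-valued series) and n is the lag.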
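As a rough illustration of the sliding dot product, here is a minimal R sketch for two finite, real-valued vectors of equal length. For each lag n it sums f[m] * g[m + n] over the indices where both terms exist. The function name sliding_dot and the toy vectors are made up for this example, and the result is the raw, unnormalised sum rather than a correlation coefficient.

```r
# Raw (unnormalised) cross-correlation of two equal-length real vectors.
# Hypothetical helper for illustration; assumes max_lag < length(f).
sliding_dot <- function(f, g, max_lag) {
  N <- length(f)
  lags <- -max_lag:max_lag
  vals <- sapply(lags, function(n) {
    # indices m for which both m and m + n fall inside 1..N
    m <- max(1, 1 - n):min(N, N - n)
    sum(f[m] * g[m + n])
  })
  data.frame(lag = lags, value = vals)
}

f <- c(0, 1, 2, 1, 0, 0, 0)
g <- c(0, 0, 0, 1, 2, 1, 0)   # same bump as f, delayed by two steps

out <- sliding_dot(f, g, max_lag = 3)
best_lag <- out$lag[which.max(out$value)]
best_lag   # 2
```

The largest dot product appears at the lag where the two bumps line up, which is the idea behind looking for the lag of maximum correlation.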
In the relationship between two time series (y_t and x_t), the series y_t may be related to past lags of the x-series. The sample cross-correlation function (CCF) is helpful for identifying lags of the x-variable that might be useful predictors of y_t.
In R, the sample CCF is defined as the set of sample correlations between x_{t+h} and y_t for h = 0, ±1, ±2, ±3, and so on.
A negative value for h is a correlation between the x-variable at a time before t and the y-variable at time t. For instance, consider h = −2. The CCF value would give the correlation between x_{t-2} and y_t.
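To check this sign convention, here is a small sketch with simulated data (the series xs and y and the seed are made up for this example). The y series is built so that y_t equals xs_{t-2}, so under the convention above the peak of ccf(xs, y) should sit at h = -2.

```r
set.seed(1)
n   <- 200
raw <- rnorm(n + 2)          # white noise, two extra points for the shift

y  <- raw[1:n]               # y_t
xs <- raw[3:(n + 2)]         # xs_t = raw_{t+2}, hence y_t = xs_{t-2}

# plot = FALSE returns the estimates without drawing the correlogram
cc <- ccf(x = xs, y = y, lag.max = 10, type = "correlation", plot = FALSE)

lags <- as.vector(cc$lag)
vals <- as.vector(cc$acf)
peak_lag <- lags[which.max(vals)]
peak_lag   # -2
```

A peak at a negative lag therefore means that the first argument leads the second: past values of xs line up with current values of y.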
For example, let’s start with the first series, y1:
    x <- seq(0, 2 * pi, pi / 100)
    length(x)  # 201
    y1 <- sin(x)
    plot(x, y1, type = "l", col = "green")
Adding series y2, with a shift of pi/2:
    y2 <- sin(x + pi / 2)
    lines(x, y2, type = "l", col = "red")
Applying the cross-correlation function (ccf):
    cv <- ccf(x = y1, y = y2, lag.max = 100, type = "correlation", plot = TRUE)
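The maximum correlation occurs at a positive lag of the y1 series:
    # ccf() stores its results in 3-d arrays; drop the extra dimensions
    cor <- cv$acf[, , 1]
    lag <- cv$lag[, , 1]
    res <- data.frame(cor, lag)
    # lag at which the estimated correlation is largest
    res_max <- res[which.max(res$cor), ]$lag
    res_max  # 44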
This means that the maximum correlation between series y1 and series y2 is found between y1_{t+44} and y2_t.
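As a sanity check on that number: a phase shift of pi/2 on a grid with step pi/100 corresponds to exactly 50 samples, not 44. The sketch below is my own cross-check, not part of the analysis above, and k and r_overlap are illustrative names. It correlates only the overlapping parts of the two series at a shift of 50. As far as I understand it, ccf uses the full-series means and divides by the full series length at every lag, which damps the estimate at larger lags and can pull the peak toward zero on a short sample like a single sine period. Treat that explanation as an assumption rather than a verified fact.

```r
x  <- seq(0, 2 * pi, pi / 100)
y1 <- sin(x)
y2 <- sin(x + pi / 2)
n  <- length(x)

# Phase shift expressed in samples: (pi/2) / (pi/100)
k <- round((pi / 2) / (pi / 100))
k   # 50

# Pearson correlation on the overlap only: pairs y1[t + k] with y2[t]
r_overlap <- cor(y1[(1 + k):n], y2[1:(n - k)])
r_overlap   # 1, up to floating-point error
```

So the two series match perfectly at a shift of 50 samples when only the overlap is used, while the ccf estimate peaks slightly earlier. If the exact shift matters, it may be worth comparing both approaches.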