I have two time series and want to find the lag at which the correlation between them is largest. The underlying problem is describing and modeling the relationship between the two series.

In signal processing, cross-correlation is a measure of similarity of two series as a function of the lag of one relative to the other. This is also known as a sliding dot product or sliding inner-product.
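To make the "sliding dot product" idea concrete, here is a minimal sketch in base R. The helper `sliding_dot` and the toy vectors `a` and `b` are hypothetical names introduced only for illustration. The helper assumes both vectors have the same length and uses the same indexing convention as the rest of this text (a[t + h] paired with b[t]).

```r
# Hypothetical helper: dot product of a[t + h] and b[t],
# taken over the samples where both indices exist.
# Assumes length(a) == length(b).
sliding_dot <- function(a, b, h) {
  n <- length(a)
  t <- seq_len(n)
  keep <- (t + h >= 1) & (t + h <= n)   # overlap of the shifted series
  sum(a[t[keep] + h] * b[t[keep]])
}

a <- c(0, 0, 1, 2, 1, 0, 0)
b <- c(1, 2, 1, 0, 0, 0, 0)   # same bump as a, two samples earlier

lags <- -3:3
dots <- sapply(lags, function(h) sliding_dot(a, b, h))
lags[which.max(dots)]
# [1] 2
```

The dot product is largest at h = 2, the shift that lines the two bumps up. The sample CCF follows the same idea but additionally removes the means and scales the result so that it can be read as a correlation.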

In the relationship between two time series (y_{t} and x_{t}), the series y_{t} may be related to past lags of the x-series. The sample cross correlation function (CCF) is helpful for identifying lags of the x-variable that might be useful predictors of y_{t}.

In R, the sample CCF is defined as the set of sample correlations between x_{t+h} and y_{t} for h = 0, ±1, ±2, ±3, and so on.

A negative value for h is a correlation between the x-variable at a time before t and the y-variable at time t. For instance, consider h = −2. The CCF value would give the correlation between x_{t-2} and y_{t}.
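The sign convention is easy to get backwards, so a quick check on simulated data may help. In the sketch below, the data, the seed and all variable names are assumptions made for illustration: y is built so that it depends on x two steps earlier, which should put the CCF peak at h = -2.

```r
set.seed(1)                          # arbitrary seed, for reproducibility only
n <- 200
z <- rnorm(n + 2)                    # white-noise driver (illustrative data)

x <- z[3:(n + 2)]                    # x_t = z[t + 2]
y <- z[1:n] + rnorm(n, sd = 0.1)     # y_t = z[t] + noise = x_{t-2} + noise

# ccf(x, y) at lag h estimates the correlation between x[t + h] and y[t]
cc <- ccf(x, y, lag.max = 10, plot = FALSE)
best <- cc$lag[which.max(cc$acf)]
best
# [1] -2
```

Swapping the arguments, as in ccf(y, x), would move the peak to h = +2, so it is worth stating which series comes first whenever a lag is reported.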

For example, let’s start with the first series, y1:

x <- seq(0,2*pi,pi/100)
length(x)
# [1] 201
y1 <- sin(x)
plot(x,y1,type="l", col = "green")

Adding series y2, with a shift of pi/2:

y2 <- sin(x+pi/2)
lines(x,y2,type="l",col="red")

Applying the cross-correlation function (ccf):

cv <- ccf(x = y1, y = y2, lag.max = 100, type = c("correlation"),plot = TRUE)

The maximal correlation occurs at a positive lag of the y1 series:

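# Extract the correlations and their lags from the ccf object
# (names chosen to avoid masking the base functions cor() and lag())
cor_vals <- cv$acf[,,1]
lag_vals <- cv$lag[,,1]
res <- data.frame(cor = cor_vals, lag = lag_vals)
# Lag at which the sample correlation is largest
res_max <- res[which.max(res$cor),]$lag
res_max
# [1] 44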

This means that the maximal sample correlation between series y1 and series y2 is found between y1_{t+44} and y2_{t}.
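Note that the phase shift of pi/2 corresponds to (pi/2)/(pi/100) = 50 samples, while the estimated peak is at lag 44. A likely explanation is that the sample CCF divides every lagged sum by the full series length, so with only one period of data the shrinking overlap at larger lags pulls the peak toward zero. The sketch below compares the one-period series with a longer one. The helper `peak_lag` and the choice of ten periods are assumptions made for illustration.

```r
# Hypothetical helper: lag of the largest sample cross-correlation
peak_lag <- function(a, b, lag.max) {
  cc <- ccf(a, b, lag.max = lag.max, plot = FALSE)
  cc$lag[which.max(cc$acf)]
}

step <- pi / 100
true_lag <- (pi / 2) / step          # quarter period expressed in samples: 50

x_short <- seq(0, 2 * pi, step)      # one period, as in the example above
x_long  <- seq(0, 20 * pi, step)     # ten periods (an arbitrary choice)

peak_short <- peak_lag(sin(x_short), sin(x_short + pi / 2), 100)
peak_long  <- peak_lag(sin(x_long),  sin(x_long  + pi / 2), 100)

c(true = true_lag, short = peak_short, long = peak_long)
```

If the explanation holds, the estimate from the longer series should sit closer to 50 than the one-period estimate does. When the true delay matters, it is therefore worth using a record that is long relative to the lags being examined.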