I found that convolution in R works differently from convolution in Python. In Python, np.convolve flips one of the inputs and then slides it across the other, which is the usual textbook definition of convolution.
In the R documentation, it says
Note that the usual definition of convolution of two sequences x and y is given by convolve(x, rev(y), type = "o").
Is there any reason behind R's different approach? Do people in the statistics community use convolution differently than do those in the signal processing community?
An example in R:
x = c(c(1,2,3,4), rep(0,4))
y = c(c(5,6,7,8), rep(0,4))
convolve(x, y)  # will return 70 56 39 20 0 8 23 44
But in Python:
x = np.hstack([np.array([1,2,3,4]), np.zeros(4)])
y = np.hstack([np.array([5,6,7,8]), np.zeros(4)])
np.convolve(x, y)  # will return 5., 16., 34., 60., 61., 52., 32. (followed by zeros)
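(For what it's worth, the two outputs are related by reversing one argument. A quick NumPy check of my own, not part of the original comparison:)

```python
import numpy as np

x = np.array([1., 2., 3., 4.])
y = np.array([5., 6., 7., 8.])

# np.convolve flips y before sliding it across x (the textbook definition).
conv = np.convolve(x, y)
print(conv)  # [ 5. 16. 34. 60. 61. 52. 32.]

# np.correlate slides y across x without flipping, like R's convolve.
# Lags run from -(len(y)-1) up to +(len(x)-1).
corr = np.correlate(x, y, mode="full")
print(corr)  # [ 8. 23. 44. 70. 56. 39. 20.]

# R's circular result 70 56 39 20 0 8 23 44 contains exactly these
# correlation values: lags 0..3 first, then a zero, then lags -3..-1.
```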
Best Answer
Relation with Fourier transforms
The convolve function in R uses a Fourier transform to compute the convolution (that is a potential source of the difference, although I am not sure and do not know the details):
$$ x * y = \mathcal{F}^{-1}\left\{\mathcal{F}\{x\} \cdot \mathcal{F}\{y\}\right\}$$
But more specifically, it uses the complex conjugate, and that is what turns the variable backwards.
x = c(c(1,2,3,4), rep(0,4))
y = c(c(5,6,7,8), rep(0,4))
zx = fft(c(x))
zy = fft(c(y))
round(Re(fft(zx * Conj(zy), inverse = TRUE)) / 8)
round(Re(fft(zx * zy, inverse = TRUE)) / 8)
Which returns
[1] 70 56 39 20  0  8 23 44
[1]  5 16 34 60 61 52 32  0
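The same two computations can be reproduced in NumPy (a sketch; note that np.fft.ifft already divides by the length, so the explicit /8 from the R code is not needed):

```python
import numpy as np

x = np.array([1., 2., 3., 4., 0., 0., 0., 0.])
y = np.array([5., 6., 7., 8., 0., 0., 0., 0.])

zx = np.fft.fft(x)
zy = np.fft.fft(y)

# With the conjugate: R's default behaviour (circular cross-correlation).
with_conj = np.fft.ifft(zx * np.conj(zy)).real.round()
print(with_conj)     # [70. 56. 39. 20.  0.  8. 23. 44.]

# Without the conjugate: the ordinary circular convolution,
# matching np.convolve up to the zero padding.
without_conj = np.fft.ifft(zx * zy).real.round()
print(without_conj)  # [ 5. 16. 34. 60. 61. 52. 32.  0.]
```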
I am not sure why the conjugate is used. This operation with the conjugate relates to convolution according to
$$\mathcal{F}\{x(t) * y^*(-t)\} = \mathcal{F}\{x(t)\}\cdot\left(\mathcal{F}\{y(t)\}\right)^*$$
and if $y$ is real then $y = y^*$. It also relates to cross-correlation, in which case
$$\mathcal{F}\{x(t) * y(-t)\} = \mathcal{F}\{x(t)\}\cdot \left(\mathcal{F}\{y(t)\}\right)^*$$
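A small numerical check of how the conjugate "turns the variable backwards", using the padded y from the example (for a real sequence, conjugating the DFT equals reversing the sequence in circular time):

```python
import numpy as np

y = np.array([5., 6., 7., 8., 0., 0., 0., 0.])

# y evaluated at (-n) mod N: keep y[0] in place, reverse the rest.
y_time_reversed = np.roll(y[::-1], 1)
print(y_time_reversed)  # [5. 0. 0. 0. 0. 8. 7. 6.]

# For real y:  conj(FFT(y)) == FFT(y evaluated at (-n) mod N)
assert np.allclose(np.conj(np.fft.fft(y)), np.fft.fft(y_time_reversed))
```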
Note that R's convolve function regulates this with a parameter conj, which is TRUE by default; setting it to FALSE gives the 'normal' convolution:
convolve(x, y, conj = FALSE)
Indirectly the question could be seen as: "why does R's convolve use conj = TRUE as the default option?"
Potential (historic) reason for the difference
Why the standard R function for convolution does it in this reversed way, or how this historically came to be, I do not know. But possibly the origin stems from the book Time Series: Data Analysis and Theory by David R. Brillinger (1981), which is referenced in the R documentation. In the book the convolution occurs as follows:
Suppose the values $X(t)$ and $Y(t)$, $t=0,\dots,T-1$, are available. We will sometimes require the convolution $$\sum_{0\leq {t \atop t+u} \leq T-1} X(t+u)Y(t) \quad u = 0, \pm 1, \dots \quad (3.6.1)$$
The term convolution is used in a more general sense, and later in the text they refer to a specific case of this type of convolution
One case in which one might require the convolution (3.6.1) is in the estimation of the moment function $m_{12}(u) = E[X_1(t+u)X_2(t)]$…
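Brillinger's sum (3.6.1) is exactly what the FFT-with-conjugate trick computes once the series are zero-padded; a small sketch with made-up numbers (not from the book):

```python
import numpy as np

X = np.array([1., 2., 3.])
Y = np.array([4., 5., 6.])
T = len(X)

# Direct evaluation of (3.6.1): sum over t with 0 <= t, t+u <= T-1.
def conv_361(u):
    return sum(X[t + u] * Y[t] for t in range(T) if 0 <= t + u <= T - 1)

direct = [float(conv_361(u)) for u in range(-(T - 1), T)]
print(direct)  # [6.0, 17.0, 32.0, 23.0, 12.0]

# Same quantity via zero-padding to length 2T and the conjugate trick;
# the wrapped circular result contains the open one.
n = 2 * T
circ = np.fft.ifft(np.fft.fft(X, n) * np.conj(np.fft.fft(Y, n))).real
via_fft = [circ[u % n] for u in range(-(T - 1), T)]

assert np.allclose(direct, via_fft)
```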
Early descriptions of the use of the fast Fourier transform algorithm (a reference occurring in Brillinger) also use 'convolution' in a broader sense. In Fast Fourier Transforms: for fun and profit, AFIPS '66 (Fall): Proceedings of the November 7-10, 1966, fall joint computer conference, Gentleman and Sande write:
… To date, the most important uses of the fast Fourier transform have been in connection with the convolution theorem of Table I. Some uses of numerical convolutions are the following:
Auto- and Cross-Covariances:…
…
…The first major use of the convolution theorem, by Sande, was to compute the auto-covariance…
So it seems that, at least in some circles, convolution has a more general meaning, and possibly the writers of the R function considered the cross-covariance the most useful case to make the standard (it is mentioned as a major use).
Differences in conventions also occur for the Fourier transform
As whuber noted in the comments, the Fourier transform, from which convolution can be derived, is not uniquely defined. There are several conventions for $a$ and $b$ in the Fourier transform and inverse Fourier transform formulae defined below:
$$\begin{array}{l} \mathcal{F}(x) = \sqrt{\frac{|b|}{(2\pi)^{1-a}}} \int_{-\infty}^{\infty} f(t) e^{ibxt} \,\text{d}t \\ f(t) = \sqrt{\frac{|b|}{(2\pi)^{1+a}}} \int_{-\infty}^{\infty} \mathcal{F}(x) e^{-ibxt} \,\text{d}x \end{array}$$
and different fields use different parameters $a$ and $b$. The scaling parameter $a$ does not relate much to the 'convolution' operation, but the direction/sign of $b$ may be seen as related.
Personally I am still wondering whether this relationship with Fourier transforms (and its ambiguous conventions) is the reason why the R function is different.
We can use Fourier transforms to compute a convolution, but should that be the reason for the diversity in conventions of convolutions?
(Note: in the case of Fourier transforms it may be useful to have different conventions depending on the field, because the transforms of particular functions that are more or less relevant in a given field may take a simple form under one convention but not another.)