# Solved – Periodic splines to fit periodic data

In a comment to this question, user @whuber cited the possibility of using a periodic version of splines to fit periodic data. I would like to know more about this method, in particular the equations defining the splines, and how to implement them in practice (I'm mostly an `R` user, but I can make do with MATLAB or Python, if the need arises). Also, but this is a "nice to have", it would be great to know about possible advantages/disadvantages with respect to trigonometric polynomial fitting, which is how I usually deal with this kind of data (unless the response is not very smooth, in which case I switch to a Gaussian process with a periodic kernel).


Splines are used in regression modeling to model possibly complex, non-linear functional forms. A spline smoothed trend consists of piecewise continuous polynomials whose leading coefficient changes at each breakpoint or knot. The spline may be specified in terms of the polynomial degree of the trend as well as the breakpoints. A spline representation of a covariate extends a single vector of observed values into a matrix whose dimension is the polynomial degree plus the number of knots.
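That dimension claim is easy to check with the standard `splines::bs` basis: the design matrix has one column per polynomial degree plus one per interior knot.

```r
library(splines)

x <- seq(0, 1, length.out = 50)
# Cubic B-spline basis with three interior knots:
B <- bs(x, degree = 3, knots = c(0.25, 0.5, 0.75))
dim(B)  # 50 rows, and degree + number of knots = 6 columns
```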

A periodic version of splines is merely a periodic version of any regression: the data are cut into replicates of the length of the period. So for instance, modeling a diurnal trend in a multiday experiment on rats would require recoding time of experiment into 24-hour increments, so the 154th hour would take the modulo-24 value of 10 (154 = 6*24 + 10). If you fit a linear regression on the cut data, it would estimate a saw-tooth waveform for the trend. If you fit a step function somewhere in the period, it would be a square waveform that fits the series. The spline is capable of expressing a much more sophisticated waveform. For what it's worth, in the `splines` package, there is a function `periodicSpline` which does exactly this.
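The modulo recoding, and the `periodicSpline` function just mentioned, can be sketched as follows (the `periodicSpline` call assumes its standard interface in the `splines` package, which interpolates one period of data and wraps around smoothly):

```r
library(splines)

# Recoding clock time modulo the period: hour 154 of a multiday
# experiment falls at hour 10 of the diurnal cycle.
154 %% 24   # 10

# A periodic interpolation spline through one period of a sinusoid;
# the knot values must lie within a single period.
theta <- seq(0, 2*pi, length.out = 9)[-9]
psp <- periodicSpline(theta, sin(theta), period = 2*pi)
predict(psp, 2*pi + 1)  # evaluated one period later, close to sin(1)
```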

I don't find R's default spline `bs` implementation useful for interpretation. So I wrote my own script below. For a spline of degree $p$ with $n_k$ knots, this representation gives the first $p$ columns the standard polynomial representation; the $(p+i)$-th column ($i \le n_k$) is simply evaluated as $S_{p+i} = (X - k_i)^p \, \mathcal{I}(X > k_i)$, where $k$ is the vector of knots.

```r
myspline <- function(x, degree, knots) {
  knots <- sort(knots)
  # truncated power terms: (x - k_i)^degree where x > k_i, else 0
  val <- cbind(x, outer(x, knots, `-`))
  val[val < 0] <- 0
  val <- val^degree
  # prepend the lower-order polynomial terms x, x^2, ..., x^(degree-1)
  if (degree > 1)
    val <- cbind(outer(x, 1:(degree - 1), `^`), val)
  colnames(val) <- c(
    if (degree > 1) paste0('spline', 1:(degree - 1), '.1'),
    paste0('spline', degree, '.', seq(length(knots) + 1))
  )
  val
}
```

For a little case study, interpolate a sinusoidal trend on the domain of $0$ to $2\pi$ (or $\tau$) like so:

```r
x <- seq(0, 2*pi, by = pi/2^8)
y <- sin(x)
plot(x, y, type = 'l')
s <- myspline(x, 2, pi)
fit <- lm(y ~ s)
yhat <- predict(fit)
lines(x, yhat)
```

You'll see they're quite concordant. Further, the naming convention enables interpretation. In the regression output you see:

```
> summary(fit)

Call:
lm(formula = y ~ s)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.04564 -0.02050  0.00000  0.02050  0.04564 

Coefficients:
             Estimate Std. Error  t value Pr(>|t|)    
(Intercept) -0.033116   0.003978   -8.326 7.78e-16 ***
sspline1.1   1.268812   0.004456  284.721  < 2e-16 ***
sspline2.1  -0.400520   0.001031 -388.463  < 2e-16 ***
sspline2.2   0.801040   0.001931  414.878  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.02422 on 509 degrees of freedom
Multiple R-squared:  0.9988,    Adjusted R-squared:  0.9988 
F-statistic: 1.453e+05 on 3 and 509 DF,  p-value: < 2.2e-16
```

The spline1.1 covariate is the linear polynomial term for the first segment, before the first breakpoint. Its coefficient approximates the slope of the tangent at the origin, X = 0. This is near 1, as the derivative of the sinusoid would indicate (cos(0) = 1), but we must bear in mind that these are approximations: extrapolating the quadratic trend out over a span of $\pi/2$ is prone to error. The spline2.1 quadratic term is negative, indicating a concave shape. The spline2.2 term is the *difference* in quadratic slope after the knot: added to spline2.1 (-0.40 + 0.80), it gives a positive leading coefficient of about 0.4, indicating an upward, convex shape. So we now have interpretation available for spline output and can judge the inference and estimates accordingly.
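To make that arithmetic concrete, the same model can be refit with the truncated power basis written out explicitly (equivalent to `myspline(x, 2, pi)` on this domain), recovering the post-knot quadratic coefficient as the sum of the two quadratic terms. This is a sketch for verification, not part of the original script:

```r
x <- seq(0, 2*pi, by = pi/2^8)
# degree-2 truncated power basis with a single knot at pi
X <- cbind(x, x^2, pmax(x - pi, 0)^2)
fit <- lm(sin(x) ~ X)
co <- unname(coef(fit))
co[3]          # quadratic coefficient before the knot, about -0.40
co[3] + co[4]  # quadratic coefficient after the knot, about +0.40
```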

I'm going to assume that you know the periodicity of the data at hand. If the data lack a growth or moving average component, you may transform a long time series into replicates of a short series of a duration of 1 period. You now have replicates and can use data analysis to estimate the recurrent trend.

Suppose I generate the following somewhat noisy, very long time series:

```r
x <- seq(1, 100, by = 0.01)
y <- sin(x) + rnorm(length(x), 0, 10)
xp <- x %% (2*pi)   # fold the long series onto one period
s <- myspline(xp, degree = 2, knots = pi)
fit <- lm(y ~ s)
```

The resulting output shows reasonable performance.

```
> summary(fit)

Call:
lm(formula = y ~ s)

Residuals:
    Min      1Q  Median      3Q     Max 
-39.585  -6.736   0.013   6.750  37.389 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.48266    0.38155  -1.265 0.205894    
sspline1.1   1.52798    0.42237   3.618 0.000299 ***
sspline2.1  -0.44380    0.09725  -4.564 5.09e-06 ***
sspline2.2   0.76553    0.18198   4.207 2.61e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 9.949 on 9897 degrees of freedom
Multiple R-squared:  0.006406,  Adjusted R-squared:  0.006105 
F-statistic: 21.27 on 3 and 9897 DF,  p-value: 9.959e-14
```
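Since the question also asks how this compares with trigonometric polynomial fitting, here is a small side-by-side sketch on the same kind of simulated data (the seed is an assumption for reproducibility; the original post sets none). A first-order Fourier basis can represent the sinusoid exactly, while the quadratic spline only approximates it, but with noise this heavy both recover only a weak signal:

```r
set.seed(1)  # assumed seed, not from the original post
x  <- seq(1, 100, by = 0.01)
y  <- sin(x) + rnorm(length(x), 0, 10)
xp <- x %% (2*pi)

# quadratic truncated power basis with a knot at pi (as in myspline)
S <- cbind(xp, xp^2, pmax(xp - pi, 0)^2)
fit_spline <- lm(y ~ S)

# first-order trigonometric (Fourier) fit
fit_trig <- lm(y ~ sin(xp) + cos(xp))

c(spline = summary(fit_spline)$r.squared,
  trig   = summary(fit_trig)$r.squared)  # both small on this noisy series
```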
