I'm building a model in which several of my covariates live on a "circle": they take values in the interval [0,1), with 0 and 1 identified. I'm wondering about techniques for dealing with this situation. One idea is to represent a circular variable theta as a pair of variables ( sin(2*pi*theta), cos(2*pi*theta) ). Any thoughts on this approach, or better approaches?
Specifically, I'm fitting GAMs with the mgcv package. Is there a way to tell the model that certain additive pieces should have the same values at the endpoints? Or is there another package that can do this?
Thanks!
Best Answer
There are two ways of dealing with circular variables. One hacky method is to manually duplicate your data set on either side of the boundary. The more elegant solution, I think, is to use the built-in spline bases with periodic boundary conditions.
For example:
bs="cc" specifies a cyclic cubic regression spline (see cyclic.cubic.spline), i.e. a penalized cubic regression spline whose ends match, up to second derivative.
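As a rough sketch of how this might look for a covariate on [0,1): the data below are simulated, and the names theta, y, dat and fit are made up for illustration. The knots argument is used here to pin the period to exactly [0, 1]; if it is left out, the end points are taken from the observed range of the data, which will usually fall slightly short of 0 and 1.

```r
library(mgcv)

set.seed(1)
n <- 400
theta <- runif(n)  # hypothetical circular covariate on [0, 1)
y <- sin(2 * pi * theta) + 0.5 * cos(4 * pi * theta) + rnorm(n, sd = 0.3)
dat <- data.frame(theta = theta, y = y)

# Cyclic cubic regression spline. Supplying two knots sets the end points
# of the period; the remaining knots are placed automatically.
fit <- gam(y ~ s(theta, bs = "cc", k = 10),
           knots = list(theta = c(0, 1)),
           data = dat, method = "REML")

# The fitted smooth should take the same value at both ends of the period.
ends <- predict(fit, newdata = data.frame(theta = c(0, 1)))
print(ends)
```

Compared with the ( sin, cos ) pair from the question, this lets the penalty choose the wiggliness of the periodic effect rather than fixing it to a single harmonic.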
Splines on the sphere: bs="sos". These are two-dimensional splines on a sphere. Arguments are latitude and longitude, and they are the analogue of thin plate splines for the sphere. Useful for data sampled over a large portion of the globe, when isotropy is appropriate. See Spherical.Spline for details.
bs="cp" gives a cyclic version of a P-spline.
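Since the question mentions several circular covariates, here is a hedged sketch with two of them in one additive model, one using bs="cc" and one using bs="cp". The covariate names a and b, the simulated response, and the object names are all assumptions for illustration. Each cyclic term gets its own entry in the knots list so that both periods are [0, 1].

```r
library(mgcv)

set.seed(2)
n <- 500
dat2 <- data.frame(a = runif(n), b = runif(n))  # two hypothetical circular covariates
dat2$y <- sin(2 * pi * dat2$a) + cos(2 * pi * dat2$b) + rnorm(n, sd = 0.3)

# One cyclic cubic term and one cyclic P-spline term, each with its
# period end points set to 0 and 1 through the knots list.
fit2 <- gam(y ~ s(a, bs = "cc") + s(b, bs = "cp"),
            knots = list(a = c(0, 1), b = c(0, 1)),
            data = dat2, method = "REML")

# Per-term contributions at the two ends of each period:
# row 1 is (a = 0, b = 0), row 2 is (a = 1, b = 1).
p <- predict(fit2, newdata = data.frame(a = c(0, 1), b = c(0, 1)),
             type = "terms")
print(p)
```

Each column of p should show the same value in both rows, which is the "same values at the endpoints" constraint asked for in the question.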