My data follow a Tweedie distribution, and I'm working with multivariate abundances. So I'm trying to use the manyany()
function in the mvabund
package. manyany()
fits many univariate models to multivariate abundance data, in this case I'm trying to fit Generalized Linear Models (GLMs).
Using the example data set in mvabund
, I've given my code below. Although the example data set is Poisson distributed, the same error message appears for my own data set as it does for the example below.
data(spider) abund <- spider$abund X <- as.matrix(spider$x) ## To fit a log-linear model assuming counts are poisson: spidPois <- manyany("glm",abund,data=X,abund~X,family=poisson()) ## Fitting tweedie GLMs, error message given. spidTwee <- manyany("glm",abund,data=X,abund~X,family=tweedie(var.power=1.2)) ## Error in if (xi != power) { : missing value where TRUE/FALSE needed
I set the var.power=1.2
because a value of 1 would be a Poisson. When looking at the command rtweedie()
I noticed that xi
is said to be a synonym for power. I'm stuck now though and unsure where to go from here.
One option would be to just use a negative binomial GLM, which I've tried and it works fine. This answer indicates that a Tweedie would be better, but I guess an extension would be how to determine how poorly a negative binomial GLM would be fitting your data.
Any ideas or thoughts would be greatly appreciated!
Update:
I tried using gam()
with manyany()
, specifically with the Tweedie using tw()
, as recommended.
library(mgcv) gam <- mgcv::gam tw <- mgcv::tw spidTweeGAM <- manyany("gam", abund, data = X, abund~X, family=tw()) #### Error in manyany("gam", abund, data = X, abund ~ X, family = tw()) : family argument has length more than one but not equal to the number of columns in yMat (!?)
I don't really know how to interpret this error message though! Again, any help is appreciated.
Best Answer
You could do a manyany()
using gam()
from package mgcv. From version 1.8-0 this has had a tw()
family function (so you use it as family = tw
etc). This will also do a search for the Tweedie parameter as part of the model fitting (I suppose this is your var.power
) so you don't even need to state this value or assume it is the same for all species.
The same package has an nb
family function for the negative binomial distribution with search for $theta$, so you don't need to specify this either. You could assess the relative fitness of the two approaches using prediction error (MSE or RMSE) for example either on hold-out data or in a cross-validation process.
Similar Posts:
- Solved – Gamma vs tweedie distribution for large productivity dataset
- Solved – Does the dependent variable in a GLM have to be transformed before running the model or does the model do it
- Solved – How to calculate the Tweedie prediction based on model coefficients
- Solved – GLM Tweedie dispersion parameter
- Solved – Fitting a glm to a zero inflated positive continuous response