The short version:
I can fit a model using weighted least squares (WLS), given a diagonal matrix of weights $W$, by solving $(X^TWX)\hat{\beta}=X^TWy$ for $\hat{\beta}$.
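For concreteness, here's a minimal sketch of that WLS solve in base R (the data and weights are made up for illustration); it should agree with `lm()`'s `weights` argument:

```r
set.seed(1)
n <- 100
X <- cbind(1, rnorm(n))                 # design matrix with intercept
y <- drop(X %*% c(1, 2) + rnorm(n))
w <- runif(n)                           # arbitrary positive weights
W <- diag(w)                            # diagonal weight matrix

## Solve (X'WX) beta = X'Wy directly
beta_hat <- solve(t(X) %*% W %*% X, t(X) %*% W %*% y)

## Same point estimates from lm()
cbind(normal_eqns = drop(beta_hat), lm = coef(lm(y ~ X[, 2], weights = w)))
```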
Is there a GLM analogue? If so, what is it?
There seems to be one, e.g. the `weights` argument of R's `glm` function. How does R use these weights?
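One hedged way to see what `glm()` does with its `weights` argument: for integer weights, the fit matches what you'd get by replicating each row that many times (i.e. treating them as frequency weights). A small check with a made-up logistic example:

```r
set.seed(2)
d <- data.frame(x = rnorm(20), y = rbinom(20, 1, 0.5))
w <- sample(1:3, 20, replace = TRUE)     # integer "frequency" weights

fit_w   <- glm(y ~ x, family = binomial, data = d, weights = w)
fit_rep <- glm(y ~ x, family = binomial, data = d[rep(1:20, w), ])

cbind(weighted = coef(fit_w), replicated = coef(fit_rep))  # identical
```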
The long version:
the situation
As a follow-up to my IPTW question, I just want to double-check that I understand how to fit a parametric model using inverse-probability-of-treatment weights (IPTW). The idea with IPTW is to simulate a dataset in which the relationship between my independent variables $(a^1,a^2,a^3)$ and dependent variable $y$ is unconfounded and therefore causal. For argument's sake, let's say I have already estimated an IPT weight $\hat{w}_i$ for each observation. These weights are probability weights from the hypothetical simulated dataset.
the question
I now want to fit a GLM. I'd just use WLS, but I'm working with a binary outcome and an outcome truncated at zero. So I have a linear predictor $\eta_i=a_i^T\beta$, a link function $g$ with $\mu_i=g^{-1}(\eta_i)$, and a variance $V(y_i)$ derived from my likelihood for $y$. Then the likelihood equations are
$$
\sum_{i=1}^N \frac{y_i-\mu_i}{V(y_i)}\frac{\partial\mu_i}{\partial\beta_j}=\sum_{i=1}^N \frac{y_i-\mu_i}{V(y_i)}\left(\frac{\partial\mu_i}{\partial\eta_i}x_{ij}\right)=0,~\forall j
$$
as per Agresti, *Categorical Data Analysis*, 2013, section 4.4.5.
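These are the equations `glm()` solves via iteratively reweighted least squares (IRLS). As a sketch, here's a bare-bones Fisher scoring loop for the logistic case with made-up data (for the logit link, $\partial\mu_i/\partial\eta_i = V(y_i) = \mu_i(1-\mu_i)$), which matches `glm()`:

```r
set.seed(3)
n <- 500
X <- cbind(1, rnorm(n))
y <- rbinom(n, 1, plogis(X %*% c(-0.5, 1)))

beta <- c(0, 0)
for (it in 1:25) {
  eta <- drop(X %*% beta)
  mu  <- plogis(eta)                 # inverse logit link
  v   <- mu * (1 - mu)               # IRLS weight: (dmu/deta)^2 / V(y)
  z   <- eta + (y - mu) / v          # working response
  beta <- drop(solve(t(X) %*% (v * X), t(X) %*% (v * z)))
}
cbind(irls = beta, glm = coef(glm(y ~ X[, 2], family = binomial)))
```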
So all I have to do is divide $V(y_i)$ by the weight $\hat{w}_i$ (equivalently, multiply the $i$-th score term by $\hat{w}_i$), right? The same way I might incorporate an overdispersion parameter? If so, is this because an observation standing in for, say, 5 independent observations carries 5 times the information of a single observation?
Follow-up idea: since the likelihood is the product of the per-observation likelihoods, is there some weighting procedure I can use to weight each observation's contribution directly?
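That follow-up idea is easy to try numerically: maximize a weighted log-likelihood directly with `optim()` and compare to `glm()`'s `weights` argument. A sketch for the logistic case with made-up data (`glm()` warns about non-integer weights here, but the point estimates coincide):

```r
set.seed(4)
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.5 * x))
w <- runif(n, 0.5, 2)                # stand-in for estimated IPT weights

## Negative weighted Bernoulli log-likelihood
negll <- function(b) {
  eta <- b[1] + b[2] * x
  -sum(w * (y * eta - log1p(exp(eta))))
}
fit_opt <- optim(c(0, 0), negll, method = "BFGS")

cbind(optim = fit_opt$par,
      glm   = coef(suppressWarnings(glm(y ~ x, family = binomial, weights = w))))
```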
Best Answer
Fit an MLE by maximizing $$ l(\boldsymbol{\theta};\mathbf{y})=\sum_{i=1}^N l\left(\theta;y_i\right) $$
where $l$ is the log-likelihood. Fitting an MLE with inverse-probability (i.e. frequency) weights entails modifying the log-likelihood to:
$$ l(\boldsymbol{\theta};\mathbf{y})=\sum_{i=1}^N w_i~l\left(\theta;y_i\right). $$
In the GLM case, this reduces to solving $$ \sum_{i=1}^N w_i\frac{y_i-\mu_i}{V(y_i)}\left(\frac{\partial\mu_i}{\partial\eta_i}x_{ij}\right)=0,~\forall j. $$
Source: page 119 of http://www.ssicentral.com/lisrel/techdocs/sglim.pdf, linked at http://www.ssicentral.com/lisrel/resources.html#t. It's the "Generalized Linear Modeling" chapter (chapter 3) of the LISREL "technical documents."
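As a quick numerical check of the weighted score equations above (logit link, so $(\partial\mu_i/\partial\eta_i)/V(y_i)=1$ and the $j$-th equation is just $\sum_i w_i(y_i-\mu_i)x_{ij}=0$), the weighted score evaluated at `glm()`'s weighted estimate should be zero; made-up data again:

```r
set.seed(5)
n <- 300
X <- cbind(1, rnorm(n))
y <- rbinom(n, 1, plogis(X %*% c(0, 1)))
w <- runif(n, 0.5, 2)

fit <- suppressWarnings(glm(y ~ X[, 2], family = binomial, weights = w))
mu  <- fitted(fit)

drop(t(X) %*% (w * (y - mu)))   # weighted score: numerically zero
```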