I have a dataset of 40 experimental observations of cells' activity ($n=40$), and I tested several models on each of these samples. A model can only explain one cell at a time due to variability between the cells, so for each model I have 40 log-likelihood values and 40 fitted parameter sets.

I thought that this is how I should calculate the AIC:

$AIC = -2\times \sum_{i=1}^n \log\mathcal{L}_i + 2\times k$

where $k$ is the number of model parameters.

But because the models do not explain all the data with one set of parameters, and each cell ends up with its own best parameter set, I was wondering whether I should be dividing the log-likelihood by $n$, i.e.:

$AIC = -2\times \frac{1}{n} \sum_{i=1}^n \log\mathcal{L}_i + 2\times k$


#### Best Answer

The AIC is given explicitly in (for example) Akaike, 1974[1] (including in the abstract) as:

$-2 \log(\text{maximum likelihood}) + 2\,(\text{number of independently adjusted parameters within the model})$

When the observations are independent, this becomes your first form.

If you adjust AIC by shifting it, nothing of consequence changes (as long as the same shift is applied to every such term that is compared).

If you *scale* the entire AIC, that still allows less-than or greater-than comparisons, but it's no longer an adjusted likelihood. People sometimes divide the entire AIC by $n$, which gives something like an average adjusted likelihood (and there can sometimes be good reasons to do this, but it's not strictly AIC).

I see no justification whatever for dividing the first term by $n$ but leaving the second term alone; that changes the relative impact of the two terms and you are no longer doing what Akaike was doing.
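Using purely hypothetical numbers (two made-up models with invented log-likelihoods and parameter counts), a quick sketch shows why dividing only the first term matters: it shrinks the likelihood term relative to the penalty, which can flip the ranking of two models.

```python
n = 10  # hypothetical number of observations

# Hypothetical maximized log-likelihoods and parameter counts for two models
logL_A, k_A = -110.0, 2   # simpler model
logL_B, k_B = -100.0, 10  # more complex model

def aic(logL, k):
    """Akaike's formula: -2 log(max likelihood) + 2 * (number of parameters)."""
    return -2 * logL + 2 * k

def modified(logL, k, n):
    """First term divided by n, penalty left alone (the questioner's second formula)."""
    return -2 * logL / n + 2 * k

# Proper AIC prefers model B (lower is better) ...
print(aic(logL_A, k_A), aic(logL_B, k_B))          # 224.0 220.0
# ... but the modified formula flips the ranking to model A
print(modified(logL_A, k_A, n), modified(logL_B, k_B, n))  # 26.0 40.0
```

The relative weight of fit versus complexity has changed, which is exactly the objection above.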

In your case you're fitting multiple models to different samples. If you treat the samples as independent of each other, the original Akaike formulation works as is for this collection of models as long as you add the log-likelihoods and the parameters for each model. i.e. it's a model with as many observations as the total number of observations (assuming no overlap; these are supposed to be independent) and the number of parameters is the total number of parameters.
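A minimal sketch of this pooled computation, assuming (purely for illustration) that each of the 40 cells is fitted with the same hypothetical $k = 3$ parameters and that each fit attains the same made-up maximized log-likelihood:

```python
n_cells = 40
k_per_cell = 3  # hypothetical parameters fitted per cell

# Hypothetical maximized log-likelihoods, one per independently fitted cell
log_liks = [-50.0] * n_cells

# Pooled AIC for the whole collection: sum the log-likelihoods and
# count *all* independently adjusted parameters across cells
total_log_lik = sum(log_liks)           # -2000.0
total_params = n_cells * k_per_cell     # 120
aic = -2 * total_log_lik + 2 * total_params
print(aic)  # 4240.0
```

Comparing models then just means comparing these pooled AIC values; any subsequent division by $n$ would rescale every model's value identically and so (as noted) preserve the ordering, but the result is no longer strictly AIC.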

If you *then* decide to scale to some kind of average, you can do so (but, as I mentioned earlier, it's no longer strictly AIC).

[1] Akaike, H. (1974), "A new look at the statistical model identification", IEEE Transactions on Automatic Control, 19 (6): 716–723, doi:10.1109/TAC.1974.1100705, MR 0423716.