I am taking random samples from one distribution, $f(x)$, but trying to get information about another distribution, $g(x)$. I have a weighting function, $w(x)=Cg(x)/f(x)$, to correct for this. The result is that I have $N$ independent samples, with different weights, $w_i$, attached to them.
The question, then, is: what is a good estimate for the number of independent samples I really have?
For example, if my weights are {0.49, 0.48, 0.01, 0.01, 0.01} then I have pretty close to 2 independent samples. If they are {0.3, 0.3, 0.4} then I have about 3. Presumably there is a quantitative way to do this.
Also, how could I determine, given $f(x)$ and $w(x)$, the efficiency of the sampling (i.e., how many independent samples of $g(x)$ I get, on average, from $N$ samples of $f(x)$)?
Best Answer
This "number of independent samples I really have" is called the effective sample size, $N_\text{ess}$, in simulation books. Given a sample $$ x_1,\ldots,x_N \sim f(x) $$ leading to weights $w_i$ $(1\le i\le N)$, and their normalised version $$ \bar w_i = w_i \Big/ \sum_{j=1}^N w_j\,, $$ the estimate for $N_\text{ess}$ is given by $$ \hat N_\text{ess} = 1 \Big/ \sum_{j=1}^N \bar w_j^2\,. $$ You can prove that $1\le \hat N_\text{ess}\le N$. In your example, the effective sample size is estimated by
```r
> we <- c(0.49, 0.48, 0.01, 0.01, 0.01)
> 1/sum((we/sum(we))^2)
[1] 2.124044
```
which is a wee bit more than 2.
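As a sketch, the same computation can be wrapped in a small R function (the name `ess` is just an illustrative choice) and applied to both weight vectors from the question, plus the two extreme cases. Because the weights are normalised first, the constant $C$ in $w(x)=Cg(x)/f(x)$ cancels and need not be known. The extremes illustrate the bounds $1\le\hat N_\text{ess}\le N$: since $\bar w_j\ge 0$ and $\sum_j \bar w_j = 1$, Cauchy-Schwarz gives $\sum_j \bar w_j^2 \ge 1/N$, and $\bar w_j^2 \le \bar w_j$ gives $\sum_j \bar w_j^2 \le 1$.

```r
# Effective sample size from (possibly unnormalised) importance weights:
# normalise the weights, then take 1 / sum of the squared normalised weights.
# Any constant factor C on the weights cancels in the normalisation.
ess <- function(w) {
  stopifnot(all(w >= 0), sum(w) > 0)  # weights must be non-negative, not all zero
  w_bar <- w / sum(w)
  1 / sum(w_bar^2)
}

ess(c(0.49, 0.48, 0.01, 0.01, 0.01))  # about 2.12
ess(c(0.3, 0.3, 0.4))                 # about 2.94
ess(rep(1, 5))                        # equal weights: N = 5 (up to rounding)
ess(c(1, 0, 0, 0, 0))                 # a single non-zero weight: exactly 1
```

The second vector, {0.3, 0.3, 0.4}, gives $1/0.34 \approx 2.94$, in line with the "about 3" guessed in the question.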
I am not sure I understand the last part of the question.
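If the last part is asking for the expected efficiency for large $N$, one possible reading is the following. Dividing $\hat N_\text{ess}$ by $N$ gives $\big(\frac1N\sum_i w_i\big)^2 \big/ \big(\frac1N\sum_i w_i^2\big)$, so by the law of large numbers, assuming $g$ is a normalised density and $\int g^2/f < \infty$,
$$ \frac{\hat N_\text{ess}}{N} \longrightarrow \frac{\left(\mathbb E_f[w(X)]\right)^2}{\mathbb E_f[w(X)^2]} = \frac{1}{\int g(x)^2/f(x)\,\text{d}x}\,, $$
where $C$ again cancels. The R sketch below checks this on a hypothetical example of my own choosing, $f=\mathcal N(0,1)$ and $g=\mathcal N(\mu,1)$, for which $\int g^2/f = e^{\mu^2}$, so the limiting efficiency is $e^{-\mu^2}$. The function name `efficiency_mc` and the choice $\mu=0.5$ are illustrative assumptions.

```r
# Sketch: Monte Carlo estimate of the sampling efficiency ESS/N.
# Hypothetical example: f = N(0, 1) is the sampling density, g = N(mu, 1)
# is the target; C is taken as 1 since it cancels in ESS/N anyway.
ess <- function(w) 1 / sum((w / sum(w))^2)

efficiency_mc <- function(n, mu) {
  x <- rnorm(n, mean = 0, sd = 1)                    # n draws from f
  w <- dnorm(x, mean = mu, sd = 1) / dnorm(x, 0, 1)  # w(x) = g(x) / f(x)
  ess(w) / n                                         # estimated ESS per draw
}

set.seed(1)
mu <- 0.5
eff_hat <- efficiency_mc(1e5, mu)
# For these two normals the integral of g^2/f is exp(mu^2),
# so the large-N efficiency is exp(-mu^2), about 0.78 for mu = 0.5.
eff_exact <- exp(-mu^2)
c(monte_carlo = eff_hat, exact = eff_exact)
```

Moving $g$ away from $f$ (larger $\mu$) makes the efficiency drop like $e^{-\mu^2}$. When $\int g^2/f$ is infinite (e.g. $f$ has lighter tails than $g$), the ratio tends to 0 and $\hat N_\text{ess}$ from a finite sample can be misleadingly optimistic.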